AI chatbots are having their “tulip mania” moment

snek_case · on Feb 24, 2023

I think Google just had an "oh shit" moment as it dawned onto everyone that we now (almost) have the technology to build personal assistants that actually work, and this could be the new tech that disrupts Google's business.

But now that the dust is settling, people are realizing that this technology is not flawless. We haven't fully 100% mastered it, there are unsolved problems, and maybe it's not quite ready for a broad deployment just yet.

That's normal. It's just the way the hype cycle works. People (and the media) quickly get really excited and build unrealistic expectations, then they become aware of the flaws and limitations. That doesn't mean these kinds of systems can't be better and even more useful in 1-2 years. There's a lot of really smart people working on them with many possible paths to address the current limitations.

wintermutestwin · on Feb 24, 2023

>we now (almost) have the technology to build personal assistants that actually work

Siri doesn't need AI to be useful - it really just needs this $2t company to invest in some basic usability around a set of prompts that people could actually use.

Yesterday's ridiculous Siri failure: "hey siri, set alert volume to 25%" Siri: "media volume set to 25%" That is not bad AI, that is a total failure of UX and it makes it clear that Apple has simply failed to invest resources into making Siri useful.

klodolph · on Feb 24, 2023

I don’t think Apple has “simply failed to invest resources into making Siri useful”, but rather, voice assistants pose major UX problems that nobody knows how to solve at scale.

We have tons of patterns for CLI and GUI usability. But in both cases, we have a easy enough way for users to discover what is possible—in the CLI, you can present a list of commands or categories for commands. In the GUI, we have menus, dialogs, and windows that let you explore. Voice is a much more difficult problem in the first place. Voice commands are often modeless, discovery is difficult, and commands are given with troublesome natural language patterns which must somehow be adapted to multiple languages.

Take a look at all the different voice assistants—Siri, Alexa, and Google. They all kinda suck, despite enormous amounts of money being poured in. From recent news, we know that Alexa managed a loss close to $10 billion in a single year. By all signs, it contains some brilliant engineering, but brilliant engineering doesn’t matter if it doesn’t solve the underlying (difficult) problems well enough.

(I don’t think tools like ChatGPT are going to push the needle much here, at least for a couple years, because integration with all the different software systems takes such a massive amount of coordinated work.)

ghaff · on Feb 24, 2023

In my experience, voice assistants go through a cycle something like:

1. The novelty of the thing!

2. Start using for some simple stuff like weather, timer, alarm, etc.

3. Try more complicated things like skills, really managing controls/media in the car, reliably playing some album.. and largely give up because it's frustrating.

4. Mostly go back to 2.

wintermutestwin · on Feb 24, 2023

>voice assistants pose major UX problems that nobody knows how to solve at scale.

My example of a Siri fail is not at all indicative of a "major UX problem. The fact that Siri doesn't understand iOS nomenclature and that iOS doesn't have the proper hooks to do something that a user would obviously want to do, is indicative of a total failure to invest resources in making Siri useful.

mustacheemperor · on Feb 24, 2023

The one that really boggled me recently:

>hey siri, play Chet Baker Sings on the Apple TV

>playing the best of chet baker sings on Apple TV

>Siri, play the album “chet baker sings” on Apple TV

>playing the best of chet baker sings on Apple TV

Those are not the same album…and only one of them is saved in my library, the one I asked for. It failed repeatedly, too. The only response Siri would produce was to insert two new words to my request and then serve that.

dmd · on Feb 24, 2023

My experience:

> hey siri, play the art of fugue by bach on downstairs

> now playing Love The Way You Lie featuring Rihanna, by Eminem, on downstairs

And like have apple’s ML people just never even heard of Bayes or what? if i have, in the last few days, asked siri to play debussy, ravel, liszt, and chopin, and today I say “play debussy on downstairs” - and i have never, not once, played any kind of rap or hiphop - why would it... “now playing Ain’t No fun - by Snoop Dogg - on downstairs”?

alfor · on Feb 24, 2023

It seems to me those project we intentionally stopped. They could have improved 10X in the last few years but didn’t move a bit.

It seems to be more about willingness than capability.

Now that ChatGPT is there there is no choice other than to expose what they have of become irrelevant.

ghaff · on Feb 24, 2023

I really wish I could tell Apple Music: "Unless I explicitly ask for a rap or hiphop song (which, trust me, I almost certainly won't), never ever play one as part of a playlist."

lamontcg · on Feb 24, 2023

The way that ChatGPT works won't help there. If you tell it to write a story about someone who cannot see, then "see" will be pushed higher into its weights and it'll start talking about the stuff the character can see (if you say the character is "blind" then it'll work better). It is a language model that lacks critical reasoning skills.

panarky · on Feb 24, 2023

Have you actually tried this, or are you just assuming that ChatGPT is just a hyped-up autocomplete, simply doing statisitcal next-token prediction with no internal representation of concepts and relationships?

Prompt: "Write a three-paragraph story about someone who cannot see."

Response:

Maria was born without the ability to see. Growing up, she always felt like she was missing out on something that everyone else took for granted. She would hear her friends describe colors and the beauty of nature, and while she could imagine what they were talking about, she knew she could never truly experience it for herself. Despite this, Maria never let her blindness hold her back. She learned to read Braille and used her heightened sense of hearing and touch to navigate the world around her. She even took up piano and became an accomplished musician, playing by ear.

One day, Maria met a man named Alex who was fascinated by her ability to navigate the world without sight. He would often accompany her on walks and describe the beauty of the world around them, pointing out flowers and trees and the changing colors of the sky. Maria found herself falling in love with Alex, and although she knew that their lives would be different because of her blindness, she was willing to take the risk. Alex was patient and understanding, and he took the time to learn how to communicate with Maria in a way that she could understand. He even started taking piano lessons so that he could play duets with her.

Years went by, and Maria and Alex built a life together filled with love and happiness. While Maria never regained her sight, she realized that she didn't need it to experience the beauty of the world. Through her other senses and the love of those around her, she had everything she needed to live a full and meaningful life.

Darrengineer · on Feb 24, 2023

Interesting! I'm not familiar with chatbots, but this kind of does seem like a sort of autocomplete in the sense that this is a fulfillment of the request in the vein of the lowest common denominator storytelling. While this story is not about what Maria can 'see', it is very much about the sighted world or (imho, problematically) about the lack of sight as a detriment. It's a feel-good story that sounds nice to most sighted people who think that a story about someone who can't see and a story simply about the trait of not seeing are the same.

To illustrate what I mean, do you think it's likely/possible ChatGPT ever comes up with a story about a person who goes to the grocery store, examining produce for ripeness, chatting with another parent about parenting, then going home and feeding their child? None of those things require sight, so it would fit the prompt as long as it portray the person as being able to see.

lamontcg · on Feb 25, 2023

Write more than three paragraphs and see if it makes it consistent or if it starts to lose the plot (literally).

Throw more details in there and put it in situations where it has to actually make the blind person navigate the world and see if it starts to violate the constraint or not.

siva7 · on Feb 25, 2023

The parent already proved the point perfectly. Adding more arbitrary requirements after that is stretching it a bit too far.

lamontcg · on Feb 26, 2023

No that's entirely the point, when you push it then it starts to fail. Three paragraphs and no other story elements is way too short.

ghaff · on Feb 24, 2023

In this case, you don't need ML. Apple has a dictionary of music styles and you just need to be able to exclude certain styles.

siva7 · on Feb 25, 2023

And i thought i was the only one. Siri is actually their only major product not worth carrying the brand "Apple" if we wanna be brutally honest

nerdponx · on Feb 24, 2023

For all we know, this part of the algorithm was written up by a software engineer whose PM didn't think they needed to talk to the data scientists because it was just a voice recognition problem and why would you bring in those eggheads who like to talk about uncertainty, they just make things harder.

LazyMans · on Feb 24, 2023

Years later and my Alexa in my bathroom still only successfully starts NPR when I ask it to maybe 80% of the time.

Alexa isn't getting any better at recognizing my voice? At 8am everyday, it hasn't learned I'm likely asking for NPR, not something else? It's not consistent at all.

Controlling Hue lights with it is atrocious too.

Alexa is purely a dumb voice assistant and there is either a complete lack of investment in it, or that money is invested elsewhere like making Alexa more likely to order too much product off Amazon so someone can point to a chart that shows a bug increased sales.

cameldrv · on Feb 24, 2023

Lucky you! Half the time I say in my car "Hey Siri, navigate to XYZ", it says in its long-winded way, "XYZ is 2.5 miles from your house and gets 3.5 stars on yelp. Is that the one you want?" Then I say "yes." Siri responds "Navigating to XYZ."

Then Siri then just forgets about it and doesn't do anything.

300bps · on Feb 24, 2023

Agree 100% and Amazon has done a similarly bad job with Alexa.

I have five echo devices through my house and have only found about ten uses for them.

They’re useful but 1) finding uses for them and 2) twisting your words around to make it understand you is an exercise in frustration.

It seems like it would be an easy data mining exercise for these companies:

If someone asks for something and stops, don’t flag it. If someone asks for something 3+ times in a row, flag it to figure out why they didn’t get what they wanted the first time. If someone says, “Alexa you’re an idiot” don’t just play a snarky comeback. Flag that for serious review. Then iterate.

There seems to be almost no useful iteration on these virtual assistants.

Hyption · on Feb 24, 2023

Have you tried chatgpt?

It's tremendously coherent.

The chance that chatgpt just hears your example and than does the right thing is nearly a no brainer.

It already understood much more complicated prompts from my tests with ease.

Siri and other agents struggle tremendously and the implication after such a long time has to be a reflection of either a tremendous Missmanagement or that those classical approaches are just too hard.

Or that the chatgpt people are genius.

But that would be even worse: it would show how much impact chatgpt and co will have sooner than later.

lolinder · on Feb 24, 2023

> The chance that chatgpt just hears your example and than does the right thing is nearly a no brainer.

ChatGPT can't do anything. Someone has to, somehow, wire it up to actual physical or digital controls. Intent recognition and slot filling still have to happen somewhere, and that means defining discrete intents and slots.

GPT may be better at mapping from speech to intents, but it can't magically interface with APIs that haven't been defined.

EDIT: To elaborate a bit, the problem in OP's story isn't caused by a bad language model, it's caused by no one at Apple thinking to define different "alert volume" and "media volume" intents. Current language models are plenty good enough to recognize the distinction, so simply adding ChatGPT won't be enough to make any feature work unless someone at Apple predicts the need and writes the interface.

hgsgm · on Feb 24, 2023

If I tell my computer to "froblate" it says "command not found" not "command shutting down".

If I tell it to "shtudown", it said "did you mean 'shutdown'?

There is an AI failure in distinguish "mispronounced command I know" from "command I don't know".

lolinder · on Feb 24, 2023

I'm not convinced that ChatGPT would do better than existing models in that regard. Using it for intent recognition is already a bit of a square peg round hole situation.

Basic intent recognition models are trained to produce a single neuron per intent as the output, which makes it pretty easy to use the activation levels of the output to decide whether to perform an action, confirm an action, or ask for clarification. You just need to check if the certainty is below a certain threshold.

With ChatGPT you'd have to encode the intents as text of some kind (JSON?) and hope that it doesn't just hallucinate an intent that your APIs don't have when it's faced with ambiguous input. You could probably have hallucinated intents map to a decent-sounding error message, but that feels more brittle to me than the existing approaches.

Hyption · on Feb 24, 2023

It can interface with apis it only read about.

Ask it to write an SQL query and it will.

Give it upfront context and it will be able to produce API commands.

The voice recognition can trigger chatgpt API 'create API command for the following text's.

The practical problem is cost and hardware.

But this is closer than we ever were.

lolinder · on Feb 24, 2023

Having ChatGPT write code that you immediately turn around and execute is a very risky proposition for all kinds of reasons, the least of which are crashing at runtime or failing to compile. If you're having it produce code in a Turing-complete language, you've opened yourself up to the worst kind of prompt injection attacks.

A much safer approach would be to have it produce a JSON object that is interpreted by the wrapping code. This allows you to inspect the interpreting code and provide guarantees about what can and cannot be done by users.

But neither execution model solves my main point: someone still has to write the interpreter (or the library functions if you go the dangerous route). Someone has to think ahead and guess what the user is going to try to do and provide APIs that allow that. And that work is most of the work: our existing intent recognition models are actually not bad at all at their job, the failings these voice assistants have are almost all to do with predicting the ways in which users will try to interact with the system.

Hyption · on Feb 25, 2023

Have you tried it out?

lolinder · on Feb 25, 2023

Tried what out? Code generation and execution? Or building a full-featured voice assistant?

The code generation and execution is trivially problematic. I don't need to try it out to know that if someone can coerce one of these bots to spit out its prompt, someone could coerce it to execute dangerous code if it's wired up to an interpreter.

As for the other part of my analysis, in order to empirically test it I would have to build a full-featured voice AI, release it to millions of users, and see if I was able to predict everything they tried to do. Given that my prediction is that even Apple can't do that well, I'm not sure why you think I would bother to try it out.

If an engineer told you that a bridge would collapse if built a certain way, you wouldn't insist on trying to build it anyway just to be sure. Most of the time in engineering you can't run a full-scale test, you have to make do with analyses. If you want to critique my analysis, I'm all ears, but "you haven't actually tested it" is not a critique.

Hyption · on Feb 25, 2023

I meant chatgpt

lolinder · on Feb 25, 2023

Yes. I've used it quite a bit. It's impressive, but the hype is excessive.

All of my analysis is based on my usage of ChatGPT and my understanding of the underlying model. The hype is largely driven by people who don't understand how it actually works and think they're interacting with an artificial general intelligence.

It's a very impressive language model with a lot of applications, but with many fewer applications than the hype would suggest.

Hyption · on Feb 26, 2023

I'm hyped not because I think it's an agi.

I'm hyped because I threw normal text (sentences) against it and it always did what I wanted and it didn't matter if it was bad English or bad German.

I didn't even try promt hacking or promt tuning because natural language already worked so good.

It's a tremendous good 'normal human' text interface.

klodolph · on Feb 24, 2023

Coherent is nice, but correct would be better.

Hyption · on Feb 24, 2023

It's correct often enough already to be useful.

And it doesn't need to be more correct to disrupt industries already it just needs to be more correct than humans.

The most crazy thing is that it's already so good and coherent that it makes totally sense to train a ml model like chatgpt instead of training humans! Because it scales.

We never had this.

And.i already mentor a few junior people I don't mind training an ai instead.

pwinnski · on Feb 24, 2023

This is the same dangerous fallacy of thinking that causes "self-driving" cars to kill people.

If it is "correct enough" to lull people into a false sense of security, the inevitable failures will be worse than if it people remained on guard.

The particular way in which these LLMs fabricate information makes them incorrect enough to be dangerous, not "correct enough to be useful."

People keep mistaking text completion for intelligence and understanding.

alfor · on Feb 24, 2023

Humans are not perfect either at driving, programming or writing.

At some point it is a question of rate of error. The ML model will keep improving as the net get bigger and the dataset larger.

But I agree they are good at having a very good surface understanding while having little of dept at the moment.

Ex: a human making a mistake of visual interpretation while driving will examine in his mind the error he made and thing of the consequences in a more dangerous situation. The NN will not bother (at the moment)

pwinnski · on Feb 24, 2023

Sigh. The models do not "have a very good surface understanding," they are capable of competing strings of words in a way that we interpret as a surface understanding. I believe time will show this is a category error, not just an error of degree.

In much the same way that Teslas are still literally killing people when they encounter vehicles parked across roadways in 2023, just like they did back in 2016, the difference between the appearance of understanding and actual understanding is a much wider chasm than it first appears.

The LLM will not "consider" anything. Even transcripts we've all seen in which LLMs seem to acknowledge errors (rather than doubling down and inventing false sources) are still just text-prediction, modeling what it would be like if someone acknowledged an error.

When a particular LLM seems to take the corrections offered in a widely-publicized transcript on board and not repeat them in future transcripts, that's because someone acted to modify the model, adding filters or weights to ensure that the text prediction path goes differently in the future.

Hyption · on Feb 24, 2023

There is a YouTube video from someone talking to chatgpt to play SQL server.

It can reproduce the right output of 3 insert statements with a select *.

It can interpretate an SQL procedure too.

Whatever the architecture is behind this, is slightly more than just text prediction.

I don't think 'text prediction ' does this type of state management justice.

But hey let's see. Perhaps this is just an emerging feature from a tremendously huge text/language model.

alfor · on Feb 26, 2023

Text prediction is intelligence.

To be able to code a program form a short description, combine different elements to make a solution, understanding is required.

The understanding is in the weights and pattern detection capabilities.

Sure, some part are missing, like the capacity we have to hold a few thought in our mind, correct ourselves, remember the mistakes we made, etc.

The same is true of NN that can differentiate cats and dogs, they detect patterns is a similar ways that humans do.

In the openAI api you can see the level of confidence the NN has on each words it output, it’s just not exposed on the chat interface.

I think the larger missings parts are by design, to give the system memories, agency, access to the internet, to programming would be careless at this point, but it’s happening anyway.

I feel it missing our capacity to examine our thought, to replay events in our head to reprogram itself when the results are not optimal.

SamoyedFurFluff · on Feb 24, 2023

I struggle to believe we can train a LLM the same was as a junior eng, mostly in that I’ve tried to keep a LLM as a journal of sorts and it forgets things way more than a junior does.

Hyption · on Feb 24, 2023

We are not limited to LLM.

And just an hour ago I asked chatgpt how to do something specific with docusaurus (Facebook static page generator) and it just told me the answer.

You know how often I explain things in my current position?

How slow some developers are? How often they forget things I told and explained them? How often they still get things wrong?

It's a slow process and doesn't scale very well at all.

If the table turns and we all teach one system instead ooohh boy.

It will make experts faster and better and potentially removes a certain amount of people in every industry faster than we can imagine.

You know the people who are adding some value but not that really but it's still better to have them than not having them?

Cloud probably got rid of plenty of basic sysadmins.

These new systems break through tasks were no one had an idea how we will break through.

nerdponx · on Feb 24, 2023

It's mostly correct most of the time. That's pretty good.

more_corn · on Feb 24, 2023

It’s about as trustworthy as the average person. You’d want to verify any fact it fed you.

Of course you want to do that with people too. Shouldn’t be too hard to bake in a self check module.

mercutio2 · on Feb 24, 2023

This (incredibly irritating) behavior isn’t an engineering failure, it’s a specific policy failure.

The idea is “we will not give you a way to make our conversional UI incomprehensibly quiet”.

It is an especially dumb policy decision on anything that’s not a HomePod, and it’s also dumb to willfully ignore the user’s intent and focus on the one thing the device is willing to do for you regarding volume, rather than explaining “sorry, I’m always going to be loud, my makers are worried you’ll complain I’m inaudible if you try to get me to stop yelling my responses” but it is deliberate.

hgsgm · on Feb 24, 2023

So it's a UX failure, as parent wrote.

williamcotton · on Feb 24, 2023

Adding a large language model fine-tuned to translate English commands into a limited set of structured actions for Siri would drastically improve the Siri experience.

roncesvalles · on Feb 24, 2023

>That doesn't mean these kinds of systems can't be better and even more useful in 1-2 years.

It's also faulty to assume that improvements will happen linearly. We had a self-driving car mania in 2013-2017 with billions poured into it and people just forgot about it.

Also consider that if easy linear improvements were possible, they would've delayed launching it. AI (software in general?) seems to be the kind of thing where once R&D hits the exponential difficulty wall, they make it generally available.

sva_ · on Feb 24, 2023

> The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time. - Tom Cargill, Bell Labs

CuriouslyC · on Feb 24, 2023

Not so much. Making it available allows OpenAI to evaluate and iterate on the chatbot layer, which has had much less data and engineering invested.

You equate lack of ubiquitous self driving cars with lack of progress in self-driving technology, but even if we've had super-linear progress in terms of self-driving "driving skill," if that skill is below the threshold where the sales from pushing the software are less than the potential legal liability, companies aren't gonna push it.

kenjackson · on Feb 24, 2023

Self driving cars have a failure downside that is very high for basically all use cases.

On improvements— one of the things that has impressed is how much Chat GPT has improved since it was released. I don’t think it’s going to be a perfect AI in two years. But it already is working better for some use cases for me.

Hyption · on Feb 24, 2023

It feels slow to us but from past performance we still progress on this topic and have achieved already impressive results.

And more and more puzzle peaces are falling in place.

We can see were it is going.

And when you follow Nvidia with their digital twin topic, we are converging.

joeltt · on Feb 24, 2023

Just like driving cars were "just around the corner", I highly doubt 1-2 years will fix the issues, that "last x%" is exponentially more difficult.

snek_case · on Feb 24, 2023

I don't think it's necessarily as hard as you think. Can't do math? Teach the model to use an external mathematical expression evaluator to do math. Makes up inaccurate information? Teach it to query an external knowledge base. How hard to implement do you think that is? They're already working on it.

pwinnski · on Feb 24, 2023

> How hard to implement do you think that is?

Very, very, very, very hard.

williamcotton · on Feb 24, 2023

It's not that hard:

https://github.com/williamcotton/empirical-philosophy/blob/m...

mijail · on Feb 24, 2023

@williamcotton - saw this the other day in the prompt engineering post - I enjoyed it. Sent you a message on your site but something looks broken. If you use linkedin I requested to connect.

williamcotton · on Feb 24, 2023

Doh, I’ll look into that contact form! I’ll check my LinkedIn now. I’m glad you found it interesting!

BobbyJo · on Feb 24, 2023

I think there is a greater tolerance for error here though.

Cars haven't made it yet because the assumption about allowable errors was wrong. Turns out it isn't "good enough" for self-driving cars to be safer than humans.

throwaway5752 · on Feb 24, 2023

Alphabet probably has more IP on ML than any group on the planet. There's the Deepmind and AlphaGo, AlphaZero, and AlphaFold but much more practically I can type a sentence into Chrome and get a page for of useful and general correct lines that provide what I want. It would be trivial for Google to summarize the results in some sort of written form instead of a list of links to other places. They could have done that a decade ago.

l33tman · on Feb 24, 2023

The current state of the first 10 results on a google search is quite sorry. It's 5 irrelevant ads and 5 SEO-autogenerated sites (probably using GPT3 in the first place). Try it with any kind of IT-problem or IT-product review question. No amount of AI is going to summarize that load of crap into anything better. This is googles problem. The underlying abstraction of knowledge is going to change - knowledge has to be built and served while severely filtering away "noise" up to 90%. Maybe their traditional algos can do this, maybe it requires GPT+++, but it is broken as it is now and has been for more than a year.

bfeynman · on Feb 24, 2023

I think you're confusing wanting someone to do something for you for free and for billions of people with something that can give you good enough 80% of the time and also make money while doing it.

djyaz1200 · on Feb 24, 2023

"But now that the dust is settling, people are realizing that this technology is not flawless."

Of course it's not flawless, Google isn't and people aren't... and those are the two targets for this tech. People lie, make mistakes and mislead and Google directs to plenty of problematic content.

The thing to watch isn't the position of GPT or any disruptive technology today, but its trajectory. The trajectory of GPT is explosive growth, and that growth will be disruptive to imperfect human jobs and imperfect search engines. GPT does not have to be perfect to be disruptive.

This is especially threatening to Google because they have an organization that has never had to compete, like pandas they just consume. They don't hunt. Look at their vast graveyard of platforms they couldn't sustain even with their monopolistic power and endless money.

Consumers and ad purchasers want competition in search because like Uber and Lyft demonstrate, when you have two technology products competing head to head the margins get compressed toward zero.

Edit: spelling

lamontcg · on Feb 24, 2023

> I think Google just had an "oh shit" moment as it dawned onto everyone that we now (almost) have the technology to build personal assistants that actually work, and this could be the new tech that disrupts Google's business.

The bigger problem for Google is that we've got AI which can produce millions of times more shitty blog articles to flood the internet with, that their SEO algorithms can't keep up with in order to surface anything actually relevant to humans.

Also the problem for all social media and reddit as it becomes easier to write bots that you can't casually distinguish from humans (given how many humans on reddit read like bots that are desperate for attention).

slowmovintarget · on Feb 24, 2023

You're making the mistake of assuming that the primary purpose of these systems is to help you accomplish something you want. These systems are to capture a commodity to resell; you, and all the other users that engage with the system.

It could be great, but until an assistant of this sort runs on your hardware, and works only for you (really hard to show), it won't be your assistant. It will be some corporation's Carnival Crier.

dmitriid · on Feb 24, 2023

> we now (almost) have the technology to build personal assistants that actually work

That parenthetical "almost" is doing an awful lot of heavy lifting.

xg15 · on Feb 24, 2023

> Experts regard ChatGPT-3 and Google’s LaMDA as equally unimpressive. Investors don't seem to understand this

What is "ChatGPT-3" and which experts exactly regard LLMs as unimpressive?

MY feeling so far is more that laymen are less impressed of them than experts - because for people not from the field, the fact that they produce so much bullshit seems to trump all other aspects. Whereas I think for people who have kept track of AI developments a bit more, the fact just how coherent the bullshit is is still extremely impressive.

We've had random text generators for decades, starting with markov chains way back. Up until a few years ago, you were amazed if those things produced gramatically correct sentences - not even dreaming of producing any kind of coherent meaning.

In the potentially more difficult field of understanding text, there are a million small subfields, but nothing that could parse arbitrary english text and extract actionable, semantic meaning, in the face of idioms, figures of speech, complicated coreferences, etc.

Compare this to today, where you can give an LLM a written instruction in freeform and it will just execute that instruction - the answer may be glaringly incorrect, but it's exactly the kind of answer that fits your question.

In the field of understanding text, this is a progress which had seemed largely impossible before.

bonsaibilly · on Feb 24, 2023

> MY feeling so far is more that laymen are less impressed of them than experts - because for people not from the field, the fact that they produce so much bullshit seems to trump all other aspects.

The NYT literally just published an embarrassingly credulous account of how the Bing AI wanted to seduce the author away from his wife and commit various acts of violence, that fully took all of the interactions at face value.

The non-experts are imagining full-on sentience where the experts correctly recognize mere word association shenanigans. Your feelings, in short, are ass backwards.

anonzzzies · on Feb 24, 2023

> The non-experts are imagining full-on sentience where the experts correctly recognize mere word association shenanigans. Your feelings, in short, are ass backwards.

The current model (which is an early version; computers are new on earth and exist only for a statistical error length of time on humanity scale, on earth scale, let's not talk universe scale) is already vastly superior than many humans I know in almost every way (outside manual dexterity and some other fringe stuff you can easily fix with external systems, like we humans do). We have no good definition for what sentience is either; maybe our brain is word association shenanigans; connect up 2 chatgpts and call one 'inner voice' and the other 'external voice'. It will start claiming sentience in no time flat; the same as you. Why are you right? It's a feeling yeah?

bonsaibilly · on Feb 24, 2023

> We have no good definition for what sentience is either; maybe our brain is word association shenanigans;

This is just stupid.

We've pumped more English through GPT-3 than any existing English-speaking human has absorbed in their entire lifetime, and what we've ended up with is something that very very clearly has so utterly failed to generalize even the most basic level of understanding of basic concepts that if you ask it to count the number of letters in a word it will cheerfully pump out the wrong answer ("there are thirteen letters in the word 'twelve'") because its dataset correlates the two words with one another and it has learned precisely sweet fuck all about what it means to count, something a child's brain picks up with exposure to many orders of magnitude fewer language examples.

To imagine your brain is just an LLM is to mistake your reflection for another person in the room. Utterly daft. Get off the LLM hype train and start looking at these things objectively and critically. They're nowhere near what you're imagining.

fiso64 · on Feb 24, 2023

>if you ask it to count the number of letters in a word it will cheerfully pump out the wrong answer ("there are thirteen letters in the word 'twelve'")

Not to be too blunt, but you seem to be talking out of your ass. I'm tired of the overly dismissive comments here on hackernews by people who didn't bother to do the bare minimum of research. LLMs do not work with individual characters, they use tokens (i.e multiple characters, or sometimes entire words).

BaseballPhysics · on Feb 24, 2023

Wow, you... completely misunderstood the purpose of that example.

Seriously, slow down and read it again, all the way to the end of that sentence.

fiso64 · on Feb 24, 2023

Uh, no.. I don't believe I did. Asking it to count the letters in a word is like asking a human to listen for a 30kHz tone without them knowing that humans can't hear it. I'd expect a lot of false positives.

bonsaibilly · on Feb 24, 2023

Which reinforces the point I was making. Stop trying to win an argument online and think for 5 seconds about what it means for the claim "the human brain is just an LLM" if you're arguing that an LLM is naturally ill-suited to this task human children can do without issue.

fiso64 · on Feb 24, 2023

>the human brain is just an LLM

Never have I said such a thing. Also, BPEs aren't a natural limitation of any transformer-based model, it's a trick to save compute for LLMs.

BaseballPhysics · on Feb 24, 2023

So what is this supposed to mean?

> We have no good definition for what sentience is either; maybe our brain is word association shenanigans;

Sure sounds like you're suggesting the brain is an LLM to me, and I can't blame bonsaibilly for thinking that.

fiso64 · on Feb 24, 2023

Keeping track of different usernames is hard, and I can't blame you for confusing me with the other guy :)

snark aside, I think he does have a point - afaik, we don't know what intelligence is, so it's kinda hard to make any argument about fundamental differences between "true" intelligence and LLM intelligence. I do feel like there should be something else. I saw somebody describe their mind as consisting of a "babbler" and a "critic" (a GAN, basically). The LLM would be the babbler while the critic is not yet implemented, and this sounds intuitively right to me. Then again, my intuition could be completely wrong and we may be able to get to human level intelligence with further scaling. I haven't seen any solid counterarguments yet. And not even the biggest LLM believers are denying the fact that it's not exactly the same thing as a human brain, but the question is whether it captures the gist of it.

anonzzzies · on Feb 24, 2023

Yep. It’s not called tokens for nothing.

williamcotton · on Feb 24, 2023

Here is an objective and critical assessment of large language models and how to empirically improve the results of questions involving calculations like the lengths of strings:

https://github.com/williamcotton/empirical-philosophy/blob/m...

fullshark · on Feb 24, 2023

That author was playing dumb for the good of his story, in particular to help it go viral. This is what news media is in 2023.

BaseballPhysics · on Feb 24, 2023

> In the field of understanding text, this is a progress which had seemed largely impossible before.

...

LLMs don't understand text. They encode statistical probabilities of relationships between strings of characters. Those are not the same thing. That's the entire damn point of this article!

Honestly, for someone taking pot shots at "laymen", this is a profound thing to misunderstand.

xg15 · on Feb 24, 2023

Ok, on the risk of sounding exactly like one of the "sentience" guys, but "statistical probabilities of relationships between strings of characters" is a non-answer that makes the whole thing sound simpler than it is without actually providing any explanation how it works. What kind of statistical probabilities exactly?

Right now, research is still ongoing what kind of knowledge is actually captured inside the models, but there are some hints that higher-level, "semantic" knowledge might emerge during training: https://thegradient.pub/othello/

tysam_and · on Feb 24, 2023

Both are happening, I believe.

The LLM absorbs the manifold (high-dimensional shape) of the data. The manifold contains the underlying concepts of grammatical structure, abstract thought, and inductive reasoning, which the LLM (within reasonable capacity and structural capability) captures because it is a more efficient representation of the underlying data.

Transformers are simple and contain the correct building blocks to efficiently capture these representations during training.

This allows for out-of domain generalization.

This is most, to my knowledge, of what happens in most deep learning processes and is likely 90% of the main content that one needs to know to understand neural networks at their core. I think it's pretty basic but it often gets buried in unconscionable mathematical symbols and fancy-speak to be of use to anyone reading.

xg15 · on Feb 24, 2023

Yeah, that's seems to be my impression as well. Of course it's true that in a sense all it does is capturing statistical relationships - because that's all that is there in the input data. However what kind of relationships it captures may still be arbitrarily complex (within the contraints of the model architecture).

We know it's more complex than just "empirical probability of word x given the n words before", because that would result in a markov chain and we know those don't generate the kind of output we are seeing.

We also see that it's able to map descriptions of tasks to its execution, even for unseen tasks. E.g., I can tell it "Write a limerick that contains the names of all living US presidents and format is as a JSON array inside a python script.". The result will probably contain some dead US presidents or some canadian prime ministers or whatever, and the text may be a haiku and not a limerick - but the output will usually be a python function with a JSON array with a poem with some names in it.

I don't see how that could work without a more abstract representation of the concepts "president's names", "poem", "json" and "python", so it can combine them meaningfully into a single response.

tysam_and · on Feb 25, 2023

Yes, and very good points. I really appreciate your response. Though with respect to the markov chain side of things I think you may be missing that a neural network is still fundamentally distilling a markov chain into its representations -- that is exactly what it is predicting (the transition matrix over n tokens). Presumably though, the generalization happens in that the raw n-context chain (for example, a 2048 token context chain) is completely infeasible to compress into the network directly, so the network takes shortcuts where it is least punished, which I believe invariably trims the combinatorial edge cases where certain things don't show up in the data.

So it is estimating the markov chain, just in a way that is compressible and according to the inductive biases as we define them (i.e. what we nearly force the network towards with our architectural and otherwise decisions).

xg15 · on Feb 25, 2023

True - I think the key is "empirical probability", so just keying on the combinations of all the tokens in your context window.

(Also I think the "last n tokens" term is a bit misleading: ChatGPT seems to have an n of "approximately 4000 tokens or 3000 words" [1, 2] which would amount to ~6 pages of text [3].

I've seen very few conversations even approaching that length - and in the ones that did, there were reports of it breaking down, e.g. having continuity errors in long RP sessions, etc. So I think for practical purposes we can say its "probability of the next token given all the previous tokens".)

Building a naive markov chain with such a large context is infeasible even before compression, you couldn't even gather enough training data: If you have a vocabulary of 200 words, that would give you 200^4000 [4] conditional probabilities to train. You'd have to gather enough data to get a useful empirical probability for each of them. (Even if shortened the length to the ones of realistic prompts, like 50 words or so, 200^50 is still a number too big to have a name)

Which is why, from what I've got, the big innovation in transformer networks was that they don't look at each token in their context window but have a number of "meta models" which select which tokens to look at - the "attention head" mechanism.

And I think there, the intuition of markov chains break down a bit. Those meta models make the selection based on some internal representation of the tokens. But at least I haven't really understood yet what that internal representation contains.

[1] https://www.reddit.com/r/deeplearning/comments/zk5esp/chatgp...

[2] https://help.openai.com/en/articles/6787051-does-chatgpt-rem...

[3] https://capitalizemytitle.com/page-count/1500-words/

[4] this number: https://www.calculator.net/big-number-calculator.html?cx=200...

tysam_and · on Feb 25, 2023

Oh yeah, good points. And citations! Thanks so much, really appreciated.

I'm doing work currently on this and hopefully it will yield some fruit -- at least, the work relating on the internals of what's happening inside of Transformers. Just from slowly absorbing the research over the years I have a few gut hypotheses about what is happening. Hopefully any of the work I do will yield some fruit, I think a good chunk of what is happening is surprisingly standard, just hidden due to the complexity of millions of parameters slinging information hither and yon.

Thanks again for putting all of the thought and effort into your post, I really appreciate it. This is something I love about being here in this particular place! :D

xg15 · on Feb 25, 2023

That sounds extremely interesting! I'm still trying to understand the basic transformer architecture so far, but I think those kinds of insights are exactly what we'll need if we don't want the whole field to degrade to alchemy with no one understanding what is going on.

Do you have a blog?

tysam_and · on Feb 26, 2023

I do not, (unfortunately? Maybe I need one?), though if you want you can follow me on GitHub at https://github.com/tysam-code. I try to update/post to my projects regularly, as I'm able to. It alternates between different ones, though LLMs are the main focus right now, if I can get something to an (appropriately and healthfully) publishable state. :3 :)

I try to be skeptical about certain possibilities within the field, but I do feel bullish about us being able to at least tease out some of the structure of what's happening due to how some properties of transformers work. At least, I think it'll be easier than figuring out how certain brain informational structures work (which has happened to some tiny degree, and will be I think even cooler in the future)! :) XD :DDDD :)

quartzic · on Feb 24, 2023

Tried this out and got this result:

            import json
            presidents = ["Biden", "Obama", "Bush", "Clinton", "Carter"]
            limerick = "There once were presidents five, " \
            "Biden, Obama, Bush, Clinton, and Carter alive. " \
            "They served our country with pride, " \
            "And kept our democracy alive. " \
            "May they continue to thrive!"
            print(json.dumps({"limerick": limerick, "presidents": presidents}))

dylan604 · on Feb 24, 2023

>the fact just how coherent the bullshit is is still extremely impressive

great, so it's a politician?

I really hate that this is being heralded as a triumph. Someone being very confident in their wrongness does not make it right. This is something we've been dealing with for some time, but it seems to be orders of magnitude stronger than say 20 years ago. Now we're celebrating how our AI systems are amazing at this. I just want to hang my head in shame at what we've come to accept.

xg15 · on Feb 24, 2023

Yeah, I agree with you here. I find it impressive on a scientific level and as a demonstration of what is possible. Whether any of that is actually beneficial for society or if we're just opening pandora's box is a wholly different question.

vannevar · on Feb 24, 2023

Tulips were not one of the greatest scientific advances in human history. AI that can write and converse intelligently on virtually any subject is. When the chief complaint about AI is that it lies too much, that's a massive breakthrough.

thundergolfer · on Feb 24, 2023

It's a great engineering advancement, but scientific advancement implies some kind of gain in human knowledge, and that hasn't happened to a great extent.

We still don't understand how these models really work, at least not in a way that teaches us anything about language or about humanity.

pixl97 · on Feb 24, 2023

>implies some kind of gain in human knowledge

I find that a strange take. Almost all scientific advancement ends in one thing. Another set of unanswered questions. Quite often we gain more knowledge, but know less because we realize there are thousands more questions we should have been asking.

For example, theory of mind being observed out of just processing language is a big arrow that we may have been thinking about many things wrong in the past and open new avenues for 'scientific' tests that give us answers.

ianmcgowan · on Feb 24, 2023

It's got to have shaken up the field of cognitive research or people studying mind/consciousness.

Is there a "hacker news" of theory of mind researchers? Probably some interesting conversations happening over there.

rcme · on Feb 24, 2023

> We still don't understand how these models really work, at least not in a way that teaches us anything about language or about humanity.

We understand how LLMs work conceptually. Human language, when parameterized as a sequence of tokens, has geometry and structure in a billion dimensional space. A priori, we didn't know that. So in a very real sense, a discovery was made.

moffkalast · on Feb 24, 2023

> but scientific advancement implies some kind of gain in productivity

FTFY. At least any kind of actually notable scientific advancement like using metals, farming, steam power, electricity, cars, planes, computers, etc.

If a person can use a chatbot to speed up their work then that's a major advancement in productivity right there.

ianmcgowan · on Feb 24, 2023

Almost a textbook answer to "what is the difference between science and technology".

moffkalast · on Feb 24, 2023

Two sides of the same coin.

dd36 · on Feb 24, 2023

We’ve absolutely gained knowledge. We’ve learned how good these models can be.

We’ve effectively made like a general teenager that can be up-trained for specific tasks. It’s super useful if you know how to use it.

est · on Feb 24, 2023

> but scientific advancement implies some kind of gain in human knowledge

The unexplained Chain-of-Thought emerges from LLM surely would an huge gain in "knowledge". In fact I'd say it potentially the most important meta-knowledge human race can gain, you know, to unlock the mystery of "conciousness" itself.

boringg · on Feb 24, 2023

I'd 100% agree with this take. It's a great engineering achievement but it isn't a scientific breakthrough, much less one of the greatest scientific achievements of humankind.

reeboo · on Feb 24, 2023

Projecting our humanity onto these models by assuming they are "lying" is a mistake. These models are not thinking, thus, they cannot lie.

inciampati · on Feb 24, 2023

Indeed. And furthermore, I notice that most criticism of these systems focuses on their lack of factual coherence and accuracy, ignoring their real incredible power of translation. Give concepts in any structure or language, at any level of coherency, and you can get out clear code, prose, or poetry that represents exactly the input.

Ask for a biography of a random public figure without giving notes and there is 0 chance they will provide an accurate result. This may mean we need systems that are literally memories without the noise driven generalization, and that these need to be exposed to the said LLMs as a resource. It's not insurmountable and it certainly isn't tulips.

pwinnski · on Feb 24, 2023

> ignoring their real incredible power of translation.

I feel like everybody commenting on this thread ought to have to write "it's text completion, not intelligence" on a chalkboard 500 times or so first.

throw_away1525 · on Feb 24, 2023

Well, they can certainly converse intelligibly. Intelligently is up for debate.

Retric · on Feb 24, 2023

These chat bots can’t converse intelligently, they can fake well enough it can take a little while to realize. https://xkcd.com/451/

There’s some great chess videos for example where the bots start by making decent moves until you quickly find out they not only don’t know how to play well they don’t even understand the rules or the board. https://youtu.be/rSCNW1OCk_M What’s fascinating is how easy it is to give something the benefit of the doubt because it’s using language, we just aren’t very suspicious of things that can talk.

luckylion · on Feb 24, 2023

Now instruct people who don't know how to play chess to pretend to know and play against someone. And when they fail because they don't know the rules and initially just mirrored their opponent, that's proof they're not a general intelligence?

pwinnski · on Feb 24, 2023

You're doing it, too. You're thinking of these AI models in terms of human intelligence.

The models are not operating at a lower or higher level along a scale of human intelligence, so they're not genius-like or child-like. They're not operating on the same scale at all.

They're text prediction models, and many seem to insist on anthropomorphizing them in a way that causes them to forget that every other second or so.

Retric · on Feb 24, 2023

ChatGPT can say the rules of chess it just doesn’t understand them, thus stupidly rather than ignorance.

Further if you watch the video it wasn’t moving the pieces incorrectly it was just creating them from thin air. This wasn’t like an 8 year old learning the game this was like your cat randomly hitting the keyboard.

luckylion · on Feb 24, 2023

> ChatGPT can say the rules of chess it just doesn’t understand them, thus stupidly rather than ignorance.

It acts similar to a human that has a button for "recite chess rules" and presses it when you ask them to recite the chess rules. You'd then get a reply with the chess rules without the button-presser getting the same info, so they wouldn't gain any knowledge from it. I'm sure you can find a lot of people who could, if asked, read you the chess rules but even after reading wouldn't have understood them themselves. But are they not generally intelligent?

I get a feeling that people compare ML-models to a smart person that concentrates on what you ask them, and LLMs fail in comparison but do so confidently. If you instead compare them to an average person at their daily average focus level, the ML probably looks pretty smart.

I have a neighbor who is not very bright. He'll see some news on TV and will tell you about it, only he doesn't understand it and will just repeat words and half-sentences as best he can remember. He doesn't know he doesn't understand (or he does but doesn't care), so he'll confidently state utter nonsense. He's not smart and I wouldn't go to him for advice, but I think he qualifies as "intelligent" in the binary sense.

Retric · on Feb 24, 2023

> I have a neighbor who is not very bright.

Even a dumb person is vastly smarter than an ant. So when I am saying ChatGPT is dumb I mean compared to having a conversation with a person which seems like the only reasonable scale to use with a chat bot.

ChatGPT’s model requires quite a of of processing power and sophistication to form its responses. It just doesn’t comprehend what it’s saying.

luckylion · on Feb 24, 2023

Do you compare its conversational skills to a smart person or a dumb person? Because the result of the comparison will be very different, I believe. In other words: on a blind Turing test, if you can't tell apart ChatGPT from a dumb human, is the dumb human not generally intelligent? He also doesn't comprehend what he's saying once you get beyond basic things, and he's still vastly smarter than an ant.

I'm sure you've talked to people before who used certain terms but it was obvious to you they didn't actually know what those terms meant. They're able to use them without understanding the concept. Just like ChatGPT, only that it's more likely to sound like ChatGPT understands the concept.

What I'm trying to say: humans have a very clear limit to what concepts they can grasp. They can punch above their weight by pretending to understand a concept, but the proof is in the pudding, so unless they're able to apply it to solve problems, they haven't understood it. And even if they do understand something, they'll make mistakes. If the test for intelligence is essentially 'can perform certain tasks with a maximum error rate of x, and can learn to perform new tasks', does 'understanding' really matter? And can you even really test for understanding, aren't you only testing for 'gives answers to questions that signal understanding'?

Retric · on Feb 24, 2023

Objectively its conversational skills are so bad it’s below the level where humans get institutionalized due to severe disabilities.

I don’t have specific level beyond that because it’s deficiencies don’t map to human condition.

As to a touring test given enough time and freedom in questions you can easily tell it’s a machine so it just fails. That’s the thing, the touring test isn’t can you tell if this is a machine in 7 seconds while talking about baseball it’s can to tell period.

dragonwriter · on Feb 24, 2023

The quality of the article is sonewhat telegraphed by the way it messes up even the name of the (currently most publicly populat) LLM it references in the first sentence, blending the names of two different OpenAI products.

Which, I suppose, even if the specific facts and arguments in it are wrong does support the headline thesis; every commentary outlet feeling so desperate to say something about it, even if it means rushing hot garbage out the door, is certainly indicative of something of a “tulip mania” moment.

turkeygizzard · on Feb 24, 2023

Yes, there's a lot of hype around AI right now, and yeah sure it looks like some crypto grifters rotated into AI - but overall, I really don't get how people can look at all the progress and think there isn't something important here?

I see a spectrum of responses to GPT, but the really important thing is the rate of change. Even if GPT somehow doesn't impress you currently, why do you think things won't be improved in 6 months? What about 5-10 years from now? I can understand thinking that transformers are maxing out on their sigmoid curve of progress, but every time people complain about perceptrons, RNNs, CNNS, etc - another team invents something new and blows out all the benchmarks. These things won't get worse, so even if you assume a pessimistic rate of change, how different will things look soon?

The observation that these AIs are "mere text generators" is a testament to how insane they are IMO. The fact that they're just predicting the next token and are still this powerful is paradigm-shifting for how I viewed human intelligence. I don't think these things are sentient, but if they can replace me at work in 10 years or less, who cares?

BaseballPhysics · on Feb 24, 2023

> Even if GPT somehow doesn't impress you currently, why do you think things won't be improved in 6 months? What about 5-10 years from now?

Need I remind you of the AI winter?

Progress isn't linear. You want to believe progress will continue at its current pace. I'd suggest it's every bit as reasonable to assume progress will stall out as current techniques plateau, just as they did in the past.

Teever · on Feb 24, 2023

Everything ends, it's true.

But there's no reason to believe that this technique will plateau before some noteworthy results are obtained.

Other than being connected by the term 'AI' what makes you believe that the past performance of researchers is indicative of future performance?

Are any of the limitations of the Apollo program inherent limitations of space exploration? How are they connected?

BaseballPhysics · on Feb 24, 2023

> But there's no reason to believe that this technique will plateau before some noteworthy results are obtained.

Absolutely there is, and the article specifically touches on it.

Because these models do not encode semantics, there's very little reason to believe the hallucination problem can be solved with current techniques.

In fact, OpenAI themselves have been downplaying expectations about GPT-4, and my suspicion is that's because these techniques are already starting to plateau.

I fully expect that GPT-4 will create even more natural-seeming language, with greater sentence variation, less repetition, etc, and with no change (or possibly even an increase) in the frequency of factual errors in the results.

Der_Einzige · on Feb 24, 2023

We will literally never go into another AI winter. The results and earnings from AI are too good to put the cat back in the bag.

BaseballPhysics · on Feb 24, 2023

LOL, I suspect a lot of crypto bros were saying the same thing a couple of years ago.

Let alone other hyped technologies that have failed to deliver, like self-driving cars, voice assistants, AR/VR...

Heck, the AI winter was preceded by a boom in things like expert systems, and I'll bet folks back then thought it'd never end, too.

Der_Einzige · on Feb 24, 2023

The fact that you'd lump Crypto and AI together is simply pathetic. Crypto still hasn't actually solved any real problems or delivered meaningful value. AI has solved real problems and delivers astronomical value.

I'm willing to put my money where my mouth is. I'm willing to enter into a $$$ wager and assert that AI, specifically LLMs or similar systems, will only become more important and more in demand in 5 years.

BaseballPhysics · on Feb 24, 2023

I wasn't the one using "dollars pumped into the bubble" as a predictor for likely future success. If you don't like the comparison, come up with a better argument.

Hyption · on Feb 24, 2023

I already saw ml (dall e 2 and chatgpt and gpt3) as impressive.

But I actually played around with chatgpt just today and holy shit.

I never ever talked to a bot like I talked to chatgpt.

Is it perfect? Not at all but I already was able to use it for practical things.

It feels tremendously natural.

This is so close to everything I'm looking for in an AI that it shows all the potential already without much fantasy.

This will not go away.

It will transform every aspect.

I wish I would be in university right now. This thing can explain things.

This thing understands context.

And it's multilingual out of the box.

The coherence chatgpt already shows let me believe that it does create an internal model which is more than just doing basic statistics.

Alexa was great to experience because it showed me already how a voice interface feels like.

But Alexa and chatgpt?

Chatgpt trained on documentation!

Chatgpt / ml like this will be the universal knowledge library the Rosetta stone.

We are experiencing the creation of the Alexandria library.

Let's see how long it will take to combine dall e/Stabel Diffusion, got, chatgpt together with other things like voice synthesis and voice recognition.

zehaeva · on Feb 24, 2023

It can explain things, but it gets a ton of things wrong, very wrong.

Using this is college would be a recipe for disaster, you'd ask chatgpt to explain a thing to you and you, being a student, don't know if it's getting the explanation correct or wrong. You don't know if it's reasoning is correct.

ChatGPT isn't telling you the truth or how the world is, it's just spitting out the statistically more likely next word.

It's far enough on this side of the uncanny valley so you think it's a conscious human unless you know enough about the subject to realize that it's not.

pwinnski · on Feb 24, 2023

It's a text prediction engine, and clearly people are impressed. I suggest saving your statements here, like "This will not go away. It will transform every aspect. This thing understands context." and so on, and revisit them in a few years when the hype balloon has been punctured.

Perhaps "will not go away" will still seem plausible, but I think the rest will age poorly.

Hyption · on Feb 24, 2023

I asked chatgpt just an hour ago to tell me how to put a doc at the first position of the navigation with docusaurus (Facebook static page generator).

It told me the sidebar.js way.

I asked it to tell me another option which I can configure in the doc file directly.

Chatgpt got all of that right. If this is just a text prediction engine, it's crazy impressive how well it just works.

I don't even start designing the prompt to get what I need.

I definitely will follow all of this very closely and already have plenty of use cases for this 'text prediction engine '.

I'm hyped because I'm waiting for ml to break through for a while. Google for example has a huge us hospital under it's cloud platform with tons and tons of data.

We will see more and more expert systems.

Chatgpt will push for more investments.

This is not just a hype it's part of the journey we are for a few years now.

And in comparison to shit like crypto or nft there are already clear benefits.

Stabel Diffusion is fun to use and to explore.

Chatgpt is fun to use and to explore.

Not that this is any indication but the interest in SD was huge but the interest in chatgpt is much bigger. We had multiple internal talks about it and due to the interest in it, more got scheduled.

You will no longer search through documentation. You will just ask the product documentation what you need.

All of those real life use cases pushes more and more money into this topic

akritrime · on Feb 24, 2023

Calling something tulip mania is the lazy way out. Tulip Mania was very limited in scale and didn't really affect much. Comparing each and every new trend, either legitimate or not, to tulips is nothing more than buzzword pattern matching.

bonsaibilly · on Feb 24, 2023

> Comparing each and every new trend, either legitimate or not, to tulips is nothing more than buzzword pattern matching.

Must've been written by one of these "AI"s, then.

MuffinFlavored · on Feb 24, 2023

To me, tulip mania implies "people are going crazy over it when they shouldn't (because tulips didn't really have much value in hindsight)" and I don't know if I'd agree with that. I pay $20/mo for ChatGPT. I find myself using it every day. I have long conversations with it. It's basically "Google/blogspam summarizer that you can dive into with context". I feel it allows you to go deep.

elicash · on Feb 24, 2023

I just used ChatGPT to help me code something in a day that would have taken me weeks and lots of StackOverflow previously. So I mostly agree with you.

But consider that I didn't use Github Copilot only because I wasn't sure if it was free (and ChatGPT functioned well enough). This, even though I kept hitting limits in the system. It's still the case that for a lot of these use cases that people are using ChatGPT for that a chat interface is pretty inefficient for getting things done.

I do think the chat UI is probably overdone right now. But that's just because there's a multi-year usability learning process that still needs to happen. It's why Microsoft is building GPT into each of their products differently and others will do the same. Chat is a short-term design trend for many use cases, just for the moment. That said, the tech is expanding so quickly that we'll likely see more chat rather than less even as other UIs have even more room to grow.

MuffinFlavored · on Feb 24, 2023

This is not my experience whatsoever. Maybe I'm not good enough of a programmer/high level enough (I'd be surprised if this was it) but I can't imagine ChatGPT spitting out incorrect syntax helping me much on something I can't solve.

I also don't have a very "pair programming friendly" way of thinking though/style. Maybe that plays a role.

I ask it to tell me about libraries or how a solution might get solved (recommend me known products out there, products as in like, popular open source software you might find off the shelf on GitHub) but that's about it. I can't see it telling me how to architect something at a high level or giving implementation details.

I've asked it about compiler errors before and how it might fix them, which has lead me to incorrect answers, but did help because I could Google/StackOverflow what was returned as it was typically "on the right track" after a few tries. But that was mainly me just "rushing" and not reading documentation/not having a good understanding of what I was working on (like being more familiar with Rust than C# for example)

throwaway4aday · on Feb 24, 2023

They're both superpowers but they're different.

ChatGPT: 24/7 pair programming partner, universal documentation wizard and instant Stack Overflow answerer. Also good at translating from lang A to lang B.

CoPilot: Autocomplete and suggestions on steroids, on the fly template and boilerplate generator and master refactorer.

elicash · on Feb 24, 2023

I agree.

But right now, it's so easy (and fun, I've done it myself) to spin up chatbots that there aren't enough folks doing the more difficult UI work for the various use cases, in my view.

its-summertime · on Feb 24, 2023

Suggesting that people shouldn't invest because of the current product and not because of the potential future product, is silly.

"automated calculators", are calculators not insanely useful, especially earlier on in their history?

"but how many companies will be willing to jeopardize their reputation by giving their customers incorrect information?" Yet links to companies doing exactly that.

"carbon emissions" Yet datacenters all together put out 1% of total emissions globally, once again linked.

- - -

The article feels almost like a list of dotpoints which then get elaborated on, regardless of if they are notably relevant, la ChatGPT.

fwlr · on Feb 24, 2023

“Someone, I think Bertrand Russell, said we compare the mind to whatever is the most complex machine we know. Clocks, steam engines, telephone relays, digital computers. For AI, it’s the opposite: as capabilities increase, and our understanding of AI systems decreases, the analogies become more and more dismissive. … Machines you can use for free today pass the Turing test, these are called ‘stochastic parrots’, which is dismissive, but at least parrots are living things that engage in goal-directed activity. Every single day machines do something that if you were told of it not one year ago you would have called it witchcraft, but LLMs are ‘blurry JPEGs’. Yesterday I heard ‘Markov chain on steroids’. When the first computer wakes up we’ll call it a pile of sed scripts. …”

https://borretti.me/article/and-yet-it-understands

rvz · on Feb 24, 2023

This AI hype mania is once again a repeat of the GPT-3 hype and it amounted to using an AI SaaS as the so-called next revolution with everyone and their parrots screaming that they are also an 'AI company' which is total nonsense, just like ChatGPT would hallucinate.

Again. This mania will subside just like the Clubhouse mania did.

CuriouslyC · on Feb 24, 2023

This article is silly. It's taking LLMs that were trained on statistical token -> token probabilities without any concept of truth or accuracy, and implying that LLMs are hyped bullshit that is incapable of truth or accuracy.

LLMs can easily be trained on (token -> .. -> token, factual_accuracy) annotated data, we just need to build a "fact check" data set for common token sequences. That data set will be expensive to build, but we could always build a fact check app that gamifies the process to help build it instead of just paying turks.

xg15 · on Feb 24, 2023

I'm not sure if this would work, as you cannot always define the factual accuracy on a token-by-token basis.

However I wonder if having the LLM answer questions based on the knowledge baked in the model is the way forward at all: You quickly run into issues with up-to-date information, as you'd have to do a possibly expensive retraining every few weeks at least - and you also have no way of controlling which information exactly is stored exactly and how the models combines it.

I think an interesting alternative approach could be to split the bot up into two "agents" and one backend database:

- One model which receives user input and translates it into a machine-readable well-defined representation e.g. a series of SQL or SPARQL queries, a series of API calls or whatever.

- Then have a wholly non-AI backend that validates and executes those queries and delivers a result, in machine-readable format as well.

- Finally have a second model (or the first model with different finetuning/prompt) which can translate the query result back into a freetext sentence, which is then delivered to the user.

This way, you can keep tighter control about what exactly the model is answering and you can also update the underlying knowledgebase independently of the model.

ianmcgowan · on Feb 24, 2023

That sounds like the blog post by Steven Wolfram recently [1] where he proposes having two models cooperating - one that "knows" facts (has computational knowledge in that post), and the other that controls the language interpretation and decides which facts are relevant. It's interesting to go further and speculate that our "mind" is made up of cooperating sub-systems like this.

https://writings.stephenwolfram.com/2023/01/wolframalpha-as-...

CuriouslyC · on Feb 24, 2023

It wouldn't be on a token by token basis, but rather "common" token sequence tuples. If you generate a histogram of common 5-8 grams, and filter out all the ones that are just linguistic filler and keep the ones that are "factual," then feed those "factual" n-grams to reviewers (via captcha/turk/game) to get a distribution of accuracy scores that you then condition the model on, so it can produce "accuracy" estimates for token sequences, and further condition on high accuracy scores for generation.

for example (butchered for clarity), (["The", "2022", "election", "was", "stolen"], accuracy=normal(0.0001, 0.1)).

BaseballPhysics · on Feb 24, 2023

If they could do that they already would have.

Frankly, I think you're massively underestimating the difficulty in what you're describing (assuming what you're describing would actually work, and I'm not sure it would). To do what you're suggesting, you'd need a human to evaluate and feed into one of these LLMs every known fact. Good luck with that.

And that's ignoring the issue of maintaining that model as those facts change.

CuriouslyC · on Feb 24, 2023

That's not true at all. Right now there's a tension in ML between hoovering up as much data as possible (since more data, even if it's bad, tends to improve the model in general even if it poisons certain parts of it) and curating/annotating data sets to improve answer quality and consistency.

Data set annotation is expensive and time consuming, and it's a hard sell to shareholders to spend a billion dollars on data set annotation when there's still more data that can be cheaply obtained, and the liability or cost for producing incorrect answers is intangible or low. After Google's massive share hit from Bard's exoplanet flub though, I don't think the costs are quite as intangible as they were, and I expect accuracy will become a new push in models over the next few years.

BaseballPhysics · on Feb 24, 2023

> That's not true at all.

I actually made three statements in my comment and you only disputed the first one.

Yours is a good counter-argument to my first statement, and it is a valid rebuttal, I concede the point!

But I maintain the scale of the problem, and the difficulty of maintenance, will make manual curation impossible, while automated "fact generation" is a technology that doesn't exist today.

CuriouslyC · on Feb 24, 2023

Eh, when we have trillion dollar companies whose core product is AI commonly used for question/answer tasks, and query accuracy directly impacts the bottom line, I don't think spending 10 billion a year on annotation is a big ask. It might not be perfect, but as long as your query responses are significantly more accurate than the competition you'll end up as Google to their Altavista.

Additionally, you don't have to manually curate the entire data set, only a large enough chunk to predict accuracy annotations for the stuff you haven't curated yet. Facts that change over time shouldn't be hard to identify if you periodically do cross validation on existing annotations to pick out ones where the predicted accuracy differs significantly from the annotation.

There are a lot of process and engineering questions still to be answered in terms of building high quality AI products, but I don't think there are hidden dragons that are going to lead to another AI winter. There will be hiccups and snafus, but AI is the real deal.

throwaway4aday · on Feb 24, 2023

It'd be awesome to get a digitized version of essay answer tests with grading. I wonder if those are kept or if they just get tossed?

theandrewbailey · on Feb 24, 2023

ChatGPT and similar technologies are on the slope up to the peak of inflated expectations.[0] (Or are we on the way down already? Damn, the peak happened fast.) I eagerly await the next parts of its hype cycle, and wonder how they will go down.

[0] https://en.wikipedia.org/wiki/Gartner_hype_cycle

ghaff · on Feb 24, 2023

As of mid-2022, Gartner had emerging AI technologies (somewhat surprisingly to me) early in the cycle--so on the way up but near the beginning.

kenjackson · on Feb 24, 2023

Of course AI doesn’t understand words. It has never interacted with the world. But that’s a weird test to use. AI systems that do facial recognition don’t actually know what faces are either. And when they classify a plate as a face we don’t get too worked up about it. We just continue to tune it or make the next big advance.

hcks · on Feb 24, 2023

> LLMs are mere text generators. Trained on unimaginable amounts of text, they string together words in coherent sentences based on statistical probability of words following other words. But they are not "intelligent" in any real way — they are just automated calculators that spit out words.

These kind of takes are getting tiring fast. I’d even go as far as to qualify them as midwit takes.

Yes you can reduce LLMs to their output channels, or trivialise what their doing by using words as « statistical ».

Facts: ChatGPT, a rough demo based on a model that is already obsolete, is nearly indistinguishable from the average college student.

throwaway4aday · on Feb 24, 2023

We haven't even scratched the surface yet. "chatbot" isn't the killer app, AI in the loop is.

https://twitter.com/lemonodor/status/1628270074074398720

https://twitter.com/lemonodor/status/1628420476136730624?cxt...

h2odragon · on Feb 24, 2023

The cryptocurrency boom wasnt a bubble for major media until after it popped.

This headline may be the best indicator that there's actual fire under all the smoke for AI.

roncesvalles · on Feb 24, 2023

Many were saying crypto/blockchain/NFT (will refer to the cluster as "web3") was a bubble all along. Most notably, the Hacker News "hivemind" had a markedly anti-web3 lean for at least the last four years.

It might've seemed like there were more positive voices for web3 because many people stood to gain monetarily from it. It was more "democratized" in that way. With AI, there is a bigger moat and it's unclear whether there's big money to be made by anyone other than those who already own the underlying tech or have tens of billions to catch up.

paulgb · on Feb 24, 2023

> It might've seemed like there were more positive voices for web3 because many people stood to gain monetarily from it.

Yeah, I think people underestimate the prevalence of this effect. At one point I tried figure out the origins of a story (covered by MSNBC) that the web3 game Axie Infinity was lifting people out of poverty in the Philippines. Every story was sourced back to the same report by a related party of the game.

(Axie ended up imploding catastrophically. I haven’t seen any stories about what happened to all the Filipinos dragged along into it)

fullshark · on Feb 24, 2023

It's just one article by one writer about a few anecdotal examples of ChatGPT failures which highlight current limitations that anyone interested in the space should be aware of. It's useless noise and not an indicator of anything.

BaseballPhysics · on Feb 24, 2023

> The cryptocurrency boom wasnt a bubble for major media until after it popped.

You clearly weren't paying attention to the media over the past 5-10 years.

ilaksh · on Feb 24, 2023

They couldn't even get the name right. Called it "ChatGPT-3".

Saying that it doesn't understand words is ridiculous.

victorp13 · on Feb 24, 2023

$ Write an opinion article entitled 'AI chatbots are having their "tulip mania" moment'

xchip · on Feb 24, 2023

The same points could be made from anything new.

Prob it is a copy and paste from when gun powder was discovered.

pfdietz · on Feb 24, 2023

Let's ask ChatGPT about this!

> Write an essay about AI chatbots having their "tulip mania" moment.

> The Tulip Mania of the 17th century was a period in which tulip bulbs in the Netherlands were traded at exorbitant prices, resulting in a speculative bubble that eventually burst. The phenomenon is often used as an example of irrational economic behavior, with investors caught up in a frenzy of buying and selling without considering the actual value of the commodity they were trading.

> In a similar way, it is possible that AI chatbots could experience their own "tulip mania" moment in the future. Chatbots are becoming increasingly popular in many areas of business and communication, from customer service to healthcare. They offer the promise of cost savings, improved efficiency, and 24/7 availability. However, there are concerns that the hype surrounding chatbots could lead to unrealistic expectations and irrational investment.

> One potential trigger for an AI chatbot tulip mania could be a breakthrough in the technology that makes them significantly more capable. For example, if a chatbot was developed that could convincingly mimic human conversation and intelligence, it could generate a wave of excitement and investment. This could result in a rush to develop and implement chatbots in every industry, regardless of whether they are truly effective or necessary.

> Another potential trigger could be a high-profile success story of a chatbot that generated a significant return on investment. If a company or organization was able to demonstrate that a chatbot had dramatically improved their customer service or reduced costs, it could create a rush to invest in similar solutions. This could lead to a flood of new chatbots being developed and implemented without sufficient testing or consideration of their actual effectiveness.

> However, just as with tulip bulbs in the 17th century, the hype surrounding AI chatbots is unlikely to last forever. There are already concerns about the limitations of chatbot technology, including the difficulty of programming them to understand complex or ambiguous language, the potential for biases to be introduced, and the risk of chatbots making errors or providing inaccurate information. These limitations could become more apparent as chatbots become more widespread, leading to a burst of the hype bubble.

> In conclusion, it is possible that AI chatbots could experience their own tulip mania moment in the future, driven by hype and unrealistic expectations. While chatbots offer many potential benefits, it is important for investors, businesses, and developers to approach them with a critical eye and evaluate their actual effectiveness and value. Only by doing so can we ensure that chatbots fulfill their potential as a useful tool for communication and customer service, rather than a passing fad.

kenjackson · on Feb 24, 2023

I don’t know if anything in this story is true. But if this was a story I needed to write, this would be a great potential starting point.

Comparing them to tulips just seems naive.

jimmySixDOF · on Feb 24, 2023

Tim Harford did a recent live talk podcast discussing how our concept of the Tulip Mania is mostly a sensationalized version :

>Recorded before an audience at the Bristol Festival of Economics (11/17/2022)

> The Dutch went so potty over tulip bulbs in the 1600s that many were ruined when the inflated prices they were paying for the plants collapsed - that's the oft-repeated story later promoted by best-selling Scottish writer Charles Mackay. It's actually a gross exaggeration.

>Mackay's writings about economic bubbles bursting entertained and informed his Victorian readers - and continue to influence us today - but how did Mackey fare when faced with a stock market mania right before his eyes? The railway-building boom of the 1840s showed he wasn't so insightful after all.

> For a full list of sources used in this episode visit Tim Harford.com See omnystudio.com/listener for privacy information.

ghaff · on Feb 24, 2023

I write a fair bit and have played with ChatGPT and my conclusion is this:

On the one hand, there is no way that I could just hand ChatGPT output straight to an editor.

At the same time--at least for certain types of stories--ChatGPT output is going to save some time relative to a blank page and it may even prompt you with some points you haven't thought of. You have to fact check and make it less formulaic and otherwise spruce things up. But I bet it could save me a couple hours on some articles.

Pigalowda · on Feb 24, 2023

Did it really generate that essay in 1 prompt? If so, that makes me a little uneasy for some reason.

pfdietz · on Feb 24, 2023

It did! That's the entire interaction right there.

m348e912 · on Feb 24, 2023

Honestly a very good summary.

zoklet-enjoyer · on Feb 24, 2023

Everything is tulips