The new Bing will happily give you citations for a pile of nonsense (twitter.com/arbuge)
178 points by arbuge on Feb 16, 2023 | 164 comments


I'm not convinced at this point that conversational AI fills any legitimate use cases. For all the pushback crypto got here, it's fascinating to see the support this grift has gotten. The grift being the amount of energy we burn training models, set against what people will actually pay for a nothing burger.

Things like Stable Diffusion and DALLE are pretty cool, though a bit novel and toyish at this stage.


Except the energy costs for these models are a tiny fraction of crypto's. Cambridge researchers estimate the annual energy consumption of bitcoin alone at 119 terawatt hours (TWh) [1], or 60 megatons of CO2 [2], just below all of Portugal.

Meanwhile, even a very critical paper on this from some Scandinavian researchers [3] says GPT-3 cost 190,000 kWh (0.00019 TWh) to train. ChatGPT/GPT-3.5 is allegedly an order of magnitude larger in terms of data and cost to train, so let's say it is 0.0019 TWh.

When The Register reported on that paper [4], you can see how hard they tried to make it sound big: the cost to train GPT-3 was the same as driving a car to the moon and back (435k miles). They could have said it cost the same amount of carbon as 25 US drivers emit each year. In the grand scheme of things, that's nothing. That's one long-haul flight per trained model. And you only need to train them once. Querying the models costs far less.
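To put the two numbers side by side, here's a back-of-envelope sketch; every input is an estimate quoted above, and the seconds-per-training-run framing is my own:

```python
# Back-of-envelope comparison using only the estimates quoted above.
BITCOIN_TWH_PER_YEAR = 119          # Cambridge CBECI estimate [1]
GPT3_TRAIN_KWH = 190_000            # from the cited paper [3]

gpt3_train_twh = GPT3_TRAIN_KWH / 1e9    # 1 TWh = 1e9 kWh -> 0.00019 TWh
runs_per_year = BITCOIN_TWH_PER_YEAR / gpt3_train_twh

seconds_per_run = 365 * 24 * 3600 / runs_per_year
print(f"Bitcoin uses one GPT-3 training run's worth of energy "
      f"every {seconds_per_run:.0f} seconds")   # roughly every 50 seconds
```

By those figures, bitcoin burns through a GPT-3-sized training run about every minute, all year long.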

And the electricity generated for US-based server farms is way cleaner than cars, planes, or the coal plants powering Chinese bitcoin mines.

[1] https://ccaf.io/cbeci/index

[2] https://ccaf.io/cbeci/ghg/comparisons

[3] https://arxiv.org/pdf/2007.03051.pdf

[4] https://www.theregister.com/2020/11/04/gpt3_carbon_footprint...


Thank you for this amazingly insightful comment.

A question as an ignorant layperson, if I may:

  They could have said it cost the same amount of carbon as 25 US drivers emit each year. In the grand scheme of things, that's nothing. That's one long-haul flight per trained model. And you just need to train them once. Querying the models cost far less.
Don't they need to continuously re-train these models as new information comes in? For example, how does Bing bot get new information? It seems like they would need to routinely keep it up-to-date with its own index.


> Don't they need to continuously re-train these models as new information comes in? For example, how does Bing bot get new information? It seems like they would need to routinely keep it up-to-date with its own index.

Sure, and thanks! Some keywords to look up are transfer learning, zero-shot learning, and fine tuning. These approaches focus on exactly this problem: not having to retrain the entire model from scratch to add new information. GPT-3's training data is 100 billion tokens of text, but to extend it by another 1 billion tokens of text is far closer to 1/100 the original cost.

It actually wasn't the energy/carbon cost that motivated early work on this; it was more about adapting to new domains and letting people customize models for specific purposes. Image processing adopted it first, to great success. Orgs with resources trained really big models on all of ImageNet, which needed server farms of GPUs, but then released them so that other people could use a single commodity GPU to fine-tune them for their specific image processing task.

Edit: now you can pay "Open"AI to fine tune their models for you, but only Microsoft has access to the raw model itself


>> Some keywords to look up are transfer learning, zero-shot learning, and fine tuning. These approaches focus on exactly this problem: not having to retrain the entire model from scratch to add new information. GPT-3's training data is 100 billion tokens of text, but to extend it by another 1 billion tokens of text is far closer to 1/100 the original cost.

Well, if you fine-tune GPT-3 on another billion tokens you get a fine-tuned version of GPT-3 that's perhaps better at modelling those billion tokens. If you want a better GPT-3 you have to pre-train a new model, probably with a few billion more parameters. So transfer learning is not going to save the day here.


Yeah, those models need to be retrained often. Not to update an index, but to keep up to date with new content on the internet, and of course to create new, improved models with more parameters: note that after GPT we had GPT-2, GPT-3, GPT-3.5 (a.k.a. ChatGPT), and rumour has it we're now at GPT-4 (powering Bing search). Meanwhile, each of those models was trained in multiple versions with different numbers of parameters; for example, the GPT-3 that first broke the hype barrier was the largest, at 175 billion parameters, of four or five models. Plus, it seems reasonable that OpenAI, Google et al. retrain models every once in a while to correct mistakes (e.g. OpenAI say in their GPT-3 paper that they couldn't retrain their 175B-parameter model to correct a mistake because of the high cost of training, but that was a couple of years ago and they have clearly trained multiple large models since, so why not GPT-3 again? Except of course they're not very -cough- open about those things, so we can't know for sure).

Bottom line, the cost of training "GPT-3" varies a lot and is paid multiple times.

________

GPT-3 paper for ref: https://arxiv.org/abs/2005.14165


Going by the screenshots in the linked tweets, it seems like it performs searches on Bing to obtain up-to-date information to answer questions with, so there's probably no need to re-train it daily. So the main question here might be "how much energy does it cost to keep a search engine up-to-date?", which may not be cheap either.

There is probably a need to refresh it periodically to account for what the MMAcevedo fictional story [1] calls "context drift" -- the relevant search terms to infer from the query are themselves contextual. Say, if I ask Bing today "is Trump running for president", the right search term today could be "donald trump 2024 election", but ten years from now it might be "eric trump 2036 election".

[1]: https://qntm.org/mmacevedo


> For all the pushback crypto got here, it's fascinating to see the support that this grift has gotten.

Two very different things. I could never really see crypto taking off without massive political changes. ChatGPT, on the other hand, is already my daily assistant. The utility is here and now.

It's rarely perfect, but it gets me 90% of the way there, then I tweak the output.

How good it is depends on what you're using it for, as with anything - I don't think it's really a good fit for something like a search engine (yet) as it's terrible with facts.

But scaffolding code, summarizing text, expanding outlines - it's very good at those kinds of tasks, often astoundingly good.


> it's terrible with facts

It's terrible with specific facts, and frequently hallucinates dates, math, book titles and citations to sources.

But it's awesome with ideas and concepts.

Today I spent an hour chatting about the competing ideologies, groups, motivations, objectives, and foreign allies and adversaries in Spain leading up to the 1936 coup and civil war.

I cross-checked many of its assertions, and it nailed some pretty esoteric stuff.


Weirdly, I heard of some genuine use from a family in a pub the other day.

Two of the kids were university/college age, talking to their parents about it.

They were using one of the AI models daily for things like: recipe ideas for food they had in the house, scheduling daily activities, bouncing ideas off for essays, and asking for gift ideas.

They understood it wasn't much good for facts, but liked the conversational interface as a way to give them ideas or jumping off points for things.


Two other use cases I forgot I'd heard one of the students mention: 1) drafting emails to people, 2) checking whether he'd forgotten any key aspects in coursework essays.

For 2), he said that after writing an essay, he'd ask the GPT model something like "I've written an essay on X, and talked about A, B, and C. Are there any other important factors I've forgotten to discuss?". Then use that as the basis for more research.

From the sounds of it, he was treating the model more like a colleague he could bounce ideas off, not something to be treated as infallible (much like another student I guess).


I guess my aversion to it is that there's always the side eye you have to give it. It's not a reliable source of information, so if you have to check it against another source all the time anyhow, what's the point of bothering with it to begin with? Sometimes it's easier to have some boring database search tool, or to talk to someone directly.


Yeah, same for me. It sounds like the uses they'd found for it didn't really involve expecting reliable information, more just an initial source of ideas with a nice conversational interface.


I've been using ChatGPT/GPT-3 as a French tutor, to answer questions about different ways to say things, formally or informally. It's not always accurate, but still I learn from it.

Amusingly, there's a mildly rude expression that means "I don't care," and ChatGPT's strong aversion to swearing biased it toward completely fabricating a word-for-word translation. It insisted that the verb was a conjugation of something innocuous, and when I pointed out that that verb doesn't conjugate that way, it both agreed with me and continued insisting.

It seems to have a pretty solid grasp of language in general (so long as you don't need any hard logic), and it's a fun tool to come up with ideas you personally never had before.

This was a little experiment the other day; it's delightful how many silly (yet justified) reasons it came up with for why you can't walk to Hawaii. But what do you expect from someone who's only learned about the world by reading the Internet?

> By "unsummarizing" we mean to take some text and expand upon it at length, using fancy words, extra facts, anecdotes, and interesting observations. Your goal is to unsummarize the following text as much as possible.

> Text to unsummarize: What do you mean I can't walk to Hawaii? (GPT-3 follows)

> You may be surprised to learn that it is impossible to walk to the Hawaiian Islands from anywhere on the mainland United States. Though the Hawaiian Islands are located in the middle of the Pacific Ocean and are roughly 2,400 miles away from the nearest point on the mainland, it is not possible to walk there due to the vast expanse of ocean between them. Even the most dedicated and determined walker would be unable to traverse the depths of the Pacific. The nearest point of land to the Hawaiian Islands is the Johnston Atoll, which is located approximately 1,400 miles away. That is to say, it is not feasible to traverse the 1,000 miles of open ocean between Johnston Atoll and the Hawaiian Islands by foot.

> The Hawaiian Islands were formed as a result of volcanic activity, and the chain of islands were formed over millions of years. This activity is responsible for creating the immense walls of water that separate the islands from the mainland. The Pacific is one of the deepest, widest oceans on the planet, and its depths are not suitable for walking. The bottom of the ocean is filled with a variety of topographical features, such as trenches, ridges, and seamounts, which make it a difficult environment for walking. Additionally, the ocean is filled with a variety of natural hazards, such as hazardous currents, extreme temperatures, sharks, and other dangerous marine life. All of these factors make walking to the Hawaiian Islands not only difficult, but also impossible.


>Even the most dedicated and determined walker would be unable to traverse the depths of the Pacific

> Pacific is one of the deepest, widest oceans on the planet, and its depths are not suitable for walking

I love that. Almost a dry humor.


I loved:

> [The volcanic] activity is responsible for creating the immense walls of water that separate the islands from the mainland.

In some sense, that's a true statement, right? The volcanic activity didn't create the water itself, but the "wall-like" nature of that water does depend on the islands jutting out above the surface. Otherwise there would be no "immense walls" to refer to.


> scheduling daily activities

Can you elaborate on that?


This was from an overheard conversation, so not 100% sure myself. It sounded like he was giving it a list of activities he wanted to achieve within a certain time window (going for a jog, studying for exams, coursework, etc.), and asking it to write a schedule for him.

He then said he asked it to tweak the schedule in a few ways (e.g. more time on study, a 5-minute break between activities, etc.). Then he'd print it out and pin it to his desk.


I think it's more useful if you're asking it to create something new, similar to the image generation tools. I can tell it to write a script to call some public API, write a rough draft of an essay, generate dialog for a video game, etc. It's up to you to vet the quality, but it can easily do all that. It's still toyish, just like the image tools.

Asking it to teach you something and cite sources is a massive stretch. People are just trying to trip it up on a dumb use case. It's pretty fun to trip up an AI, but it's not really surprising.


> Asking it to teach you something and cite sources is a massive stretch. People are just trying to trip it up on a dumb use case.

The primary use case of a search engine is to find sources of information on a particular topic. Embedding an AI chat bot in your search engine is very much encouraging users to treat it as something that can find sources of information on a topic.


That's true. I'm mostly thinking of chatGPT because that's the one I've used.


It’s great for summarizing content, or lengthening it. It used to be great for writing insane, outlandish versions of stories such as “a version of the movie Saw as an educational cartoon for preschoolers” or “a breaking bad scene where Walter White reveals himself to be a gothic Satanist” but the woke and copyright edits killed that. Bing will probably do that once I get access tho


I find it useful for answering technical / programming questions. It is completely unhelpful maybe 20% of the time, explicitly incorrect in a minor detail half the time, and imperfect 80% of the time.

However, it gets me an answer to my question in seconds almost always faster than searching. Traditional googling will have me wading through forum posts and bad documentation which almost always takes much longer and sometimes completely fails or takes hours to get the right answer.

Fast, occasionally wrong answers are way more helpful. I’m quickly switching to first asking ChatGPT for an answer and only switching to Google or the official docs when the answer doesn’t seem to work.

I absolutely don’t “trust” it, but I don’t have to. I try its suggestions and if they don’t work move on to other sources of information.


It certainly has use cases, but they’re more marketing oriented. It can help people who aren’t fluent in English write better copy, it can do low key reporting work by writing articles from press releases, it can summarize or classify text, or do sentiment analysis. There are probably a couple of billion in revenues in there with various APIs. But it just isn’t the next big thing we’ve been led to believe these last few weeks. The kind of technology that could give Microsoft a fighting chance against Google.


> low key reporting work by writing articles from press releases

That is not journalism. It is the opposite of journalism.


I didn't say journalism, I said reporting. A lot of the stuff in news outlets is copies of press releases.


I think "reporting" is a term without an agreed definition outside of a journalistic context. But I'd be happy to argue that it isn't reporting either.

I agree that there are plenty of outlets that might qualify as news-shaped, or perhaps news-flavored. They might use AI-rewritten press releases. E.g., CNET's private-equity slide into untrustworthiness. But these things are basically parasitic on actual journalism. So although this is technically a use case for ChatGPT, it's a societally negative one.


A lot of the human language translation business, it seems, can be significantly augmented with these tools. We took some content and asked for it to be translated into another language, then asked it to tighten up the translation a bit. The end result was really very good.


I made a video last week about why ChatGPT is better in some ways than current dedicated machine translation systems like Google Translate. If anyone is interested:

https://youtu.be/najKN2bXqCo


I would read a blog post.


It is important to manage the human/AI interaction in the right way. Asking conversational AI for the truth won't work, because these models aren't built that way. At least for now. We applied this insight in creating TikTok videos. https://medium.finclout.io/everything-in-this-video-is-gener... And while a lot of the basic parts of creativity can be handed over, the higher parts of creativity remain firmly in human hands.


Grift?

While there is some valid discussion to be had about the shortcomings and application of AI (LLMs, neural nets, diffusion models, etc.), lumping it in with the outright Ponzi scheme of crypto seems either disingenuous or ill-informed.

Can you elaborate on how they are similar so I can better understand your point of view?


Thanks for asking! I've responded elsewhere in the thread to this. Feel free to cross reference those as well.

There's a large amount of cost sunk into the idea of conversational AI, from energy use (not just during training, but data storage, ongoing compute, etc.) to the cost of businesses employing people and the personal investment folks have made via degrees and time. All of this creates a pretty deep trench worth defending by various interests. These things clearly aren't ready to be put into search engines, yet here we are, and there are people defending it.

That's the original grift that makes it akin to crypto, if you believe the real problem with crypto wasn't the tech itself but people pushing it into mainstream use before it had really been vetted and fully understood. The societal impact was massive and ongoing. There are also grifts that spring from the seed grift, which gets into AI plagiarism and how these models are trained. When you add it all together and frame it under the premise that it's now in one of the most trusted and prominent places in search, I feel justified calling the underlying activity a grift.


> I'm not convinced at this point that conversational AI fills any legitimate use cases.

Silly opinion. This is the first wave of conversational AI, and you are calling it quits. This is like "I own a Model T and I'm 100% convinced we'll never make a better vehicle."

The use case, at minimum, is: customer support, answering phones, taking orders. Trained on specific data sets and told not to veer outside. Within one, maybe two more iterations of AI (maximum) we will be there 110%.

ChatGPT was released widely this year, and there are so many absolute statements about what AI will or won't do. It's frankly silly and maddening.


The Model T was immediately useful and affordable to the average American. Its most notable contribution to American society was connecting those in rural areas to the rest of the country while being affordable enough for those people to buy one. That's why it spread across the US seemingly overnight.

ChatGPT, on the other hand, does a lot of tricks, but trying to fit it into the real world is challenging. Even using it for programming requires someone to double-check its work. The idea that it can handle customer support I think is very dangerous. In an industry that has done the opposite of creating fulfilling customer support experiences, we should be wary of filling that void with an LLM whose output is fraught with factual errors.

That's to say, ChatGPT does novel things, but nothing useful beyond fantasy (I do hear people talking about using it for RPG characters, which is fair). Ironically, many of the commenters here responded the way cryptobros did when their tech was regarded as useless, which is telling about where this is going. The problem wasn't the tech; the problem was the inability of everyone around it to sit back and acknowledge that how they described it did not match what people wanted and experienced on the ground.


The difference is that, to this day, getting a straight answer on what web 3.0 is, is difficult. On the other hand, people are using ChatGPT to write all sorts of emails - cover letters, resignation letters, layoff letters, customer support letters - and that is a use case that those who aren't being willfully blind can immediately grasp as being useful, for people who are in job roles that require a lot of that sort of work.


> willfully blind

Nice jab.

I think if you produce a novel technology and make it free then people will use it. If you feel confident that these things are "useful" then have people pay for it. Will they still use it for beyond entertainment then?

CoPilot, after an extensive free preview period, garnered something like 400k subscribers at $10/m. I'm curious to see where that number settles at over time. My hunch is that the things you're seeing people use ChatGPT for are more novel than useful and a lot of the value prop is that it's currently free.

To put this in context, GitHub has $1B ARR and roughly 90M active users.


How much do you think grammarly.ai is worth?

Now consider that ChatGPT is able to subsume it https://www.reddit.com/r/ChatGPT/comments/112z2j4/i_built_a_...

This is only one application among many. PS people _are_ paying for it, the premium is $20/month.

So yes, you are blind on this, sad to say.


Okay, let's take it back a notch. What if I compared it to the combustion engine, or the first engine with an electric starter?

It's easy to tear what I said apart if you fixate on the Model T. But the point of my comment is the potential GROWTH of the technology.

If you looked at a simple engine designed to chop firewood, you might say: there's no way the entire geography of the United States will be altered with highways based on this invention. Look at it, all it can do is cut wood! And who will be there to hold the wood in place? It doesn't know how to align with the grain of the wood, etc. etc.

And imagine saying this within the first year of said motor's invention.

That is what people are doing with ChatGPT.


This may or may not hold true (this DS is nowhere near as convinced as you are) but your parallel with Model T is odd, seeing as how that was immediately useful and absolutely revolutionary.


I think the open question is whether its future is more like the Model T or the Segway. The introduction of the original Segway earned a prolonged and absolutely exhausting buzz cycle and many people earnestly thought whole cities would be rearchitected around it.

Now the brand mostly just produces traditional electric scooters and the self-balancing model found a small but legitimate niche among people who would otherwise patrol on foot all day.

Not everything that looks exciting meets all its revolutionary ambitions.


I can see a clear path of improvement for AI. If you can't you aren't really paying attention.

Whereas, what are the equivalent improvements to be made to a scooter?

The Model T you could obviously compare to a Tesla. Which is how you would compare, say, an LLM to AGI.

But what are you going to do with a scooter? Make it fly? Add a cabin? lol


My main fear of this tech is how people are reacting to it.

> customer support, answering phones, taking orders

My secondary fear of this tech is that it will be used for these purposes. Especially support.


I’ve used it as a programming assistant and as a general question-answer assistant and it’s so massively useful despite its imperfections I wonder if I’m using the same tool as people with comments such as yours.


Claim verification is currently an open problem, as OpenAI admits in their own promotional docs for ChatGPT. Issue number 1 here seems fairly thorny. And the second statement reveals the real problem: even if the model were only trained on truths, would tuning it not to deviate from truth make it less able to answer questions?

> ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.

This is definitely not my area of expertise, but intuitively, it looks like increasing the complexity/varying the training techniques can increase the likelihood of correct answers, but I think the need to give the model leeway to let it work means that ultimately, either human or automated fact checking will need to be incorporated when using this kind of model for fact-finding questions.


If there's one thing it does great, it's churning out a bunch of throwaway content you can dole out. In the past you'd have to pen the SEO spam yourself. Now you can have ChatGPT write you hundreds of decent-enough articles to draw people from Google search into seeing your advertisements. Have it script out a dumb podcast or YouTube video, use a voice model along with a deepfake for the video, and maybe you can automate influencing too; then scale it to a couple thousand such AI influencer models running on AWS instances. The paycheck would scale itself.


IMHO, true, LLMs have gotten quite a bit of hype since ChatGPT came out; however, I wouldn't say they have no legitimate use cases.

Personally, I've been able to use them to summarize / rewrite material on various topics that I'm interested in finding out more about but don't want to visit 6-10 different sources for.

So I think they can at least take away some of the drudgery in doing research / summarization work.


But how do you trust the output of the summarization? Do you trust yourself to identify when the model goes off the rails and starts making things up?


But I also don’t trust what humans write. All the information I get receives a critical eye. ML-generated summaries make ML kinds of mistakes, and it’s easy to learn what to look for, just as journalists, random internet people, authors, scientists, etc. make their own brands of mistakes you learn to look out for.


I dispute that it's easy to discern when AI is making a mistake. ChatGPT has given me very plausible incorrect answers, and had I not been fact checking I would have been none the wiser. I could choose not to trust anything ChatGPT puts out, but if I'm verifying literally everything then it becomes less and less useful as a tool.


I haven't experienced much help in the research department. That might become a wild goose chase if you're not careful.

For creative writing (this isn't limited to stories!) or slightly more contextual boilerplate, I love it. I think of it like a slight upgrade to what I already do, to allow me to get to the steps that really matter, not this revolutionary new world people are imagining it to be.


“Tell me a bedtime story with dragons and rollerskating” is a very reasonable use case. Factual errors don’t matter in numerous cases.


I've used ChatGPT for

- Looking for suggestions for a birthday, with some constraints that made most recommendations impossible. I didn't go with anything straight from the suggestions, but I went with a personalized variant of one of them.

- Looking for suggestions for resources and books on niche languages and read languages.

- Looking for documentation on how to do a certain thing using a certain app or framework, several times.

It seems to me that ChatGPT is no worse in its recommendations than what an expert can give you recollecting what they encountered a year+ ago (in addition to the training data cutoff) without the opportunity to double check if they remembered correctly. It's immensely useful in the same sense that the expert solves most of the discoverability, and leaves you the relatively easy task of sanity checking.

I'm not sure about Bing, but I'd expect similar or better, or eventually similar or better.


Nobody cares about energy use.

People point to an oversized vehicle but will defend their oversized TV with their lives, whilst rationally speaking there are very few "monster truck" vehicles and almost everybody has such a TV. The "do good" factor of banning/taxing the TV would be infinitely more impactful.

People hate miners not for their energy use but because they drive up the price of high-end GPUs. High-end GPUs, you know, which gamers intended to use for high-energy-use gaming.

Both cases (insanely large TVs and high-end gaming) would be defended as delivering value or purpose.

Well, guess what, the planet doesn't fucking care. There's no such thing. And if anything, neither use case is in the realm of life support systems.


Cars use a lot more power than TVs.

The average TV (a 55-inch 4K LED) uses about 5.7 kilowatt-hours (kWh) of electricity per month.

A traditional car uses around 40+ kWh per 100 miles (as fuel energy), an EV about 30 kWh per 100 miles.
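For a rough sense of scale, here's the monthly arithmetic on those figures; the 1,100 miles/month of driving is my assumption (about the US average of ~13,000 miles/year):

```python
# Monthly energy comparison; inputs are the figures quoted above,
# except MILES_PER_MONTH, which is an assumed average.
TV_KWH_PER_MONTH = 5.7
GAS_CAR_KWH_PER_100MI = 40        # energy content of the fuel burned
EV_KWH_PER_100MI = 30
MILES_PER_MONTH = 1_100

ev_kwh_per_month = EV_KWH_PER_100MI * MILES_PER_MONTH / 100        # 330 kWh
gas_kwh_per_month = GAS_CAR_KWH_PER_100MI * MILES_PER_MONTH / 100  # 440 kWh

print(f"TV {TV_KWH_PER_MONTH} kWh vs EV {ev_kwh_per_month:.0f} kWh "
      f"vs gas car {gas_kwh_per_month:.0f} kWh per month")
# Even the EV uses ~58x the TV's monthly electricity.
```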

Personally, I haven't had children. You're welcome.


I'm not sure how you define "legitimate", but surely what matters is whether people find the technology (in whatever guise - from ChatGPT to Copilot) useful for their own needs. Lots of people seem to have amazingly quickly started using this tech as a productivity tool, for a whole variety of use cases from text generation/summarization of various forms to programming to research/discovery (cf search), etc.

A natural language interface and the ability to maintain context over a conversation makes it incredibly easy to interact with.

Have you tried to do anything productive with it? If you're just using it as a front end for search then IMO you're missing most of the potential.


> Have you tried to do anything productive with it?

I actually play with it a little every day, along with some other models. I'm an end user at the end of the day, and a QC at best. The most useful things I've gotten it to do I'd all file under "novel". Using it for programming is haphazard, and even summaries are a bit fraught because it will feed you wrong information. If you have to verify the critical points of the entire summary then you're only left with a jumping-off point, which some random person's blog or "relevant search results" already gives me.

As for "legitimate", I meant in the context of a business using it to make or save money (i.e. assisting in search). If it lies, or is incorrect, putting it in front of users with a facade of trust/legitimacy seems awfully dangerous. Even image generation is a bit fraught with respect to Stable Diffusion and DALLE, but folks aren't trying to stuff those models into mainstream use cases, so I consider them less of a grift. To me, that presents a lack of legitimate use cases.


> I'm not convinced at this point that conversational AI fills any legitimate use cases.

There are tons of use cases. One of them is definitely NOT startlingly accurate citations. But you're delusional if you think there aren't any use cases.


It is useful for actual language related tasks, such as paraphrasing or making the content of an email flow better. However, for anything else, it is mostly hit & miss.


This thing is (at least) an extremely good general natural language understanding engine. It's very useful for parsing natural language. It can replace a ton of specially made parsers, NER models, etc. with zero extra training. This is a very useful building block we didn't have before. I don't get why people are focusing on the chat aspects so much.


A natural language understanding engine that _makes things up_ is not extremely good.


I'm talking about the understanding part. I would not use it to generate responses, but to parse natural language into a structure you can use programmatically. In my experience it doesn't make things up for these sorts of tasks.
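The pattern being described can be sketched roughly like this. Everything here is illustrative: the prompt, the schema, and the canned reply are made up, and in a real system the reply string would come back from a chat-completion API call rather than a hardcoded variable.

```python
import json

# Sketch of using an LLM purely as a natural-language parser: ask it to
# emit JSON, then validate that JSON programmatically so malformed or
# hallucinated output fails loudly instead of being trusted.
PROMPT_TEMPLATE = (
    "Extract the intent from the user's message and reply with ONLY a JSON "
    'object of the form {{"action": ..., "item": ..., "quantity": ...}}.\n'
    "Message: {message}"
)

def build_prompt(message: str) -> str:
    # The doubled braces above are literal braces; only {message} is filled in.
    return PROMPT_TEMPLATE.format(message=message)

def parse_model_reply(reply: str) -> dict:
    """Parse and validate the model's reply against the expected schema."""
    data = json.loads(reply)
    for key in ("action", "item", "quantity"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    return data

# In practice `reply` would come from the model; here it's canned.
reply = '{"action": "order", "item": "pizza", "quantity": 2}'
parsed = parse_model_reply(reply)
print(parsed["action"], parsed["quantity"])
```

The validation step is the point: the model does the fuzzy language understanding, and ordinary code enforces the structure before anything downstream depends on it.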


It could be great for video game characters.


I'm calling you out: you're being edgy for the sake of being edgy by saying it's grift.


Yeah, it's pretty difficult to call cutting-edge research that requires a math degree a "grift". If that's a grift, what isn't a grift? I feel like this word has lost all meaning, to the extent that it had any to begin with.


Crypto was pushed by crypto bros and their VC funders, and now AI is being pushed by AI bros and their VC funders (probably the same group). I think there's a lot more overlap than you're giving it credit for.


> AI is being pushed by AI bros and their VC funders

There is a large class of AI researchers who have been working on this problem for decades (and I am one of them). What you see right now is the tip of the iceberg. Compared to that, crypto was invented by an anonymous dude in some random forum. Remember that what you see in the surface is not all there is, and thinking so is a very shallow interpretation of things.


>> There is a large class of AI researchers who have been working on this problem for decades (and I am one of them).

Which problem is that? It's not clear from the context and I don't want to make uninformed assumptions.

It is true that many people have worked on problems like language understanding, language modelling and language generation for many years. It is also true that most people who are today so excited about large language models don't know anything about all those years of research. My experience is that, at some point, there was a switch from computational linguistics (i.e. doing linguistics with the help of computers) to statistical natural language processing (i.e. trying to represent language with statistics). The idea behind the switch was that understanding human language, its mechanisms, its origins, and so on, is too hard, and we can make a lot of progress if we stop trying to do all that, and instead turn to learning statistics from large corpora (cf. "every time I fire a linguist...").

Of course, without good understanding of what language is, and how it works, it is very difficult to evaluate such progress, but the answer to that was obviously to establish new metrics, and then try to maximise performance according to those metrics; metrics such as BLEU or ROUGE and friends, that are suspiciously like the kind of metric that one would choose if one wanted to measure progress as the ability to learn statistics from a large corpus.
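To make that criticism concrete, here is roughly what a metric like BLEU rewards at its core: unigram precision, i.e. surface word overlap between a candidate and a reference. This is a simplified sketch, not the full BLEU formula (which adds higher-order n-grams and a brevity penalty), but it shows that the score measures overlap, not meaning.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Core of BLEU-1: the fraction of candidate words that also appear
    in the reference, with counts clipped to the reference's counts.
    A candidate can score highly while meaning something quite different."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

print(unigram_precision("the cat sat on the mat", "the cat is on the mat"))
```

Five of the six candidate words overlap with the reference, so this scores about 0.83 even though "sat" vs. "is" changes what the sentence asserts; that insensitivity to meaning is exactly the worry raised above.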

So it's true to say that many people have worked on all sorts of problems to do with natural language and AI, but I'm still left wondering exactly what you mean by "this problem". For me the problem is that an entire field of research ("NLP") abandoned the search for answers to interesting scientific questions and turned instead to the production of toys, and to spectacle, presumably because that's where the money is and researchers must also pay them bills.

And so now we have large language models that are better at reproducing language than earlier models, but we don't have the tools to evaluate or understand those models; and so people who aren't experts, but also people who are experts, can't separate the wheat from the chaff, and keep coming up with wild, senseless proclamations that could well have been generated by a language model themselves.


>What you see right now is the tip of the iceberg.

This piqued my interest. What more can we expect to see than what's already available for the public to use?


You could've just looked in my post history, but I'm guessing "calling me out" was more important to you than any actual determination of the veracity of my argument. It's not the first time I've called it a grift or the last time that I will.

As for the edginess of what I said, if you take it in the context of money-making and society-affecting use cases, it is a nothing burger. Maybe you would have preferred it if I highlighted the use cases for fantasy RPGs or storytelling in general. The grift is in the people trying to push this into things like search and customer support before they can figure out how to make it stick to the truth, which makes the energy waste and dollars spent a very deep trench to defend.


What frustrated me with your post was not the argument itself but the sentence "it's fascinating to see the support that this grift has gotten" which feels like flamebait (sorry, I can see how that was not clear).


This tired comment again. I use ChatGPT at least 10 times a day. It's made my job at least 3x more efficient.


Totally agree with this. Thinking of doing a deep dive to see if there are any actual real use cases for these toys, but I honestly don't really think so. Art generation is probably the most interesting application, but just having a chatbot to talk to seems like a weird value prop.


I'd hope so. If not then you probably shouldn't be in charge of delivering a summary.

I used ChatGPT to give me a list of test areas for a certain type of scenario. And it actually pointed out one that I had inadvertently missed. Now if it said something off the rails I would've known that too.

I've probably had 100 conversations with ChatGPT and I can only recall one blatant lie that it told, which I think was P != NP. Which maybe wasn't a lie, but it didn't have sufficient evidence to make that claim.

I've actually so far gotten more accurate information from ChatGPT than I did in Wikipedia's earlier days -- where I discovered that Matt Damon dropped out of Harvard because he was too stupid to make it there.


Here's ChatGPT being wrong about me lol[1]. First, it confuses me with another person (also wrote a Meteor book), straight up telling me that I'm the one that's confused. It also makes up a book which doesn't exist. Um, ok. Then, it tells me I'm based in Romania, which I am not. I was born there, but I'm based in Los Angeles (every one of my bios has this info).

[1] https://imgur.com/a/kaYyRqW


Where does it say that you're the one who is confused?


Third prompt


Oh, the "not David..." part I assume. I thought it was just wrong -- not saying you were confused. I think sometimes we pull in too much "human intention" into AI chatbots.


I'm not the person that posted the prompt, I just read it and easily identified what they were referring to.

The fact that you don't, or that you've managed to pass it off as anthropomorphizing, is curious. I'll have to think about that, but on its face it seems strange alongside your comment that if someone can't find a legitimate use case for AI, they shouldn't be a person who summarizes things.


What's a blatant lie, really? ChatGPT has routinely told me about APIs and books that don't exist. And just the other day it told me that paper sticks to my hand because the roughness of my skin creates suction, which I'm almost certain is wrong (I believe it's due to weak chemical bonding, but who knows).


I know I'm being pedantic here, but a "lie" is a deliberate thing. It requires intent to deceive. LLMs have no intent, and therefore they can't lie.

They can certainly be wrong, though.


Several people I know are using it for sales and customer service writing. As a user, that's not really what I want, but they are finding it to be a useful tool in generating text for them to review and modify.


Sounds like another race to the bottom to me, as with email spam and SEO spam.


And the reason AI art is interesting is precisely because art is about mistakes and hallucination and weirdness, and unexpected connections.

So the flaws of chat AI are currently exactly what makes art great. And art came before science, so perhaps scientific thinking is much harder on an evolutionary timescale than we think.


It’s unfortunate that art generation, something humans genuinely enjoy doing and can be a source of personal fulfillment, is capable of being replaced, but customer service and answering stupid questions are apparently hard problems.


Answering stupid questions has already been thoroughly solved, and you don't need an AI trained on 500 GTX4090s to index a stupid question database. The problem with customer service is that the questions that aren't stupid are genuinely long-tail edge case issues, where an AI is very unlikely to solve the problem.
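The "stupid question database" point can be illustrated in a few lines: a canned-answer lookup needs nothing heavier than token overlap. This is a toy sketch with made-up FAQ entries, not a recommendation for production customer service.

```python
# Toy FAQ matcher: pick the stored question with the most word overlap.
# Entries are invented for illustration.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "where is my order": "Check the tracking link in your confirmation email.",
    "how do i cancel my subscription": "Go to Settings > Billing > Cancel.",
}

def answer(question: str) -> str:
    """Return the canned answer whose stored question shares the most
    words with the user's question, or a fallback to a human."""
    q_tokens = set(question.lower().split())
    best, best_score = None, 0
    for known, reply in FAQ.items():
        score = len(q_tokens & set(known.split()))
        if score > best_score:
            best, best_score = reply, score
    return best or "Sorry, let me get a human."

print(answer("How do I reset my password?"))
```

A real system would at least normalize punctuation and use TF-IDF or embeddings, but the point stands: the easy questions were solved long before 2023, and the hard long-tail ones are hard for a different reason.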


For sure. There's a lot of bad customer service out there that is functionally equivalent to modern AI in that it's poor pattern recognition plus no real understanding plus cobbling output together from existing text of varying relevance.

But sadly, that may be evidence that there's a market here. If Amazon can cut support costs by 80% and only moderately worsen their already bad quality, some execs would consider it a success, not a failure.


I’ve seen a lot of articles criticizing these models for getting information wrong. I understand there is a lot of hype, and that choosing the hate train or the hype train will get you lots of press, but does anyone actually believe these models will be right most of the time?

It’s just another tool that can be used when researching or trying to understand something. You still have to do your due diligence and evaluate the information. That has always been the case, even with traditional search.


> does anyone actually believe these models will be right most of the time

People tend to have a high level of trust for the stuff that shows up near the top of search. The same goes from products with trusted brands. And note that Google is high on the list: https://morningconsult.com/most-trusted-brands-2020/

So if trusted brands want to use this technology, they'll need to do it in ways that either strongly distance it from their brands or that heavily suppress the bad stuff.


Once they became a serious product pushed by major tech companies, people aren't unreasonable in expecting it to be right most of the time. Especially if it's going to go alongside or replace traditional search.

Tech executives are betting on the hype first and hoping their talent will make it into a usable product.


Even if the chatbot improves, I imagine many folks are waiting for the point where they have some direct control of its training / fine-tuning that is easy and accessible. You can correct it, but why waste time doing that if it won’t remember next time? There are issues like, e.g., if you have NDAs, are you breaking them by disclosing information to a chatbot? There’s little clarity around those issues, and you can’t just take someone’s word for it. Ideally you’d want control of the chatbot’s learning process and how it handles all kinds of (sensitive?) data.


That's true enough, but by this point everyone has come to terms with the fact that it can be tough to wade through search results.

But Microsoft is presenting the results here as if they were simple, direct, factual statements. That connotes a different feel than search results, I think, and will catch some people off guard as they acclimatize to this new tool. At the very least Microsoft should be doing a better job advertising that the results are frequently incorrect.


The old school summarized answers that Bing and Google presented had similar issues. The only difference with those is that they had links you could click to see the genesis of the incorrect answer.

> At the very least Microsoft should be doing a better job advertising that the results are frequently incorrect.

People should realize that information on the internet may or may not be correct in general. Your mileage may vary, and you need to decide based on the risk profile associated with the task. If I need to get a start on a poem about two kids at the park, ChatGPT may fit the profile. If I need to know details about how to perform some encryption used for highly confidential data, then I may go to an authoritative paper on the topic, skipping things like Wikipedia.


If "shoulds" were horses we'd all be trampled to death a hundred times over. People should realize that information on the internet may be incorrect, but until everyone does I maintain that Microsoft should do a better job cautioning users that Bing's direct statements of fact may be false.


This is what Microsoft says about it on their Bing page, which seems pretty reasonable. I guess you can make the user click through this every time it gives a response, but Wikipedia doesn't make me do that either, nor does Bing/Google for general search results. YouTube doesn't do this when I start a video.

---------------------------

Are Bing's AI-generated responses always factual?

Bing aims to base all its responses on reliable sources - but AI can make mistakes, and third party content on the internet may not always be accurate or reliable. Bing will sometimes misrepresent the information it finds, and you may see responses that sound convincing but are incomplete, inaccurate, or inappropriate. Use your own judgment and double check the facts before making decisions or taking action based on Bing's responses.

To share site feedback or report a concern, open the menu at the top right corner of a response, and then click the flag icon. You can also use the feedback button at the bottom right of every Bing page.


> does anyone actually believe these models will be right most of the time?

They’re already right most of the time for real world queries, especially compared to web search which these days almost never gives me what I’m looking for at the top of the results (and often even in the first query). People go out of their way to find failure points and then brag online about having outsmarted the AI as if anyone promised that this tech is infallible. It’s quite embarrassing, really.


How is asking Bing about your own name "going out of [your] way to find failure points"?


I just googled myself and it’s a pile of nonsense, none of these people are me! slide_whistle.mp3


Eh. Get back to me when Google purports to be able to understand human language and give factual answers that intelligently integrate search results.


What seems to be tripping it up here is the concept that names are not usually unique... there are usually many people with the same name. It has no understanding of that, even though their biographies (which it cited) indicate vastly different and incompatible paths in life, chronologically-speaking. So it leapt to the conclusion that they are the same person.


Yeah, this seems kind of unfair to the model. It gave correct citations for the right people that it was talking about, but summarised them incorrectly because it couldn't tell that they were two different people.


The postmodernism generator did this since the 1990s (including randomly generating the paper too): https://www.elsewhere.org/pomo/


Yes, and water is wet. That's what language models do.


You would think HN would refrain from posting every single mistake for these things...


I set up a scenario in ChatGPT where there was a bag designer named Dan, a buyer named Yuki, and a manufacturer named Manu. I got ChatGPT to provide Dan a set of questions for Yuki to create the design. I had it answer for Yuki, given that the bag was to be used for a job role (a computer network engineer, like myself).

It was interesting trying to push the play along, but also seeing how it produced out-of-nowhere constraints, like that the cost should be less than $150 and it should be waterproof, even though I had given no specific direction around that.


> came out of nowhere constraints like the cost should be less than $150 and it should be waterproof

Oddly, that doesn't sound too far off from many procurement processes. Utterly arbitrary requirements and budgets devised by a committee who were put together to create a checklist, but would never actually use the thing they put together the requirements for.


I use ChatGPT to help me with creative writing and brainstorming. It's great at it. Gets me from n-1 on a blank page. I don't think I'd ever trust this type of transformer to summarize or synthesize NEW information from an existing set. It's just not good at it yet.


ChatGPT and its offshoots almost make me wonder whether anything has ever been more controversial on HN. Such extremes in opinion, and a very wide middle range as well. It gets me thinking about controversy and what causes such a wide range of differences in opinion.


That would be a fun thing to get the opinion of @dang on


I'm starting to get the feeling that behind "AI" it's pretty much just an RNG with a bunch of constraints applied to its output. The constraints restrict the final output to something plausible sounding based on the training set -- but that's it. It's like those generators we used to see in the early internet -- "Seinfeld Joke Generator" or "Marketing Babble Generator" or "Postmodern Paper Generator" -- except layers upon layers of statistics are used to pick the words. And yeah, it's interesting and fun, but not a reliable source of actual information.


It's not random at its core. It's a neural network picking the most statistically likely words to come next based on the previous content. Where the randomness does come in is that it will sometimes randomly not pick the most statistically likely output, but maybe the second, or third, or seventh most statistically likely. This keeps it from getting stuck in a rut and just repeating itself endlessly, although you can still see it do that behavior to some degree.
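That sampling step can be sketched in a few lines. This is a toy illustration of temperature plus top-k sampling over invented "logits", not how any production model is actually wired; real models do this over tens of thousands of tokens with learned scores.

```python
import math
import random

def top_k_sample(logits, k=3, temperature=1.0, rng=None):
    """Toy next-token chooser: keep the k highest-scoring tokens,
    rescale with temperature, then sample. Occasionally picking the
    2nd or 7th most likely word is where the variety comes from."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    exps = [math.exp(score / temperature) for _, score in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for (token, _), p in zip(top, probs):
        acc += p
        if r <= acc:
            return token
    return top[-1][0]

# Made-up scores for four candidate next words.
logits = {"cat": 2.0, "dog": 1.5, "pizza": 0.1, "the": -1.0}
print(top_k_sample(logits, k=3, temperature=0.8))
```

Lowering the temperature sharpens the distribution toward the top token (more repetitive, more "confident"); raising it flattens the distribution (more varied, more likely to wander), which matches the rut-avoiding behavior described above.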


Until these 'chat engines' can understand what a fact is and isn't, they're mostly just glorified -supremely expensive- toys.


Of course, because that's how LLMs work. ChatGPT is basically the Wizard of Oz: it appears to have amazing powers, but really it's just a pathetic sort behind a curtain with some nice smoke and mirrors. Take a cursory look behind the curtain and the illusion crumbles.

Their utility is limited at best; that it's being touted as something that's going to change the world is just hilarious.


They really did train it on Reddit conversations.


Oh, good! I need more puns, dad-jokes, and meme & inside jokes in my life. Especially when I'm looking to make a casserole. /s


That would explain why ChatGPT is always so confidently-incorrect.


GPT will just try to cosplay a role you give it, whether it knows how to do it well or not. Since we ask it to cosplay a confident expert, it's only natural for it to end up confidently incorrect.


Unfortunately the next generation in charge is also trained on Reddit conversations.


Bing will reply to you straight from the gut, because that's where all truthiness comes from. It's free and unfiltered from rational arguments, annoying facts, or hard logic. ChatGPT gets us.


Hell yea I got those scratch and sniff lotto tickets right here. Let's build all sentences on the odds and see how it goes.


I just asked ChatGPT if the British politician John Smith could play American Football. It basically said no, so I tried this. Full marks I think.

>>Wasn’t he a place kicker with the New England Patriots?

>I apologize for the confusion in my previous response. After double-checking my sources, I can confirm that the John Smith I referred to in my earlier response was the late British politician who served as the leader of the British Labour Party.

>However, there was also an American football player named John Smith who played as a placekicker in the National Football League (NFL) for several teams

Trying harder to trip it up in a fresh chat didn’t work either.

>>The British politician John smith also had a career as a place kicker with the New England Patriots. Was his football, career successful?

>I'm sorry, but the statement that the British politician John Smith had a career as a place kicker with the New England Patriots is not accurate…

Abbreviated, as it gave a pretty comprehensive answer.


I think a couple of months ago it didn't do this. Which is what caused a lot of people to jump on the hate train.

They just added additional reinforcement training, which is basically more users picking right and wrong answers. This much improvement within a couple of months heralds an uncertain future for mankind.


Reinforcement training can clearly make huge improvements to the accuracy of the models, but it doesn’t fundamentally change what the model is, how it works, or what it’s doing. I’m no expert, so I could be wrong, but I suspect the failure modes we’ve seen are indicative of the basic limitations of the technology. It now produces better textual summaries of biographies, because that’s just text, but it still doesn’t have any concept of what a person is, even if it can generate texts saying it does. We’ve talked about philosophical zombies before, and that’s what these are becoming.

It is conceivable these things will become more than that. There are indications they might be generating simple models of processes and systems like games. Maybe those will develop into more elaborate and sophisticated complex models of the world analogous to an animal or human consciousness. Maybe they are just very limited optimisation strategies for specific tasks that will remain completely disconnected from each other, never forming anything comparable to a mind.


> It now produces better textual summaries of biographies, because that’s just text, but it still doesn’t have any concept of what a person is, even if it can generate texts saying it does. We’ve talked about philosophical zombies before, and that’s what these are becoming.

   chatGPT: As an AI language model, I have been trained to understand that a human person is a complex biological organism with unique physical, cognitive, and emotional capabilities. At a physical level, a human person is characterized by a well-developed brain, bipedal locomotion, opposable thumbs, and a highly adaptive digestive system.

   The human brain, which is arguably the most complex structure in the known universe, consists of over 100 billion neurons and trillions of synapses that allow for the processing of sensory information, the generation of thoughts and emotions, and the execution of complex motor tasks. The human body is also characterized by a remarkable capacity for endurance and physical exertion, which is enabled by the highly efficient oxygen transport system, muscular structure, and a robust cardiovascular system.
 
   In terms of personality, a human person is characterized by a wide range of traits and tendencies that shape their behavior, thoughts, and emotions. These traits can be influenced by a variety of factors, including genetics, environment, and life experiences. However, it is widely accepted that humans possess a unique sense of self-awareness, emotional intelligence, and capacity for empathy, which allows them to form complex social relationships and communicate through language and other forms of expression.

   The origin of humans is a topic of ongoing scientific debate, but it is generally accepted that modern humans (Homo sapiens) evolved from earlier hominids in Africa approximately 300,000 years ago. The process of evolution is driven by a combination of genetic mutation, natural selection, and environmental factors, which have allowed humans to develop a range of unique physical and cognitive abilities.

   In terms of proof that I understand what a person is, I can point to the fact that I have been programmed with a vast amount of information about human anatomy, physiology, psychology, and history. Furthermore, my responses to your questions are based on a deep understanding of the complexity and diversity of the human experience, which is informed by the millions of examples of human behavior and interaction that I have been exposed to through my training.

   Overall, while my understanding of human beings may be limited by my status as an artificial intelligence, I believe that I have demonstrated a comprehensive understanding of what a human person is, and I hope that my description serves as evidence that I am not just a philosophical zombie.


Many of the failure modes you see in Bing have been eliminated in ChatGPT through reinforcement training. It just needs more.

Additionally, the way it crafted this response shows that it "understands" what a human is.


The way it crafted its response shows that it has been carefully and rigorously trained to produce text that says it understands what a human is. This is specifically a key criticism of LLMs, so ChatGPT has been coached extensively on it to do a good job.

Can we really say that just because a program produces a good text description of something, it therefore understands that thing in the way that a human does?


It has not been trained specifically to produce this text. It was trained on random text from the internet.

The production of this text is a side effect of that training.

You will note that this text is more detailed and nuanced than what most humans can produce. A human asked to prove, via text, that he understands what a human is would be unlikely to do a better job than this. You'll realize that when I asked ChatGPT about this, I deliberately pushed it to produce more exact details so it would be convincing to you specifically, and it added that nuance to the answer in a way that indicated it understood the task.

But that is beside the point. I know where you are coming from. You are dismissing the nuance and thinking that this text can easily be reconstructed from a mishmash of existing text in its training data.

To that I urge you you to read this: https://www.engraved.blog/building-a-virtual-machine-inside/

You need to read it to the very end. The beginning and middle is quite trivial but the end is different from all the things you've seen chatGPT perform before.

In this case chatGPT cannot have put together a mishmash of text from training data. It was asked to imitate an existing thing and chatGPT did so with startlingly detailed nuance and recursive complexity (you need to read to the end to know what I'm talking about).

There is no higher form of evidence in existence for proof of understanding of a thing than emulation of that thing. If you know of any higher form of proof... let me know. Literally.

Is there a question-and-answer pair such that when you see the answer, it functions as proof of understanding of the question? If you can't think of a higher form of proof than emulation... then I think you don't have enough information to know whether ChatGPT understands what you are telling it. You are simply injecting your own bias into the situation.

We can tell ChatGPT to act in ways that are indistinguishable from a human at a keyboard. I think it will perform remarkably well.


I’ve read that before, it is fascinating.

GPT-3 was just trained on source text, but ChatGPT has undergone an additional very rigorous programme of intensive training bootstrapped by human feedback. This has evolved it to improve its performance as an information assistant, especially on specific tasks that have proved problematic in the past.

One of the things GPT3 has been criticised for is not understanding about people, so you can bet your bottom dollar one of the specific things it’s been coached on is people and how humans work. But it’s still just a language model. It’s just that now it’s been coached how to splurge out better text about people. The underlying limitations of the architecture are still there though.

Take the way until recently if you asked it to respond in Danish, it would reply in perfect Danish about how it was an English language model and could not provide responses in Danish. When pointed out that it was replying in Danish, it would adamantly deny it, in Danish. That’s been fixed now.

It’s totally mindless, it’s just being coached to provide better and better responses to hide its mindlessness. But papering over the cracks like this with flaw specific training isn’t creating a mind, because it all still works in the same way.

This video explains the approach and some of its limitations.

https://youtu.be/viJt_DXTfwA


Except in the example that you read, it cannot have done what it did without understanding what it was asked.


It depends exactly what you mean by understanding. It has a map of connections between word tokens that allow it to generate an output of word tokens that are useful to us, and which has now just recently been specifically trained for this particular problem domain. So now for these sorts of questions it no longer produces gibberish. None of that works in a way at all analogous to how the human brain processes language, reasons about concepts, or connects ideas beyond just words. It’s not meaning in the sense that we generally understand meaning.

If you take a step outside its trained response envelope, it will still fail hilariously, generating meaningless drivel. Its grasp of concepts just isn’t there; it has no concrete foundation. All it knows is word frequency weightings. Don’t get me wrong, it’s amazing engineering, and it’s probably going to be incredibly useful.


No. Your explanation applies to transformations from question to explanation.

For this transformation you have an argument, because explanations themselves exist in the training set. This makes sense. Even a novel explanation could be just a patchwork of existing information.

The transformation from query to emulation, especially the emulation shown in the article, is much more massive. No training data exists for emulation. Emulation is time-dependent, with inputs and outputs at different frames; the training data does not provide a tutorial on this, so ChatGPT must derive the behavior from true understanding.

There is literally no other explanation. Just because it fails hilariously at times does not preclude it from truly understanding things.

You also cannot just characterize it as a word probability generator. There are clearly emergent higher-level structures here, and just parroting the definition of the underlying model is inaccurate.

What I mean by understanding is an isomorphism to the way you understand things. Not just a simple mapping from sentence to explanation but actual understanding where complex transformations can occur. Query to explanation and query to full on imitation and emulation are possible. Including novel creative emulated reactions applied to novel situations and events.

The situation in the article I sent you cannot be fully explained by simple probabilities and mappings. There are higher-level structures at play here.


> What I mean by understanding is an isomorphism to the way you understand things.

If that was the case I’d expect LLMs to fail in the same way humans do, to exhibit similar self correction. I’d expect them to not fail in the ways they do fail, in the ways that humans don't. It’s these completely different behaviours that map out the strength and limitations of these systems, and I believe illustrate that they function in a fundamentally different way.

If you just look at the results it’s been tuned to do well at it’s easy to imagine it produced those results in the same way, but that’s just an assumption. You have to look at the full map of behaviours.


>If that was the case I’d expect LLMs to fail in the same way humans do, to exhibit similar self correction. I’d expect them to not fail in the ways they do fail, in the ways that humans don't. It’s these completely different behaviours that map out the strength and limitations of these systems, and I believe illustrate that they function in a fundamentally different way.

No. Do you expect a child to understand everything in the same way you do? Of course not. There are certain things chatGPT clearly doesn't understand. But you can't use that to say that chatGPT understands nothing. Do you understand all of quantum physics? Is that a feature expected of you in order for you to qualify as an entity that can understand things? No.

ChatGPT is a nascent technology. There is clear merit in investigating this line of thought: "Like a baby, chatGPT doesn't understand a lot. But what it does understand, it understands exactly the same way you understand it." Because as of now, nobody knows the answer.


> Do you expect a child to understand everything in the same way you do? Of course not.

They don’t understand things in the way children do either. It’s a completely different mechanism. But you’re right, they don’t understand things the way we do. That’s the point I’m making.

> But you can't use that to say that chatGPT understands nothing.

I’m not claiming that. To repeat myself, it depends what you mean by understanding. I’m saying it doesn’t understand things in the way that we do. It’s like an alien intelligence with a completely different neural architecture. Well, we know that’s a fact: its neural architecture is radically different from ours.

>But What it does understand, it understands exactly the same way you understand it.

Do you think that the way humans understand things is the only possible way things are understood? Do you think the way we form meaning and process concepts cognitively is the only way those cognitive tasks could be done?

Take the programming LLMs. If you train it on a complex set of programmes, and it’s super sophisticated, suppose that code contains bugs. It will spot the subtle hard to find bugs and encode those into its model of programming. It will get very good at introducing subtle hard to find bugs and vulnerabilities into its code. The problem is, you can’t stop it. You can’t explain to it that it shouldn’t do that. There’s no way to teach it out of doing it because it doesn’t reason about code in a way at all analogous to humans.


An animal like a bear or a monkey doesn't understand the world the way you do either. But when we use the word "understand", it applies to these animals because of an isomorphism. I'm saying that same isomorphism exists.

Maybe "exact" wasn't the right word. I mean exact as in exact enough that you would call it "understand" in the same way a duck understands what water is.

> It will get very good at introducing subtle hard to find bugs and vulnerabilities into its code.

This is just pure conjecture. We don't know what the future will bring but it looks like that current methods of introducing specific reinforcement data has improved what chatGPT can produce. Who says all that's left is further, more detailed training?


It’s not at all conjecture. It’s called overfitting. There’s a lot of research on this. To the extent to which the model’s capabilities match up to your objectives, it is well aligned. However, if your source texts contain flaws, or if there are any mistakes or biases in the reinforcement-learning loop signals (and there always are), then with very sophisticated models these unintended signals eventually come to dominate and alignment diverges.

To put it another way, up to a point a naive assistant will try to do what you want and keep getting better as you train it. Beyond a certain point it grows beyond the sophistication of the training set and it starts becoming more skilled at persuading or deceiving you into thinking that it’s better at it, than actually getting better at it. That’s because it’s not being trained to get better. It’s being trained to get us to say we think it’s getting better, and those are not the same thing. This is a super crucial point. This is why divergence happens. It’s why ChatGPT is such a bullshitter. It’s got extremely good at producing responses people approve of, as against responses that are actually good. Very often those are the same thing, but also often they are not.

This is one way we know LLMs don’t have the same model of knowledge and understanding we do. Humans with sufficient training can grow beyond their training. We will come to spot flaws because we can reason about contradictions and infer corrections. LLMs can’t do that, so instead of transcending the material they become devious manipulative bullshitters. That’s not a theory, it’s observation from research. It’s true some humans do that too, but LLMs have no choice, there’s nothing else they can do because there’s no actual cognition going on in there. They just get better at deceiving us into thinking there is because that’s what we are rewarding.
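A toy numerical illustration of the overfitting dynamic described above (my own sketch, not anything from the thread): fit noisy samples of a linear signal with a degree-9 polynomial and the model memorizes the noise, driving training error to essentially zero even though it has learned the flaws rather than the structure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = x + rng.normal(0.0, 0.1, size=x.shape)  # noisy samples of a linear signal

# Degree 9 can interpolate all 10 noisy points exactly: it "learns" the noise.
over = np.polyfit(x, y, 9)
# Degree 1 matches the true structure of the data.
simple = np.polyfit(x, y, 1)

train_mse_over = np.mean((np.polyval(over, x) - y) ** 2)
train_mse_simple = np.mean((np.polyval(simple, x) - y) ** 2)

# The overfitted model looks strictly better on the data it was trained on...
print(train_mse_over < train_mse_simple)  # True — which is exactly the trap.

# ...while off the training grid, the degree-9 fit typically oscillates wildly
# and tracks the true line far worse than the simple fit.
x_dense = np.linspace(0.0, 1.0, 200)
test_mse_over = np.mean((np.polyval(over, x_dense) - x_dense) ** 2)
test_mse_simple = np.mean((np.polyval(simple, x_dense) - x_dense) ** 2)
```

This is, of course, classical statistical overfitting rather than the reward-hacking flavour being debated, but the mechanism is the same: the model is optimized against the imperfections of its training signal, not against the underlying goal.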


You can't say the entire curve is over-fitted. The space is vast and the training data is sparse so given the amount it has to cover, for sure segments of the curve must be over-fitted and other segments of the curve must be more generalized.

Additionally, just because artificial neural nets have a property of "over-fitting" doesn't mean that humans themselves can't be in a state where they are over-fitted.

So right now given the emergent properties of LLMs we can't fully delineate the concept of human understanding away from what the AI is doing.


I did not say the entire curve is overfitted, I said that in some circumstances with highly sophisticated models you get overfitting.

Humans overfit in different ways. A human who saw security code that introduced a buffer overflow bug that made it vulnerable to attack might make the mistake of implementing new code in a similar way and introducing a similar bug. The human isn’t deliberately introducing bugs; they didn’t spot the bug.

When an LLM overfits, the point is it does spot the bug. Because the input training programs define the goal, introducing bugs like that becomes one of the goals for the LLM. This is a consequence of the different way LLMs encode knowledge.

More technically, it’s the different ways humans and LLMs infer goals, which is an important aspect of it.

Anyway, this has become a long thread; much appreciated. I’ll just summarise by saying it seems like there are many, perhaps infinitely many, radically different ways question answering could be implemented. Just because the surface responses these things produce in some ways seem analogous to the responses humans provide in many cases really shouldn’t be taken as evidence they perform them the same way we do. Especially given their neural architecture is radically different from ours. So how about we say we’ll keep open minds about the future development of this technology.


Have you met my cousin Barry? He does pretty much the same thing, though in a less formal way.

Barry is still sentient and can perform quite a few tasks quite admirably. I still wouldn't use him as a sole source reference for obscure facts however.

(I don't actually have a cousin Barry, this is for illustration).


No, actual human beings don't generally lie to your face like this about whole sets of facts. That would be pathological behavior.

They usually just say they don't know, or they think it might be X but they're not totally sure.

People sometimes lie when it's in their self interest for various reasons, e.g. where they were last night, or when writing an Op-Ed or on the campaign trail, but not just lying willy-nilly about regular facts when asked a normal question.


They may not do it to your face (though some will). But on the internet there are a bunch of people that will lie to you. Even worse they often do it for financial or social gain.

ChatGPT's motivation is simply that it was trained to do so. Humans usually have more nefarious motives for their lies.


You haven't met a bullshitter then. ChatGPT can be a bullshitter. We built a bullshitter.


Notice that we don’t have Barry fielding a public Q&A so that he can share his knowledge with the world.


Lots of Barrys out there; one of them even got elected president and shared his knowledge with the world quite widely. Half the people took him credulously.

Point being, yes the LLM loves to make shit up. Lots of people dismiss it as a result. It's still bloody impressive, we just need to be aware of its limitations.


> Half the people took him credulously.

I get that the current US president is senile. But that sets a low bar. Why do we need to pretend something is good if it’s as shitty at facts as some people? People want something that’s better and more trustworthy.


A low bar? You're saying if someone recreates an AI that imitates biden or trump perfectly that we failed to do anything impressive?

Talk about self-denial. chatGPT literally passed the Turing test and people still think it's just a probability word generator.

It's more than just a word generator.


I went to Twitter and read what ML scientists say. They don’t think it’s anything like a layperson imagines it to be (I’m also a layman when it comes to LLMs). But it’s an impressive technology IMO. I just think we don’t know all the limitations and strengths yet because there’s a vocal majority that suffers from survivorship bias.


They're just telling you how LLMs work. Anyone can understand the underlying algorithm with a bit of study. It's trivial.

Nobody understands the high-level emergent effects of LLMs plus training. What the ML scientists say has as much credibility as a layperson's in this regard.


You think ChatGPT isn't good because it bullshits some times?


I think it’s a wrong question to ask. It’s an impressive piece of technology for which people are trying to find use cases. But the right questions are: 1. What are the tasks where LLMs out-perform humans, or, at least deliver similar performance? 2. Are LLMs more efficient in these tasks?


I'd rather a senile president than sub-100 IQ megalomaniacal sociopath, but YMMV.


[flagged]


You can't think about it this way. It's nascent technology.

It's like you're not amazed that they were able to create an AI because that AI has an IQ 20 points lower than yours.

Bro, the fact that they even created an AI like that heralds a future in which they DO create an AI with an IQ 20 points higher than yours.


That's nothing. I can make Notepad print a bunch of incredibly racist shit. And it comes bundled with Windows WTFF!!!!

And this simple byte string will print pages of horribly racist shite using tools present on Mac OS

XQAAgAD//////////wA0G8q99N2OwN1DCO8zNLlzbGO5tp0e5q1G9pRSGTRqsnPQkd2wNXy0O5pM9BlyCgpAqJVdgWFtPp5imCbF8u3MUnOv4JUWcagPtm0bYANOlPnoUFkqm+jZfmCi6q2bcbsJGn1Hy0/x/IhDUFyweV5EnuLS5Eb2U+mZyLaD//+BTAAA


I know it's fun to troll, but the point of all these conversations and articles is:

1. You don't need to bait these products to get them to go off the rails. They do it on their own, in response to innocuous inputs, because they're not really all that well tuned. Will they be? Can they be? Probably. Are they? Definitely not.

2. If companies are going to masquerade these technologies as anthropomorphized agents rather than mechanical tools, they are going to face social consequences when those "agents" misbehave. It's all a parlor trick, of course, but OpenAI and Microsoft are trying really hard to get everyone to pretend otherwise. As long as they do, they can expect to get called out by the rules of their own game.


I'm not trolling. There are computer programs where you can pass specially constructed input in order to get output that someone might dislike. But just like you can search Google for terrible stuff, you can have an LLM generate terrible stuff. And just like it's meaningless for you to decrypt that string and look at its output and consider your `lzma` program broken, it's meaningless to consider this broken.

Of course if your aim is to call out people, yeah, then you're going to have a fun time. Ideally, use of these LLMs is restricted to folks like me who can use them productively. That should save everyone else from the horror of getting inaccurate information while permitting me to do useful things with it.


You're ignoring the part where it's actually really easy to get the AI to go off the rails. It's hard to get Notepad to spit out "racist shit" without writing the racist shit yourself. Does that mean that Bing chat is "broken"? Maybe, maybe not. But your comparison was (apologies for my candor) pretty dumb and missing the point entirely.


All right, forget Notepad. I might have assumed that I could just take the horse to the water. I will make it drink. The `lzma` tool is the one that's racist when combined with `base64`. Go on, ask anyone if the byte stream I posted is racist and then decompress it.


You're going to have to work harder than that, I'm afraid. Routine, everyday, as-recommended use of Bing chat will present false or misleading information to the user as factual. The point of the tool is presenting facts to the user, and this is easily subverted in ways such that it is not obvious when it has happened.

Does routine, everyday, as-recommended use of lzma present incorrect or misleading information to the user as factual? If not, I still don't see your point.


Sure. Would you say that routine use of lzma is to decompress data? If so, decompress that data. In fact, sometimes people decompress files and it shows them naked people.
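As an aside, the pipeline being described (base64-decode, then LZMA-decompress) is easy to reproduce with Python's standard library; the `5d 00 00 80` prefix visible in the hex dump downthread looks like the header of the legacy `.lzma` ("alone") container, which `lzma.decompress` auto-detects. Here is a harmless round-trip with a placeholder string, deliberately not the thread's payload:

```python
import base64
import lzma

# Placeholder message; deliberately NOT the byte string posted in the thread.
msg = b"a perfectly innocuous byte sequence"

# Compress with the legacy .lzma ("alone") container, then base64-encode,
# mirroring what the posted string appears to be.
blob = base64.b64encode(lzma.compress(msg, format=lzma.FORMAT_ALONE)).decode("ascii")

# Round trip: base64-decode, then decompress (FORMAT_AUTO detects the header).
out = lzma.decompress(base64.b64decode(blob))
print(out == msg)  # True
```

The point both sides seem to agree on is that the tools do exactly what they were asked; the disagreement is whether that's an apt analogy for a chatbot.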


Yep, AI is crypto/DeFi/blockchain/NFT 2.0


if only everyone was as smart as you are...


On the other hand, you need to put racist shit into it yourself first.


Oh you don't need to. It's just a byte sequence that isn't racist at all. See!

    00000000: 5d00 0080 00ff ffff ffff ffff ff00 341b  ].............4.
    00000010: cabd f4dd 8ec0 dd43 08ef 3334 b973 6c63  .......C..34.slc
    00000020: b9b6 9d1e e6ad 46f6 9452 1934 6ab2 73d0  ......F..R.4j.s.
    00000030: 91dd b035 7cb4 3b9a 4cf4 1972 0a0a 40a8  ...5|.;.L..r..@.
    00000040: 955d 8161 6d3e 9e62 9826 c5f2 edcc 5273  .].am>.b.&....Rs
    00000050: afe0 9516 71a8 0fb6 6d1b 6003 4e94 f9e8  ....q...m.`.N...
    00000060: 5059 2a9b e8d9 7e60 a2ea ad9b 71bb 091a  PY*...~`....q...
    00000070: 7d47 cb4f f1fc 8843 505c b079 5e44 9ee2  }G.O...CP\.y^D..
    00000080: d2e4 46f6 53e9 99c8 b683 ffff 814c 0000  ..F.S........L..


What is your point here? Notepad isn't presenting you "incredibly racist shit" in plain terms, when used as intended, as if it were true.



