
The headline question alone gets at the biggest widespread misunderstanding of LLMs, the one that causes people to systematically doubt and underestimate their ability to exhibit real creativity and understanding-based problem solving.

At its core an LLM is a sort of "situation specific simulation engine." You set up a scenario, and it then plays it out with its own internal model of the situation, trained on predicting text in a huge variety of situations. That model includes reasonably accurate real-world models of, e.g., physical systems and processes, which won't be accessed or used by prompts that don't correctly instruct it to do so.

Increasingly accurate prediction of text that describes a time series of real-world phenomena requires an increasingly accurate and general model of the real world. There is no simpler way to accurately predict such text under cross validation without actually understanding and modeling the underlying processes that generate the outcomes the text represents.

Much of the training text is real humans talking about things they don't understand deeply, and saying things that are wrong or misleading. The model will faithfully simulate these types of situations it was trained on, which includes frequently (for lack of a better word) answering things "wrong" or "badly" "on purpose": even when it actually contains an accurate heuristic model of the underlying process, it will often, faithfully to the training data, report something else instead.

This can largely be mitigated with more careful and specific prompting of what exactly you are asking it to simulate. If you don't specify, there will be a high frequency of accurately simulating the uninformed idiots who produced much of the text on the internet.



> At its core an LLM is a sort of "situation specific simulation engine."

"Sort of" is doing Sisisyphian levels of heavy lifting here. LLMs are statistical models trained on vast amounts of symbols to predict the most likely next symbol, given a sequence of previous symbols. LLMs may appear to exhibit "real creativity", "understand" problem solving (or anything else), or serve as "simulation engines", but it's important to understand that they don't currently do any of those things.


I'm not sure if you read the entirety of my comment? Increasingly accurately predicting the next symbol given a sequence of previous symbols, when the symbols represent a time series of real world events, requires increasingly accurately modeling (i.e. understanding) the real world processes that lead to the events described in them. There is provably no shortcut there, per Solomonoff's theory of inductive inference.
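
(For reference, the universal prior at the heart of that theory weights every program p on a universal prefix machine U whose output starts with the observed sequence x, roughly

  M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

and, stated loosely, prediction with M provably dominates every computable predictor: there is no generally applicable cheaper trick than modeling the process that generates the data.)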

It is a misunderstanding to think of prediction and understanding as fundamentally separate and mutually exclusive, and believing that to be true makes people convince themselves that LLMs cannot possibly ever do things which they can already provably do.

Noam Chomsky (embarrassingly) wrote a NYT article on how LLMs could never, with any amount of improvement, be able to answer certain classes of questions - even in principle. This was days before GPT-4 came out, and it could indeed correctly answer the examples he said could never be answered- and any imaginable variants thereof.

Receiving symbols and predicting the next one is simply a way of framing input and output that enables training and testing- but doesn't specify or imply any particular method of predicting the symbols, or any particular level of correct modeling or understanding of the underlying process generating the symbols. We are both doing exactly that right now, by talking online.
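
A minimal sketch of what I mean (illustrative TypeScript with made-up names, not how any real model is built)- the next-symbol framing only fixes an interface and a score, and says nothing about what sits behind the function:

  // Any predictor fits this shape: given the previous symbols, return a
  // probability for each possible next symbol.
  type NextSymbolModel = (context: number[]) => number[];

  // Training and testing only ever score that interface, e.g. by the average
  // negative log-likelihood of the symbols that actually came next.
  function negLogLikelihood(model: NextSymbolModel, seq: number[]): number {
    let total = 0;
    for (let i = 1; i < seq.length; i++) {
      const probs = model(seq.slice(0, i));
      total += -Math.log(probs[seq[i]]);
    }
    return total / (seq.length - 1);
  }

A lookup table, a Markov chain, or something that internally simulates the situation being described all plug into that same interface equally well- the framing itself doesn't pick one for you.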


> I'm not sure if you read the entirety of my comment?

I did, and I tried my best to avoid imposing preconceived notions while reading. You seem to be equating "being able to predict the next symbol in a sequence" with "possessing a deep causal understanding of the real-world processes that generated that sequence", and if that's an inaccurate way to characterize your beliefs I welcome that feedback.

Before you judge my lack of faith too harshly, I am a fan of LLMs, and I find this kind of anthropomorphism even among technical people who understand the mechanics of how LLMs work super-interesting. I just don't know that it bodes well for how this boom ends.


> You seem to be equating "being able to predict the next symbol in a sequence" with "possessing a deep causal understanding of the real-world processes that generated that sequence"

More or less, but to be more specific I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events. When constrained with a lot of diversity and a small model size, it must eventually become something of a general world model.

I am not understanding why you would see that as anthropomorphism- I see it as quite the opposite. I would expect something non-human that can accurately predict outcomes of a huge diversity of real world situations, based purely on some type of model that spontaneously develops by optimization, to do so in an extremely alien and non-human way that is likely incomprehensible in structure to us. Having an extremely alien but accurate way of predictively modeling events, one not subject to human limitations and biases, would be, I think, incredibly useful for escaping the limitations of human thought processes, even if it replaces them with other, different ones.

I am using modeling/predicting accurately in a way synonymous with understanding, but I could see people objecting to the word 'understanding' as itself anthropomorphic... although I disagree. It would require a philosophical debate on what it means to understand something I suppose, but my overall point still stands without using that word at all.


> specific I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events

But it doesn’t - it’s a statistical model using training data, not a physical or physics model, which you seem to be equating it to (correct me if I am misunderstanding)

And in response to the other portion you present, an LLM fundamentally can’t be alien because it’s trained on human produced output. In a way, it’s a model of the worst parts of human output - garbage in, garbage out, as they say - since it’s trained on the corpus of the internet.


> But it doesn’t - it’s a statistical model using training data, not a physical or physics model, which you seem to be equating it to (correct me if I am misunderstanding)

All learning and understanding is fundamentally statistical in nature- probability theory is the mathematical formalization of the process of learning from real world information, i.e. reasoning under uncertainty [1].
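
(Concretely, the core update rule of that formalization is Bayes' theorem,

  P(model | data) = P(data | model) P(model) / P(data)

i.e. learning is reweighting candidate models of the world by how well they predict what was actually observed.)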

The model assembles 'organically' under a stochastic optimization process, and as a result is largely inscrutable and not rationally designed- not entirely unlike how biological systems evolve (although also still quite different). The fact that it is statistical and uses training data is just a surface-level fact about how a computer was set up to allow the model to emerge, and tells you absolutely nothing about how it is internally structured to represent the patterns in the data. When your training data contains, for example, descriptions of physical situations and the resulting outcomes, the model has to develop at least some simple heuristic ability to approximate the physical processes generating those outcomes- and in the limit of increasing accuracy, that becomes an increasingly sophisticated and accurate representation of the real process. It does not matter whether the input is text or images, any more than it matters to a human who understands physics whether they are speaking or writing about it: the internal model that lets it accurately predict the underlying processes leading to the specific text describing those events is what I am talking about here, and deep learning easily abstracts away the mundane I/O.

An LLM is an alien intelligence because the types of structures it generates for modeling reality are radically different from those in human brains, and the way it solves problems and reasons is radically different- as is quite apparent when you pose it a series of novel problems and see what kinds of solutions it comes up with. The fact that it is trained on data provided by humans doesn't change the fact that it is not itself anything like a human brain. As such it will always have different strengths, weaknesses, and abilities from humans- and the ability to interact with a non-human intelligence to get a radically non-human perspective for creative problem solving is, IMO, the biggest opportunity they present. This is something they are already very good at, as opposed to being used as an 'oracle' for answering questions about known facts, which is what people want to use them for but which they are quite poor at.

[1] Probability Theory: The Logic of Science by E.T. Jaynes


> I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events.

I disagree. Understanding things is more than just being able to predict their behaviour.

Flat Earthers can still come up with a pretty good idea of where (direction relative to the vantage point) and when the Sun will appear to rise tomorrow.


> Flat Earthers can still come up with a pretty good idea of where (direction relative to the vantage point) and when the Sun will appear to rise tomorrow.

Understanding is having a mechanistic model of reality- but all models are wrong to varying degrees. The Flat Earther model is actually quite a good one for someone human sized on a massive sphere- it is locally accurate enough that it works for most practical purposes. I doubt most humans could come up with something so accurate on their own from direct observation- even the fact that the local area is approximately flat in the abstract is far from obvious with hills, etc.

A more common belief nowadays is that the earth is approximately a sphere, but very few people are aware of the fact that it actually bulges at the equator and is flatter at the poles. Does that mean all people who think the earth is a sphere are therefore fundamentally lacking the mental capacity to understand concepts or to accurately model reality? Moreover, people are mostly accepting this spherical model on faith; they are not reasoning out their own understanding from data or anything like that.

I think it's very important to distinguish between something that fundamentally can only repeat its input patterns in a stochastic way, like a Hidden Markov Model, and something that can build even quite oversimplified and incorrect models that it can still sometimes use to extrapolate correctly to situations not exactly like those it was trained on. Many people seem to think LLMs are the former, but they are provably not- we can fabricate new scenarios, like simple physics experiments not in the training data set that require tracking the location and movement of objects, and they can handle these correctly- something that requires at least a simple physical model, albeit one still far simpler than what even a flat earther has. I think being able to tell that a new joke is funny, what it means, and why it is funny is also an example of having a general model of what types of things humans find funny at an abstract level.


> This can largely be mitigated with more careful and specific prompting of what exactly you are asking it to simulate. If you don't specify, there will be a high frequency of accurately simulating the uninformed idiots who produced much of the text on the internet.

I don't think people are underestimating LLMs; they're just acknowledging that by the time you've provided sufficient specification, you're 80% of the way to solving the problem/writing the code already. And at that point, it's easier to just finish the job yourself than to go through the LLM's output, validate the content, revise further if necessary, etc.


I'm actually in the camp that they are basically not very useful yet, and don't actually use them myself for real tasks. However, I am certain from direct experimentation that they exhibit real understanding, creativity, and modeling of underlying systems that extrapolates to correctly modeling outcomes in totally novel situations, and don't just parrot snippets of text from the training set.

What people want and expect them to be is an oracle that correctly answers their vaguely specified questions, which is simply not what they are or what they are good at. What they can do is fascinating and revolutionary, but possibly not very useful yet, at least until we think of a way to use it, or make them even more intelligent. In fact, thinking is what they are good at, and simply repeating facts from a training set is something they cannot do reliably- because the model must inherently be too compressed to store a lot of facts correctly.


> systematically doubt and underestimate their ability to exhibit real creativity and understanding-based problem solving.

I fundamentally disagree that anything in the rest of your post actually demonstrates that they have any such capacity at all.

It seems to me that this is because you consider the terms "creativity" and "problem solving" to mean something different. With my understanding of those terms, it's fundamentally impossible for an LLM to exhibit those qualities, because they depend on having volition - an innate spontaneous generation of ideas for things to do, and an innate desire to do them. An LLM only ever produces output in response to a prompt - not because it wants to produce output. It doesn't want anything.


> it's fundamentally impossible for an LLM to exhibit those qualities, because they depend on having volition

I don't see the connection between volition and those other qualities; saying one depends on the other seems arbitrary to me- and it would result in semantically and categorically defining away the possibility of non-human intelligence altogether, even for things that are by all accounts capable of much more than humans in almost every respect. People don't even universally agree that humans have volition- it is an age-old philosophical debate.

Perhaps you can tell me your thoughts or definition of what those things (as well as volition itself) mean? I will share mine here.

Creativity is the ability to come up with something totally new that is relevant to a specific task or problem- e.g. a new solution to a problem, a new artwork that expresses an emotion, etc. In both humans and LLMs these creative ideas don't seem to be totally 'de novo' but seem to come mostly from drawing high-level analogies between similar but different things, and copying ideas and aspects from one to another. Fundamentally, it does require a task or goal, but that itself doesn't have to be internal. If an LLM is prompted, or if I am given a task by my employer, we are still both exhibiting creativity when we solve it in a new way.

Problem solving is, I think, similar but more practical- when prompted with a problem that isn't exactly in the training set, can it come up with a workable solution or correct answer? Presumably by extrapolating, or using some type of generalized model that can extrapolate or interpolate to situations not exactly in the training data. Sure, there must be a problem that is trying to be solved, but it seems irrelevant whether that comes from some internal will or goal, or from an external prompt.

In the sense that volition is selecting between different courses of action toward a goal, LLMs do select between different possible outputs based on probabilities reflecting how suitable each is in the context of the given goal of responding to a prompt.
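
A toy sketch of that selection step (illustrative TypeScript, not how any particular model is implemented)- just the "score the options, then pick probabilistically" shape:

  // Turn arbitrary suitability scores into a probability distribution.
  function softmax(scores: number[], temperature = 1.0): number[] {
    const max = Math.max(...scores);
    const exps = scores.map(s => Math.exp((s - max) / temperature));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / sum);
  }

  // Select one option at random, weighted by those probabilities.
  function sample(probs: number[]): number {
    let r = Math.random();
    for (let i = 0; i < probs.length; i++) {
      r -= probs[i];
      if (r <= 0) return i;
    }
    return probs.length - 1;
  }

Whether that counts as "volition" is exactly the semantic question- but it is selection between alternatives, weighted by how well each serves the goal set by the prompt.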


Good perspective. Maybe it's because people are primed by sci-fi to treat this as a god-like oracle model. Note that even in the real world, simulations can give wrong results since we don't have perfect information, so we'll probably never have such an oracle model.

But if you stick with the oracle framework, then it'd be better to model it as some sort of "fuzzy oracle" machine, right? I'm vaguely reminded of probabilistic Turing machines here, in that you have some intrinsic amount of error (both due to the stochastic sampling and due to imperfect information). But the fact that prompting and RLHF work so well implies that by crawling around in this latent space, we can bound the errors to the point that it's "almost" an oracle, or a "simulation" of the true oracle that people want it to be.

And since lazy prompting techniques still work, that seems to imply that there's juice left to squeeze in terms of "alignment" (not in the safety sense, but in conditioning the distribution of outputs to increase the fidelity of the oracle simulation).

Also, a second consequence is that the reason it needs so much data is probably that it doesn't model just _one_ thing; it tries to be a joint model of _everything_. A human learns with far less data, but the result is only a single personality. For a human to "act" as someone else, they need to do training, character studies, and such to try to "learn" about that person, and even then good acting is a rare skill.

If you genuinely want an oracle machine, there's no way to avoid vacuuming up all the data that exists, because without it you can't make a high-fidelity simulation of someone else. But on the flip side, if you're willing to be smarter about which facets you exclude, then I'd guess there's probably a way to prune models that is smarter than just quantizing them. I guess this is close to mixture-of-experts.


I get that people really want an oracle, and are going to judge any AI system by how well it does at that - yes, from sci-fi-influenced expectations that AI would be rationally designed, and not inscrutable and alien like LLMs... but I think that will almost always be trying to fit a round peg into a square hole, and not using whatever we come up with very effectively. Sure, as LLMs have gotten better they have become more useful in that way, so they are likely to keep getting better at pretending to be an oracle, even if they are never very good at it compared to other things they can do.

Arguably, a (the?) key measure of intelligence is being able to accurately understand and model new phenomena from a small amount of data, e.g. in a Bayesian sense. But in this case we are attempting to essentially evolve all of the structures of an intelligent system de novo from a stochastic optimization process- so it is probably better compared to the entire history of evolution than to an individual human learning during their lifetime, although both analogies have big problems.

Overall, I think the training process will ultimately only be required to build a generally intelligent structure, and good inference from a small set of data or a totally new category of problem/phenomenon will happen entirely at the inference stage.


Just want to note that this simple “mimicry” of mistakes seen in the training text can be mitigated to some degree by reinforcement learning (e.g. RLHF), such that the LLM is tuned toward giving responses that are “good” (helpful, honest, harmless, etc…) according to some reward function.
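
The simplest way to picture steering outputs with a reward function is best-of-n sampling- an illustrative TypeScript sketch with made-up names (the real RLHF loop instead updates the model's weights, but the "reward function shapes which responses win" idea is the same):

  type Generate = (prompt: string) => string;
  type Reward = (prompt: string, response: string) => number;

  // Draw n candidate responses and keep the one the reward function likes best.
  function bestOfN(generate: Generate, reward: Reward, prompt: string, n: number): string {
    let best = generate(prompt);
    let bestScore = reward(prompt, best);
    for (let i = 1; i < n; i++) {
      const candidate = generate(prompt);
      const score = reward(prompt, candidate);
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    return best;
  }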


> At its core an LLM is a sort of "situation specific simulation engine." You set up a scenario, and it then plays it out with its own internal model of the situation, trained on predicting text in a huge variety of situations. That model includes reasonably accurate real-world models of, e.g., physical systems and processes, which won't be accessed or used by prompts that don't correctly instruct it to do so.

This idea of LLMs doing simulations of the physical world I've never heard before. In fact a transformer model cannot do this. Do you have a source?


I have been using various LLMs to do some meal planning and recipe creation. I asked for summaries of the recipes and they looked good.

I then asked it to link a YouTube video for each recipe and it used the same video 10 times for all of the recipes. No amount of prompting was able to fix it unless I request one video at a time. It would just acknowledge the mistake, apologize and then repeat the same mistake again.

I told it, let’s try something different: generate a shopping list of ingredients to cover all of the recipes. It recommended purchasing amounts that didn’t make sense and even added some random items that did not occur in any of the recipes.

When I was making the dishes, I asked for the detailed recipes and it completely changed them, adding ingredients that were not on the shopping list. When I pointed it out again, it acknowledged the mistake, apologized, and then “corrected it” by completely changing it again.

I would not conclude that I am a lazy or bad prompter, and I would not conclude that the LLMs exhibited any kind of remarkable reasoning ability. I even interrogated the AIs about why they were making the mistakes and they told me because “it just predicts the next word”.

Another example: I asked the bots for tips on how to feel my pecs more on incline cable flies, and it told me to start with the cables above shoulder height, which is not an incline fly- it is a decline fly. When I questioned it, it told me to start just below shoulder height, which again is not an incline fly.

My experience is that you have to write a draft of the note you’re trying to create, or leave so many details in the prompt, that you are basically doing most of the work yourself. It’s great for things like “give me a recipe that contains the following ingredients” or “clean up the following note to sound more professional.” Anything more than that and it tends to fail horribly for me. I have even had long conversations with the AIs asking them for tips on how to generate better prompts, and they recommend things I’m already doing.

When people remark about the incredible reasoning ability, I wonder if they are just testing it on things that were already in the training data or they are failing to recognize how garbage the output can be. However, perhaps we can agree that the reasoning ability is incredible in the sense that it can do a lot of reasoning very quickly, but it completely lacks any kind of common sense and often does the wrong kind of reasoning.

For example, the prompt about tips to feel my pecs more on an incline cable fly could have just entailed “copy and pasting” a pre-written article from the training data; but instead in its own words, it “over analyzed bench angles and cable heights instead of addressing what you meant”. One of the bots did “copy paste” a generic article that included tips for decline flat and incline. None correctly gave tips for just incline on the first try, and some took several rounds of iteration basically spoon feeding the model the answer before it understood.


You're expecting it to be an 'oracle' that you can prompt with any question you can think of and have it answer correctly. I think your experiences will make more sense if you think of it as a heuristic-model-based situation simulation engine, as I described above.

For example, why would it have URLs to YouTube videos of recipes? There is not enough storage in the model for that. The best it can realistically do is provide a properly formatted YouTube URL. It would be nice if it could instead explain that it has no way to know that, but that answer isn't appropriate within the context of the training data and the prompt you are giving it.

The other things you asked also require information it has no room to store, and would be impossibly difficult to essentially predict via model from underlying principles. That is something they can do in general- even much better than humans already in many cases- but is still a very error prone process akin to predicting the future.

For example, I am a competitive strength athlete, and I have a doctorate level training in human physiology and biomechanics. I could not reason out a method for you to feel your pecs better without seeing what you are already doing and coaching you in person, and experimenting with different ideas and techniques myself- also having access to my own actual human body to try movements and psychological cues on.

You are asking it to answer things that are nearly impossible to compute from first principles without unimaginable amounts of intelligence and compute power, and are unlikely to have been directly encoded in the model itself.

Now, turning an already written set of recipes into a shopping list is something I would expect it to be able to do easily and correctly, if you were using a modern model with a sufficiently sized context window and prompting it correctly. I just did a quick test where I gave GPT-4o only the instruction steps (not the ingredients list) for an oxtail soup recipe, and it accurately recreated the entire shopping list, organized realistically according to likely sections in the grocery store. What model were you using?


> an oxtail soup recipe

Sounds like the model just copy pasted one from the internet, hard to get that wrong. GP could have had a bespoke recipe and list of ingredients. This particular example of yours just reconfirmed what was being said: it's only able to copy-paste existing content, and it's lost otherwise.

In my case I have huge trouble making it create useful TypeScript code for example, simply because apparently there isn't sufficient advanced TS code that is described properly.

For completeness' sake, my last prompt was to create a function that could infer one type parameter but not the other. After several prompts and loops, I learned that this is just not possible in TypeScript yet.
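
Roughly, the limitation looks like this (an illustrative example, not my actual code)- TypeScript has no partial type argument inference, so you either let it infer every type parameter or spell out all of them, and the usual workaround is currying:

  function pick<T, K extends keyof T>(obj: T, key: K): T[K] {
    return obj[key];
  }
  // pick<Config>(obj, "port")  // error: Expected 2 type arguments, but got 1.

  // Currying lets each call infer its own type parameter:
  const pickFrom = <T>(obj: T) => <K extends keyof T>(key: K): T[K] => obj[key];
  const port = pickFrom({ port: 8080, host: "localhost" })("port"); // K inferred as "port"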


No, that example is not something that I would find very useful or a good example of its abilities- just one thing I generally expected it to be capable of doing. One can quickly confirm that it is doing the work and not copying and pasting the list by altering the recipe to include steps and ingredients not typical for such a recipe. I made a few such alterations just now, and reran it, and it adjusted correctly from a clean prompt.

I've found it able to come up with creative new ideas for solving scientific research problems, by finding similarities between concepts that I would not have thought of. I've also found it useful for suggesting local activities while I'm traveling based on my rather unusual interests that you wouldn't find recommended for travelers anywhere else. I've also found it can solve totally novel classical physics problems- ones that involve keeping track of the locations and interactions of a lot of objects- with correct qualitative answers. I'm not sure how useful that is, but it proves real understanding and modeling - something people repeatedly say LLMs will never be capable of.

I have found that it can write okay code to solve totally novel problems, but not without a ton of iteration- which it can do, but is slower than me just doing it myself, and doesn't code in my style. I have not yet decided to use any code it writes, although it is interesting to test its abilities by presenting it with weird coding problems.

Overall, I would say it's not really very useful yet, but it is exhibiting real (very much alien and non-human-like) intelligence and understanding. It's just not an oracle- which is what people want and would find useful. I think we will find them more useful once we have a better understanding of what they actually are and can do, rather than what we wish they were.


> At its core an LLM is a sort of "situation specific simulation engine." You set up a scenario, and it then plays it out with its own internal model of the situation, trained on predicting text in a huge variety of situations. That model includes reasonably accurate real-world models of, e.g., physical systems and processes, which won't be accessed or used by prompts that don't correctly instruct it to do so.

You have simply invented total nonsense about what an LLM is "at its core". Confidently stating this does not make it true.


Except I didn't just state it, I also explained the rationale behind it, and elaborated further on that substantially in subsequent replies to other comments. What is your specific objection?


Garbage in, garbage out.



