> Already we live with incredible digital intelligence, and after some initial s...

helloplanets · 2025-06-11T09:02:58 1749632578

What's your take on Anthropic's 'Tracing the thoughts of a large language model'? [0]

> To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

This is an older model (Claude 3.5 Haiku) with no test time compute.

[0]: https://www.anthropic.com/news/tracing-thoughts-language-mod...

Sammi · 2025-06-11T09:55:39 1749635739

What is called "planning" or "thinking" here doesn't seem conceptually much different to me than going from naive breath first search based Dijkstra shortest path search, to adding a heuristics that makes it search in a particular direction first and calling it A*. In both cases you're adding another layer to an existing algorithm in order to make it more effective. Doesn't make either AGI.

I'm really no expert in neural nets or LLMs, so my thinking here is not an expert opinion, but as a CS major reading that blog from Anthropic, I just cannot see how they provided any evidence for "thinking". To me it's pretty aggressive marketing to call this "thinking".

exe34 · 2025-06-11T12:19:45 1749644385

> In both cases you're adding another layer to an existing algorithm in order to make it more effective. Doesn't make either AGI.

Yet. The human mind is a big bag of tricks. If the creators of AI can enumerate a large enough list of capabilities and implement those, then the product can be as good as 90% of humans, but at a fraction of the cost and a billion times the speed - then it doesn't matter if it's AGI or not. It will have economic consequences.

AnimalMuppet · 2025-06-11T13:21:55 1749648115

And make them work together. It's not just having a big bag of tricks; it's also knowing which trick to pull out when. (And that may just be pulling out a trick, trying it, and knowing when the results aren't good enough, and so trying a different one.)

The observant will note that the word "knowing" kept appearing in the previous paragraph. Can that knowing also be reduced to LLM-like tricks? Or is it an additional step?

exe34 · 2025-06-11T14:55:47 1749653747

It's sufficient to appear to know. My washing machine "knows" when my clothes are dry.

helloplanets · 2025-06-11T11:35:10 1749641710

They definitely do strain the neurology and thinking metaphors in that article. But the Dijkstra's algorithm and A* comparisons are the flipside of that same coin. They aren't trying to make it more effective. And definitely not trying to argue for anything AGI related.

Either way: They're tampering with the inference process, by turning circuits in the LLM on and off, in an attempt to prove that those circuits are related with a specific function. [0]

They noticed that circuits related to a token that is only relevant ~8 tokens forward were already activated on the newline token. Instead of only looking at the sequence of tokens that has been generated so far (aka backwards), and generating the next token based off of that information, the model is activating circuits related to tokens that are not relevant to the next token only, but to specific tokens a handful of tokens after.

So, information related to more than just the next upcoming token (including a reference to just one specific token) is being cached during a newline token. Wouldn't call that thinking, but I don't think calling it planning is misguided. Caching this sort of information in the hidden state would be an emergent feature, rather than a feature that was knowingly aimed at by following a specific training method, unlike with models that do test time compute. (DeepSeek-R1 paper being an example, with a very direct aim at turbocharging test time compute, aka 'reasoning'. [1])

The way they went at defining the function of a circuit, was by using their circuit tracing method, which is open source so you can try it out for yourself. [2] Here's the method in short: [3]

> Our feature visualizations show snippets of samples from public datasets that most strongly activate the feature, as well as examples that activate the feature to varying degrees interpolating between the maximum activation and zero.

> Highlights indicate the strength of the feature’s activation at a given token position. We also show the output tokens that the feature most strongly promotes / inhibits via its direct connections through the unembedding layer (note that this information is typically more meaningful for features in later model layers).

[0]: https://transformer-circuits.pub/2025/attribution-graphs/bio... [1]: https://arxiv.org/pdf/2501.12948 [2]: https://github.com/safety-research/circuit-tracer [3]: https://transformer-circuits.pub/2025/attribution-graphs/met...

yencabulator · 2025-06-12T02:27:23 1749695243

Generalize the concept from next token prediction to coming tokens prediction and the rest still applies. LLMs are still incredibly poor at symbolic thought and following multi-step algorithms, and I as a non-ML person don't really see what in the LLM mechanism would provide such power. Or maybe we're still just another 1000x scale off and symbolic thought will emerge at some point.

Me personally, I expect to see LLMs to be a mere part of whatever will be invented later.

iNic · 2025-06-11T07:55:11 1749628511

The mere token prediction comment is wrong, but I don't think any of the other comments really explained why. Next token prediction is not what the AI does, but its goal. It's like saying soccer is a boring sport having only ever seen the final scores. The important thing about LLMs is that they can internally represent many different complex ideas efficiently and coherently! This makes them an incredible starting point for further training. Nowadays no LLM you interact with will be a pure next token predictor anymore, they will have all gone through various stages of RL, so that they actually do what we want them to do. I think I really feel the magic looking at the "circuit" work by Anthropic. It really shows that these models have some internal processing / thinking that is complex and clever.

quonn · 2025-06-11T11:31:19 1749641479

> that they can internally represent many different complex ideas efficiently and coherently

The Transformer circuits[0] suggest that this representation is not coherent at all.

[0] https://transformer-circuits.pub

iNic · 2025-06-11T12:00:07 1749643207

I guess that depends on what you think is coherent. A key finding is that the larger the network the more coherent the representation becomes. One example is larger networks merge the same concept across different languages into a single concept (as humans do). The addition circuits are also fairly easy to interpret.

quonn · 2025-06-11T13:11:27 1749647487

> merge the same concept

It's doing compression which does not mean it's coherent.

> The addition circuits are also fairly easy to interpret.

The addition circuits make no sense whatsoever. It's doing great at guessing that's all.

iNic · 2025-06-16T08:02:57 1750060977

I am curious, what would you count as coherent? I think it is absolutely insane that we can open and understand what are essentially alien intelligences at all!

trashtester · 2025-06-11T06:56:43 1749625003

The "next token prediction" is a distraction. That's not where the interesting part of an AI model happens.

If you think of the tokenization near the end as a serializer, something like turning an object model into json, you get a better understanding. The interesting part of a an OOP program is not in the json, but what happens in memory before the json is created.

Likewise, the interesting parts of a neural net model, whether it's LLM's, AlphaProteo or some diffusion based video model, happen in the steps that operate in their latent space, which is in many ways similar to our subconscious thinking.

In those layers, the AI models detect deeper and deeper patterns of reality. Much deeper than the surface pattern of the text, images, video etc used to train them. Also, many of these patterns generalize when different modalities are combined.

From this latent space, you can "serialize" outputs in several different ways. Text is one, image/video another. For now, the latent spaces are not general enough to do all equally well, instead models are created that specialize on one modality.

I think the step to AGI does not require throwing a lot more compute into the models, but rather to have them straddle multiple modalities better, in particular, these:

- Physical world modelling at the level of Veo3 (possibly with some lessons from self driving or robotics model for elements like object permananence and perception) - Symbolic processing of the best LLM's. - Ability to be goal oriented and iterate towards a goal, similar to the Alpha* family of systems - Optionally: Optimized for the use of a few specific tools, including a humanoid robot.

Once all of these are integrated into the same latent space, I think we basically have what it takes to replace most human thought.

sgt101 · 2025-06-11T07:23:42 1749626622

>which is in many ways similar to our subconscious thinking

this is just made up.

- we don't have any useful insight on human subconscious thinking. - we don't have any useful insight on the structures that support human subconscious thinking. - the mechanisms that support human cognition that we do know about are radically different from the mechanisms that current models use. For example we know that biological neurons & synapses are structurally diverse, we know that suppression and control signals are used to change the behaviour of the networks , we know that chemical control layers (hormones) transform the state of the system.

We also know that biological neural systems continuously learn and adapt, for example in the face of injury. Large models just don't do these things.

Also this thing about deeper and deeper realities? C'mon, it's surface level association all the way down!

ixtli · 2025-06-11T08:37:59 1749631079

Yea whenever we get into this sort of “what’s happening in the network is like what’s going on in your brain” discussion people never have concrete evidence of what they’re talking about.

scarmig · 2025-06-11T08:58:24 1749632304

The diversity is itself indicative, though, that intelligence isn't bound to the particularities of the human nervous system. Across different animal species, nervous systems show a radical diversity. Different architectures; different or reversed neurotransmitters; entirely different neural cell biologies. It's quite possible that "neurons" evolved twice, independently. There's nothing magic about the human brain.

Most of your critique is surface level: you can add all kinds of different structural diversity to an ML model and still find learning. Transformers themselves are formally equivalent to "fast weights" (suppression and control signals). Continuous learning is an entire field of study in ML. Or, for injury, you can randomly mask out half the weights of a model, still get reasonable performance, and retrain the unmasked weights to recover much of your loss.

Obviously there are still gaps in ML architectures compared to biological brains, but there's no particular reason to believe they're fundamental to existence in silico, as opposed to myelinated bags of neurotransmitters.

sgt101 · 2025-06-11T14:59:50 1749653990

>The diversity is itself indicative, though, that intelligence isn't bound to the particularities of the human nervous system. Across different animal species, nervous systems show a radical diversity. Different architectures; different or reversed neurotransmitters; entirely different neural cell biologies. It's quite possible that "neurons" evolved twice, independently. There's nothing magic about the human brain.

I agree - for example Octopus's are clearly somewhat intelligent, maybe very intelligent, and they have a very different brain architecture. Bees have a form of collective intelligence that seems to be emergent from many brains working together. Human cognition could arguably be identified as having a socially emergent component as well.

>Most of your critique is surface level: you can add all kinds of different structural diversity to an ML model and still find learning. Transformers themselves are formally equivalent to "fast weights" (suppression and control signals). Continuous learning is an entire field of study in ML. Or, for injury, you can randomly mask out half the weights of a model, still get reasonable performance, and retrain the unmasked weights to recover much of your loss.

I think we can only reasonably talk about the technology as it exists. I agree that there is no justifiable reason (that I know of) to claim that biology is unique as a substrate for intelligence or agency or consciousness or cognition or minds in general. But the history of AI is littered with stories of communities believing that a few minor problems just needed to be tidied up before everything works.

spot · 2025-06-16T15:02:01 1750086121

not just made up: https://www.quantamagazine.org/self-taught-ai-shows-similari...

ben_w · 2025-06-11T11:57:35 1749643055

The bullet list is a good point, but:

> We also know that biological neural systems continuously learn and adapt, for example in the face of injury. Large models just don't do these things.

This is a deliberate choice on the part of the model makers, because a fixed checkpoint is useful for a product. They could just keep the training mechanism going, but that's like writing code without version control.

> Also this thing about deeper and deeper realities? C'mon, it's surface level association all the way down!

To the extent I agree with this, I think it conflicts with your own point about us not knowing how human minds work. Do I, myself, have deeper truths? Or am myself I making surface level association after surface level association, but have enough levels to make it seem deep? I do not know how many grains make the heap.

sgt101 · 2025-06-11T15:19:27 1749655167

>This is a deliberate choice on the part of the model makers, because a fixed checkpoint is useful for a product. They could just keep the training mechanism going, but that's like writing code without version control.

Training more and learning online are really different processes. In the case of large models I can't see how it would be practical to have the model learn as it was used because it's shared by everyone.

>To the extent I agree with this, I think it conflicts with your own point about us not knowing how human minds work. Do I, myself, have deeper truths? Or am myself I making surface level association after surface level association, but have enough levels to make it seem deep? I do not know how many grains make the heap.

I can't speak for your cognition or subjective experience, but I do have both fundamental grounding experiences (like the time I hit my hand with an axe, the taste of good beer, sun on my face) and I have used trial and error to develop causative models of how these come to be. I have become good at anticipating which trials are too costly and have found ways to fill in the gaps where experience could hurt me further. Large models have none of these features or capabilities.

Of course I may be deceived by my cognition into believing that deeper processes exist that are illusory because that serves as a short cut to "fitter" behaviour and evolution has exploited this. But it seems unlikely to me.

ben_w · 2025-06-11T18:05:05 1749665105

> In the case of large models I can't see how it would be practical to have the model learn as it was used because it's shared by everyone.

Given it can learn from unordered text of the entire the internet, it can learn from chats.

> I can't speak for your cognition or subjective experience, but I do have both fundamental grounding experiences (like the time I hit my hand with an axe, the taste of good beer, sun on my face) and I have used trial and error to develop causative models of how these come to be. I have become good at anticipating which trials are too costly and have found ways to fill in the gaps where experience could hurt me further. Large models have none of these features or capabilities.

> Of course I may be deceived by my cognition into believing that deeper processes exist that are illusory because that serves as a short cut to "fitter" behaviour and evolution has exploited this. But it seems unlikely to me.

Humans are very good at creating narratives about our minds, but in the cases where this can be tested, it is often found that our conscious experiences are preceded by other brain states in a predictable fashion, and that we confabulate explanations post-hoc.

So while I do not doubt that this is how it feels to be you, the very same lack of understanding of causal mechanisms within the human brain that makes it an error to confidently say that LLMs copy this behaviour, also mean we cannot truly be confident that the reasons we think we have for how we feel/think/learn/experience/remember are, in fact, the true reasons for how we feel/think/learn/experience/remember.

phorkyas82 · 2025-06-11T12:15:33 1749644133

As far as I understood any AI model is just a linear combination of its training data. Even if that were such a large corpus as the entire web... it's still just like a sophisticated compression of other's people's expressions.

It has not made its own experiences, not interacted with the outer world. Dunno, I won't to rule out something operating solely on language artifacts cannot develop intelligence or consciousness, whatever that is,.. but so far there are also enough humans we could care about and invest into.

tlb · 2025-06-11T12:19:52 1749644392

LLMs are not a linear combination of training data.

Some LLMs have interacted with the outside world, such as through reinforcement learning while trying to complete tasks in simulated physics environments.

olmo23 · 2025-06-11T14:21:47 1749651707

Just because humans can describe it, doesn't mean they can understand (predict) it.

And the web contains a lot more than people's expressions: think of all the scientific papers with tables and tables of interesting measurements.

andsoitis · 2025-06-11T10:39:54 1749638394

> the AI models detect deeper and deeper patterns of reality. Much deeper than the surface pattern of the text

What are you talking about?

klipt · 2025-06-11T07:01:13 1749625273

If you wish to make an apple pie from scratch

You must first invent the universe

If you wish to predict the next token really well

You must first model the universe

Aeolun · 2025-06-11T10:02:50 1749636170

> wondering when it can generate a beautifully-written novel

Not quite yet, but I’m working on it. It’s ~~hard~~ impossible to get original ideas out of an LLM, so it’ll probably always be a human assisted effort.

agumonkey · 2025-06-11T12:08:43 1749643723

The TB of everything with transformers makes a difference, maybe i'm just too uneducated, but the amount of semantic context that can be taken into account when generating the next token is really disrupting.

marsten · 2025-06-11T07:31:31 1749627091

> Over the years, the magic was never lost on me. However, I can never see LLMs as more than a "token prediction machine".

The "mere token prediction machine" criticism, like Pearl's "deep learning amounts to just curve fitting", is true but it also misses the point. AI in the end turns a mirror on humanity and will force us to accept that intelligence and consciousness can emerge from some pretty simple building blocks. That in some deep sense, all we are is curve fitting.

It reminds me of the lines from T.S. Eliot, “...And the end of all our exploring, Will be to arrive where we started, And know the place for the first time."