They definitely do strain the neurology and thinking metaphors in that article. But the Dijkstra's algorithm and A* comparisons are the flip side of that same coin: they aren't trying to make it more effective, and they're definitely not trying to argue for anything AGI-related.
Either way: they're intervening in the inference process by turning circuits in the LLM on and off, in an attempt to show that those circuits are tied to a specific function. [0]
They noticed that circuits related to a token that only becomes relevant ~8 tokens later were already active on the newline token. Rather than only attending to the sequence of tokens generated so far (i.e., looking backwards) and producing the next token from that information, the model is activating circuits tied not just to the immediate next token, but to specific tokens several positions ahead.
So information about more than just the next upcoming token (including a reference to one specific future token) is being cached at the newline token. I wouldn't call that thinking, but I don't think calling it planning is misguided. Caching this sort of information in the hidden state would be an emergent property, rather than something deliberately targeted by a specific training method, unlike with models built around test-time compute. (The DeepSeek-R1 paper is an example, with a very direct aim of turbocharging test-time compute, aka 'reasoning'. [1])
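To make the "turning circuits on and off" part concrete, here's a minimal sketch of what an intervention of that general shape looks like mechanically: zero-ablating a direction in the residual stream of GPT-2 via a forward hook, then generating as usual. The feature direction here is random, purely a stand-in; the paper's features come from their learned replacement model, and this is plain PyTorch/transformers, not the circuit-tracer API.

    # Hedged sketch: ablate a (stand-in, random) feature direction in GPT-2's
    # residual stream during generation. Not the paper's method, just the shape
    # of a "turn this circuit off and see what changes" intervention.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2Tokenizer.from_pretrained("gpt2")

    d_model = model.config.n_embd
    feature_dir = torch.randn(d_model)              # stand-in for a learned feature direction
    feature_dir = feature_dir / feature_dir.norm()

    def ablate_feature(module, inputs, output):
        hidden = output[0]                          # (batch, seq, d_model) residual stream
        coeff = hidden @ feature_dir                # projection onto the feature direction
        hidden = hidden - coeff.unsqueeze(-1) * feature_dir  # remove that component everywhere
        return (hidden,) + output[1:]

    layer = model.transformer.h[6]                  # intervene on a middle block
    handle = layer.register_forward_hook(ablate_feature)

    ids = tok("A rhyming couplet:\nHe saw a carrot and had to grab it,", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**ids, max_new_tokens=12, do_sample=False)
    print(tok.decode(out[0]))

    handle.remove()                                 # restore normal inference

Comparing the completion with and without the hook is the basic move: if knocking out a feature reliably changes a specific downstream behavior (e.g., which rhyme word gets planned), that's evidence the feature is tied to that function.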
The way they pinned down the function of a circuit was with their circuit tracing method, which is open source, so you can try it out for yourself. [2] Here's the method in short (a toy sketch of the unembedding readout it mentions follows the quote): [3]
> Our feature visualizations show snippets of samples from public datasets that most strongly activate the feature, as well as examples that activate the feature to varying degrees interpolating between the maximum activation and zero.
> Highlights indicate the strength of the feature’s activation at a given token position. We also show the output tokens that the feature most strongly promotes / inhibits via its direct connections through the unembedding layer (note that this information is typically more meaningful for features in later model layers).
[0]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

[1]: https://arxiv.org/pdf/2501.12948

[2]: https://github.com/safety-research/circuit-tracer

[3]: https://transformer-circuits.pub/2025/attribution-graphs/met...