> It DOES fail more when the numbers are longer (because it results with more te...

kovek · 2025-10-24T20:48:29 1761338909

Well, if the model can reliably keep in context CPU cache plus CPU registers plus CPU instructions and is able to do operations based on those, then we pretty much solved computation using LLMs, right? It could use RAG to operate on RAM and SSD.

Here we can see the amount of data a high end traditional non-SOC CPU holds:

> For a recent high-end non-SoC desktop CPU: > Cache: ~40-100 MB total (L1 + L2 + shared L3) > Register files: tens to few hundreds of KB total across cores (e.g., ~200-300 KB or so) > Combined: So you're looking at ~40-100 MB + ~0.2 MB → roughly ~40-100 MB of total on-chip caches + registers.

I'm sure we can reduce these caches to fit in the context windows of today's LLMs (~500,000 tokens).

Then, with temperature 0 we get more "discrete" operations. Now, we still have the rare problem of hallucinations, but it should be small with temperature 0.

lossolo · 2025-10-24T21:10:10 1761340210

It doesn't work like mapping CPU caches/registers into an LLM context. Transformers have no mutable registers, they attend over past tokens and can't update prior state. RAG isn't RAM. Even with huge context, you still can't step CPU style instructions without an external, read/write memory/tooling.

And temperature 0 makes outputs deterministic, not magically correct.

photonthug · 2025-10-24T21:23:23 1761341003

> And temperature 0 makes outputs deterministic, not magically correct.

For reasons I don't claim to really understand, I don't think it even makes them deterministic. Floating point something something? I'm not sure temperature even has a static technical definition or implementation everywhere at this point. I've been ignoring temperature and using nucleus sampling anywhere that's exposed and it seems to work better.

Random but typical example.. pydantic-ai has a caveat that doesn't reference any particular model: "Note that even with temperature of 0.0, the results will not be fully deterministic". And of course this is just the very bottom layer of model-config and in a system of diverse agents using different frameworks and models, it's even worse.

astrange · 2025-10-25T06:03:41 1761372221

It's partly because floating point math is not associative and GPU inference doesn't guarantee all the steps will be done in the same order.

razodactyl · 2025-10-26T13:33:45 1761485625

Well mostly but they can generate more state that can push old state out of context.

If an LLM were sufficiently trained to be able to roll-forward and correctly set the current state of some registers written into the conversation..? I wouldn't trust it though, leaves too much to chance.

I too make mistakes trying to keep track of things, I end up using tools too.

kovek · 2025-10-24T21:29:44 1761341384

Well, the LLM may re-infer the whole state fully on every instruction. Temperature 0 is deterministic and that's what we are looking for. If the model is trained properly on how the CPU state + instructions should be handled, then it should be able to produce the next state.

lossolo · 2025-10-24T22:42:10 1761345730

With temp = 0 if the model is off by one bit at step k, all subsequent steps are deterministically wrong.

Your previous example shows the best case, which is a model can sometimes follow a textual recipe for long multiplication on short inputs. That's not the same as learning a length generalizing bit exact algorithm.

Basically what you shown is the model can describe the algorithm. It doesn't show it can execute it at scale. Without writable state and bit exact ops, errors grow with length and "focus more" only slows that failure, it doesn’t eliminate it.

kovek · 2025-10-25T08:01:16 1761379276

> It doesn't show it can execute it at scale. Without writable state and bit exact ops,

Well, modern LLM coding agent products (eg. Claude Code) are able to store state in files in the current repository. So, you could have the model keep the "CPU State", and the files in the repository be the "RAM".

Also, could this https://arxiv.org/html/2402.17764v1 possibly reduce errors when doing inference? There is no floating point operations

razodactyl · 2025-10-26T13:38:35 1761485915

It seems to be the conclusion that we come to though, we ourselves use tools.

The focus here is the LLM being able to do it unaided.

The space of all combinations of steps is so large for many problems that require precision and usually one incorrect step breaks everything. "I forgot to carry the 1".

Even then, while brilliant, Claude does screw up sometimes - we're not there yet but it doesn't prevent it from being adequately useful.