photonthug · 2025-10-25T15:55:15 1761407715

You might prefer this sort of thing: A Definition of AGI https://arxiv.org/abs/2510.18212

logicprog · 2025-10-25T21:07:21 1761426441

Ooh, that looks very cool. The lack of a concrete definition of AGI and a scientifically (in the correct domains) backed operationalization of such a definition that can allow direct comparisons between humans and current AIs, where it isn't impossible for humans and/or easy to saturate by AIs, is much needed.

photonthug · 2025-10-25T15:35:00 1761406500

The capital was better fortified and the French wanted food and loot. So softer target sounds nice, especially if you think this crushes morale immediately and don't believe the opposition will go scorched earth (but they did).

> "If I would take S.P., I would hold Russia by the head. If I take Kiev, I will hold Russia by legs. If I take Moscow, I will reach right into its heart!"

https://history.stackexchange.com/questions/27588/why-did-na...

photonthug · 2025-10-24T20:15:47 1761336947

> It DOES fail more when the numbers are longer (because it results with more text in the context),

I tried to raise this question yesterday. https://news.ycombinator.com/item?id=45683113#45687769

Declaring victory on "reasoning" based on cherry-picking a correct result about arithmetic is, of course, very narrow and absurdly optimistic. Even if it correctly works for all NxM calculations. Moving on from arithmetic to any kind of problem that fundamentally reduces to model-checking behind the scenes.. we would be talking about exploring a state-space with potentially many thousands of state-transitions for simple stuff. If each one even has a small chance of crapping out due to hallucination, the chance of encountering errors at the macro-scale is going to be practically guaranteed.

Everyone will say, "but you want tool-use or code-gen for this anyway". Sure! But carry-digits or similar is just one version of "correct matters" and putting some non-local kinds of demands on attention, plus it's easier to check than code. So tool-use or code-gen is just pushing the same problem somewhere else to hide it.. there's still a lot of steps involved, and each one really has to be correct if the macro-layer is going to be correct and the whole thing is going to be hands-off / actually automated. Maybe that's why local-models can still barely handle nontrivial tool-calling.
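To put rough numbers on the "practically guaranteed" point, here is a minimal sketch. It assumes every step fails independently with the same probability, which is a simplification (real failures are probably correlated), and the 0.1% per-step error rate is purely illustrative.

```python
def chance_of_any_error(per_step_error: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps goes wrong."""
    # P(all steps correct) = (1 - p) ** n, so P(at least one error) is the complement.
    return 1.0 - (1.0 - per_step_error) ** steps


# Illustrative only: a hypothetical 0.1% chance of a bad step.
for steps in (10, 1_000, 10_000):
    print(steps, round(chance_of_any_error(0.001, steps), 4))
```

Under those assumptions a per-step error rate that looks negligible still makes a fully correct run unlikely once the chain reaches thousands of state transitions.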

kovek · 2025-10-24T20:48:29 1761338909

Well, if the model can reliably keep in context CPU cache plus CPU registers plus CPU instructions and is able to do operations based on those, then we pretty much solved computation using LLMs, right? It could use RAG to operate on RAM and SSD.

Here we can see the amount of data a high-end traditional non-SoC CPU holds:

> For a recent high-end non-SoC desktop CPU:
> Cache: ~40-100 MB total (L1 + L2 + shared L3)
> Register files: tens to few hundreds of KB total across cores (e.g., ~200-300 KB or so)
> Combined: So you're looking at ~40-100 MB + ~0.2 MB → roughly ~40-100 MB of total on-chip caches + registers.

I'm sure we can reduce these caches to fit in the context windows of today's LLMs (~500,000 tokens).

Then, with temperature 0 we get more "discrete" operations. Now, we still have the rare problem of hallucinations, but it should be small with temperature 0.
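A quick back-of-the-envelope check of how far the caches would have to shrink, using the figures quoted above (40-100 MB of cache, ~500,000 tokens of context). The 4 bytes per token is an assumed ballpark for plain text, not a property of any particular tokenizer, and megabytes are taken as decimal.

```python
MB = 1_000_000                 # decimal megabytes (assumption)
ASSUMED_BYTES_PER_TOKEN = 4    # rough ballpark for text, purely illustrative


def bytes_per_token_needed(state_bytes: int, context_tokens: int) -> float:
    """How many bytes each token would have to encode to hold the whole state."""
    return state_bytes / context_tokens


context_tokens = 500_000
for cache_mb in (40, 100):
    needed = bytes_per_token_needed(cache_mb * MB, context_tokens)
    shrink = needed / ASSUMED_BYTES_PER_TOKEN
    print(f"{cache_mb} MB cache: {needed:.0f} bytes/token, ~{shrink:.0f}x reduction needed")
```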

lossolo · 2025-10-24T21:10:10 1761340210

It doesn't work like mapping CPU caches/registers into an LLM context. Transformers have no mutable registers, they attend over past tokens and can't update prior state. RAG isn't RAM. Even with huge context, you still can't step CPU style instructions without an external, read/write memory/tooling.

And temperature 0 makes outputs deterministic, not magically correct.
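A minimal sketch of what temperature does to the next-token distribution. Treating temperature 0 as a plain argmax is an assumption here (dividing by zero is undefined, so it has to be special-cased somehow), and the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding (argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits where the model slightly prefers token 0.
logits = [2.0, 1.9, 0.1]
for t in (1.0, 0.1, 0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

If token 1 happened to be the right answer, lowering the temperature would only make the model pick token 0 more consistently.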

photonthug · 2025-10-24T21:23:23 1761341003

> And temperature 0 makes outputs deterministic, not magically correct.

For reasons I don't claim to really understand, I don't think it even makes them deterministic. Floating point something something? I'm not sure temperature even has a static technical definition or implementation everywhere at this point. I've been ignoring temperature and using nucleus sampling anywhere that's exposed and it seems to work better.

Random but typical example.. pydantic-ai has a caveat that doesn't reference any particular model: "Note that even with temperature of 0.0, the results will not be fully deterministic". And of course this is just the very bottom layer of model-config and in a system of diverse agents using different frameworks and models, it's even worse.
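For reference, a rough sketch of the filtering step behind nucleus (top-p) sampling: keep the smallest set of most-likely tokens whose probabilities add up to at least `top_p`, then renormalize. The function name and the probabilities are illustrative, and real frameworks may differ in tie-breaking and edge cases.

```python
def nucleus_filter(probs, top_p):
    """Return {token_index: renormalized_prob} for the top-p nucleus."""
    # Visit tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # smallest prefix that reaches the threshold
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical distribution over four tokens.
print(nucleus_filter([0.5, 0.3, 0.15, 0.05], top_p=0.75))
```

Sampling then draws from the returned distribution, so the long tail of unlikely tokens is never picked.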

astrange · 2025-10-25T06:03:41 1761372221

It's partly because floating point math is not associative and GPU inference doesn't guarantee all the steps will be done in the same order.
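The non-associativity part is easy to see on a CPU with ordinary Python floats. This sketch only shows that grouping changes the result; it does not model how any particular GPU kernel orders its reductions.

```python
# Same three numbers, different grouping.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right, left, right)

# A more dramatic case: the small term is lost when it is added to a huge one first.
big = 1e16
cancel_first = (big + -big) + 1.0
absorb_first = big + (-big + 1.0)
print(cancel_first, absorb_first)
```

If a parallel sum over thousands of values is reduced in a different order from run to run, tiny differences like these can be enough to flip a near-tie between two candidate tokens.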

razodactyl · 2025-10-26T13:33:45 1761485625

Well mostly but they can generate more state that can push old state out of context.

If an LLM were sufficiently trained to be able to roll-forward and correctly set the current state of some registers written into the conversation..? I wouldn't trust it though, leaves too much to chance.

I too make mistakes trying to keep track of things, I end up using tools too.

kovek · 2025-10-24T21:29:44 1761341384

Well, the LLM may re-infer the whole state fully on every instruction. Temperature 0 is deterministic and that's what we are looking for. If the model is trained properly on how the CPU state + instructions should be handled, then it should be able to produce the next state.

lossolo · 2025-10-24T22:42:10 1761345730

With temp = 0 if the model is off by one bit at step k, all subsequent steps are deterministically wrong.

Your previous example shows the best case, which is that a model can sometimes follow a textual recipe for long multiplication on short inputs. That's not the same as learning a length-generalizing, bit-exact algorithm.

Basically what you've shown is that the model can describe the algorithm. It doesn't show it can execute it at scale. Without writable state and bit exact ops, errors grow with length, and "focus more" only slows that failure; it doesn’t eliminate it.
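A toy illustration of the "off by one bit at step k" point, assuming a made-up 16-bit machine whose whole state is one integer. The update rule is arbitrary and only chosen so that a difference between two runs can never cancel out.

```python
MASK = 0xFFFF  # hypothetical 16-bit state


def step(state: int) -> int:
    """One deterministic update of the toy machine (arbitrary rule)."""
    return (3 * state + 1) & MASK


def run(start: int, steps: int, flip_at=None):
    """Run the machine, optionally flipping the lowest bit once at step `flip_at`."""
    trace, state = [], start
    for i in range(steps):
        state = step(state)
        if i == flip_at:
            state ^= 1  # the single one-bit mistake
        trace.append(state)
    return trace


good = run(7, 50)
bad = run(7, 50, flip_at=20)
wrong_steps = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
print(wrong_steps[0], len(wrong_steps))
```

Both runs are fully deterministic, yet every state from the flipped step onward is wrong, because nothing in the loop ever re-checks the state.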

kovek · 2025-10-25T08:01:16 1761379276

> It doesn't show it can execute it at scale. Without writable state and bit exact ops,

Well, modern LLM coding agent products (e.g. Claude Code) are able to store state in files in the current repository. So, you could have the model keep the "CPU State", and the files in the repository be the "RAM".

Also, could this https://arxiv.org/html/2402.17764v1 possibly reduce errors when doing inference? There are no floating point operations.

razodactyl · 2025-10-26T13:38:35 1761485915

It seems to be the conclusion that we come to though, we ourselves use tools.

The focus here is the LLM being able to do it unaided.

The space of all combinations of steps is so large for many problems that require precision and usually one incorrect step breaks everything. "I forgot to carry the 1".

Even then, while brilliant, Claude does screw up sometimes - we're not there yet but it doesn't prevent it from being adequately useful.

photonthug · 2025-10-23T22:59:56 1761260396

Glad to see the pentagonal multiplication prism is just as weird as the addition helix https://arxiv.org/abs/2502.00873

daxfohl · 2025-10-23T23:56:51 1761263811

Yeah, I have to imagine animal brains are just giant Fourier transform engines under the hood, and humans brains have just evolved to make some frequencies more precise.

photonthug · 2025-10-23T21:53:57 1761256437

Thanks for doing this. OpenAI is not in fact open, so referencing their claims as obviously true on anything else is just a non-starter. Counterpoint though, it's been a while since I've run this kind of experiment locally, so I started one too. For reasoning I only have qwen3:latest and I won't clutter the thread with the output, but it's complete junk.

To summarize, with large numbers it goes nuts trying to find a trick or shortcut. After I cut off dead-ends in several trials, it always eventually considers long form addition, then ultimately rejects it as "tedious" and starts looking for "patterns". Wait, let me use the standard multiplication algorithm step by step, oh that's a lot of steps, break it down into parts. Let me think. This went on for ~45 minutes of thinking (I'm on CPU), and it basically cannot follow one strategy long enough to complete the work even if it lands on a sensible approach.

For multiplying two-digit numbers, it does better. Starts using the "manual way", messes up certain steps, then gets the right answer for sub-problems anyway because obviously those are memoized somewhere. But at least once, it got the correct answer with the correct approach.

I think this raises the question, if you were to double the size of your input numbers and let the more powerful local model answer, could it still perform the process? Does that stop working for any reason at some point before the context window overflows?
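For a sense of how the work grows when the inputs double, here is a sketch of the standard schoolbook algorithm on decimal strings, with a counter for the single-digit multiply-and-carry steps. It is only a reference for what "following the process" involves, not a claim about how any model does it internally.

```python
def long_multiply(a: str, b: str):
    """Schoolbook multiplication; returns (product_string, digit_steps)."""
    x = [int(ch) for ch in reversed(a)]  # least significant digit first
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))
    steps = 0
    for i, xd in enumerate(x):
        carry = 0
        for j, yd in enumerate(y):
            total = result[i + j] + xd * yd + carry
            result[i + j] = total % 10
            carry = total // 10  # "carry the 1" (or more)
            steps += 1
        result[i + len(y)] += carry  # leftover carry for this row
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0", steps


print(long_multiply("12", "34"))
print(long_multiply("123456789", "987654321"))
```

The step count is the product of the two lengths, so doubling both inputs roughly quadruples the number of places where one slip ruins the answer.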

photonthug · 2025-10-23T20:39:15 1761251955

Agree, this stuff was trending up very fast before AI.

Could be my own changing perspective, but what I think is interesting is how the signal it sends keeps changing. At first, emoji-heavy was actually kind of positive: maybe the project doesn't need a webpage, but you took some time and interest in your README.md. Then it was negative: having emojis became a strong indicator that the whole README was going to be very low information density, more emotive than referential[1] (which is fine for bloggery but not for technical writing).

Now there's no signal, but you also can't say it's exactly neutral. Emojis in docs will alienate some readers, maybe due to association with commercial stuff and marketing where it's pretty normalized. But skipping emojis alienates other readers, who might be smart and serious, but nevertheless are the type that would prefer WATCHME.youtube instead of README.md. There's probably something about all this that's related to "costly signaling"[2].

[1] https://en.wikipedia.org/wiki/Jakobson%27s_functions_of_lang...

[2] https://en.wikipedia.org/wiki/Costly_signaling_theory_in_evo...

quintu5 · 2025-10-23T21:54:26 1761256466

There’s a pattern to emoji use in docs, especially when combined with one or more other common LLM-generated documentation patterns, that makes it plainly obvious that you’re about to read slop.

Even when I create the first draft of a project’s README with an LLM, part of the final pass is removing those slop-associated patterns to clarify to the reader that they’re not reading unfiltered LLM output.

photonthug · 2025-10-23T14:52:24 1761231144

Seems like an optimistic read on things. This is the kind of common-sense approach you would expect in a world without lawyers, just observing that collusion is bad because the effects are bad, and digging into the details of the causes is completely irrelevant for the public/plaintiff because it's really just on the company to fix the undesirable result.

IANAL but if RealPage's outcomes were definitive or reasonably generalized results dealing with the core issue, then similar arguments against e.g. Amazon would be a slam dunk. AFAIK, actual case outcome just hinges on details about "nonpublic data" and similar. Not remotely on bad effects for consumers or anything like that. Since printing RealPage's database in the newspaper would not actually help apartment-hunters, then this just tells landlords and third party markets how to do price-fixing legally next time? Most likely algorithmic pricing, surveillance pricing, etc is still coming to your grocery store after the issue is "settled" for property rental, or at least settled for RealPage, in certain jurisdictions, for now.

skeezyjefferson · 2025-10-23T15:19:32 1761232772

> AFAIK, actual case outcome just hinges on details about "nonpublic data" and similar.

that sounds like insider trading. price fixing need not involve nonpublic information (beyond the actual conspiracy to fix the prices as it helps to keep that part secret normally)

photonthug · 2025-10-23T17:12:28 1761239548

> “Settling Defendants have agreed not to provide nonpublic data to RealPage for use in competitor pricing recommendations and to refrain from using RealPage’s RMS that relies on non-public competitor data to make pricing recommendations,” attorneys wrote in the settlement filing.

https://www.multifamilydive.com/news/realpage-class-action-l...

I agree that "nonpublic" is barely related to the problem so how it's related to a solution is unclear. But it seems like this is the only general aspect of the outcome. Otherwise the outcome is just to stop doing this specific bad thing this specific time, and fines that are less than the profit made from bad behaviour.

photonthug · 2025-10-23T13:38:58 1761226738

> Modern price collusion is more apt to happen with A/B testing of prices at locations to see what the local market will bear.

One of my first thoughts as well. If you're big enough, you collect so much data and run so many experiments all the time that you know exactly what you'd do if/when there's any competitor on the scene. Not only is there no need to talk to them and make backroom deals, but barely any need to even observe them. You priced like they did/would/could at some point already anyway. At a certain scale and if you already know the price that the market can tolerate.. the most relevant hidden information you want to know is how much cash your competitor has access to. That tells you whether you can win the price-war to sell at a loss for long enough to ruin them, buy them, move on to integrating verticals etc.

Game theory is interesting but also a bad model to the extent that it assumes persistent players with changing strategies, whereas average case in late-stage capitalism is more likely to have players eating players, no new players can enter, players changing rules, etc. As a CS nerd I still like a game theoretical approach better than most econ, but at some point we need to give up on tidy formulas and closed-form answers, and go all in on messy simulations.

photonthug · 2025-10-20T20:22:08 1760991728

> feels like the wrong question to me

I agree but had different questions. TFA mentions the consideration of whether failure cases are correlated, but of course if OpenAI wins big, there's a good chance this directly or indirectly creates much instability and uncertainty in many other loans/partners. What's the EV on whether that is net-positive considering this is a loan at 5% and not an investment?

On the other side, if OpenAI crashes hard, is it really such a sure thing that Microsoft will be on the hook to pay off their debts? Setting aside whatever the lawyers could argue about in a post-mortem, are they even obligated to keep their current stake / can they not just divest / sell / otherwise cut their losses if the writing is on the wall?

photonthug · 2025-10-19T15:07:46 1760886466

If we fill those abandoned buildings with people, air-conditioning the inside of the building for them will obviously add even more heat to the outside? Parking lots that are full of cars aren't going to be that much cooler than empty ones?

Basically the real story is just that trees make shade (yes, we know already) and "vacant or abandoned" isn't much involved (yes, but we want to discuss zoning/taxes/urbanism things)

acdha · 2025-10-19T15:14:08 1760886848

There are complex trade offs there: housing uses more power than a parking lot but it also provides far more significant social goods, housing can be built with very different levels of energy usage and external heat emissions, and while people need housing they don’t need cars the same way so you can offset a substantial fraction of the pollution from housing by reducing the number of cars used by residents.

The main lesson I draw is that everything would improve by taxing externalities: the land is vacant because the property owner doesn’t have enough incentive to do something useful with it and we have a lot of inefficiency in our housing and transportation which a carbon tax would go a long way towards reducing.

fuzzfactor · 2025-10-19T15:28:59 1760887739

>the land is vacant because [of some imaginary occurrences]

Texas is bigger than that.

They have always taxed more carbon and more land in ways that make them rich as hell, at the average citizen's expense.

To the envy of other states' greedy taxing entities.

That wasn't so bad when there was still enough widespread prosperity for the average citizen to be able to afford it.

The land is vacant after they tore down the buildings because the taxes were already too high, and rising too fast.

No brag, just fact.

"How far up is the river now, Ma?"

"Six feet deep, and rising . . ."

parineum · 2025-10-20T07:27:44 1760945264

A white roof and a green lawn are going to reflect a lot more light than is emitted by an efficient AC.

fuzzfactor · 2025-10-19T15:18:03 1760887083

>air-conditioning the inside of the building for them will obviously add even more heat to the outside?

Roger?

Well, if Roger's not here somebody's going to have to do the thermodynamics their own self, and it's good to take the initiative plus show it can be done without scaring anybody by using equations or any of that complicated stuff :)

lukan · 2025-10-19T15:40:45 1760888445

Is that a sort of joke?

If not, you cannot make the land cold with air condition. You can just move heat around, with AC from the inside to the outside, but that costs extra energy -> more heat
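The energy balance behind that can be sketched in a couple of lines: the heat dumped outside equals the heat pulled out of the room plus the electrical work the AC consumes. The COP (coefficient of performance) of 3 and the 3 kW load are illustrative assumptions, not measurements.

```python
def heat_rejected_outside(heat_removed_kw: float, cop: float) -> float:
    """Heat released outdoors = heat removed indoors + work put in."""
    work_kw = heat_removed_kw / cop  # COP = heat removed / work input
    return heat_removed_kw + work_kw


# Hypothetical unit: removes 3 kW from the room with a COP of 3.
print(heat_rejected_outside(3.0, 3.0))
```

So the outside always receives more heat than the inside loses, and a more efficient unit (higher COP) only shrinks the extra, it never removes it.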

fuzzfactor · 2025-10-19T17:51:16 1760896276

>Is that a sort of joke?

Yes!

But only if your name is Roger :)

>you cannot make the land cold with air condition. You can just move heat around, with AC from the inside to the outside, but that costs extra energy -> more heat

Which is exactly what I've been saying since I was a teenager.

According to thermodynamics anyway . . .

LamaOfRuin · 2025-10-19T21:11:37 1760908297

This is still just moving the heat around, but with metamaterials you can now passively convert the heat energy into wavelengths that do not get absorbed by the atmosphere and beam a decent chunk of it back into space.

quinndexter · 2025-10-20T08:54:14 1760950454

I would like to know more.

LamaOfRuin · 2025-10-20T17:30:05 1760981405

https://en.wikipedia.org/wiki/Passive_daytime_radiative_cool...