More

tomrod · 2026-04-24T22:43:38 1777070618

Basket goods, basically.

Price of good i x Quantity of good i. Quantity is fixed year to year. So a loaf of bread, a gallon of milk, a TV, etc.

Sum those up across a reasonably representative basket, then compare that sum to the same quantity and new prices in a future year.

sum(P_i_new year x Q_i) / sum(P_i base year x Q_i) - 1 --> change in CPI

Hamburgers might be more expensive, but TVs, toilet paper, and dog kibble might not be.

peterbecich · 2026-04-24T22:59:43 1777071583

Agreed completely. Other examples: long-distance telephone minutes, shoes, clothing, air travel... probably all cheaper.

tomrod · 2026-04-24T20:34:21 1777062861

I'm not up to date, I think. How so?

michelb · 2026-04-25T06:19:36 1777097976

There's been a spat between some people on X, about how few engineers inside Google want to work with Gemini, given that it apparently is not great with code, and they would rather use Claude.

This same sentiment is there within Deepmind, except they have more power it seems. Perhaps Google is hedging their bet?

Best non-X link I could find: https://benzatine.com/news-room/internal-strife-at-google-th...

tomrod · 2026-04-24T20:33:07 1777062787

Does the EO actually hold any weight to that designation, or is that decided by the US Congress?

tomrod · 2026-04-24T20:31:51 1777062711

It was enough for me to dig much deeper into OpenAI, where before we almost exclusively used them for services with any form of SLA.

ordinaryradical · 2026-04-24T20:42:03 1777063323

You're saying it was a turning point for you to get more embedded with them? Way to be killer robot positive, I guess...

tomrod · 2026-04-24T21:35:57 1777066557

Good call out because I was a little unclear.

Opposite of what you said. The "dig" was not retrenching to more use, but rather I evaluated what I saw them doing and have migrated our company to much better options.

tomrod · 2026-04-23T00:18:04 1776903484

I had them. They are wonderful, and even creamy.

I still love tiny red bananas though, they are so sweet!

tomrod · 2026-04-22T15:58:16 1776873496

Teams that do this need to just dogfood internally. Once you start collecting telemetry on external users defaulted to opt-in you're not a good faith actor in the ecosystem.

tomrod · 2026-04-21T22:49:56 1776811796

AI loopidity rearing it's head. Just send the bullet points that we all want anyway, right?! Stop sending globs of text and other generated content!

tomrod · 2026-04-21T02:40:35 1776739235

Extraordinary claims! I don't follow the argument though.

EGreg · 2026-04-21T02:55:24 1776740124

Author here. Since starting to teach AI at IENYC, I started publishing my papers recently on arXiv, and considering submitting them to a journal.

This is based on my original "PLT" paper: Probablistic Language Tries (https://news.ycombinator.com/item?id=47743585). A "Trie" is basically a tree of prefixes. While working on https://safebots.ai I became obsessed with caching generated artifacts as a means to do a lot of things: extremely cheap inference, near-optimal compression, modeling decision trees for strategies, and so on.

The PLT model was about compression in general. My main insight there was that the LLM's own weights actually contain an incredibly detailed probability distribution of "the next token" in any sequence, which can therefore be very useful to supercharge statistical compression. Sequences which occur frequently in the domain of the model receive short codes. The other insight is that if we allowed lossy compression, we could compress well below the Shannon information limit, and just have an "overflow" bag for surprising sequences.

When TurboQuant came out, I realized we can also go way below the Shannon limit in the same way, and take advantage of PLT. In fact, I'm working on publishing a paper that generalizes this to robotics (which needs to do cheap fast on-board inference "in the field"). I also believe this is how animals actually learn. In other words, over time they learn overall "sequences" of actions and then can check whether they are "good enough" to solve the problem, or whether to switch to a full analysis -- this corresponds to System 1 and 2 of Daniel Kahneman's "Thinking Fast and Slow".

If you want more specific information, or see the code for a working prototype, you can write me at the email in the paper.

mbernstein · 2026-04-21T03:26:17 1776741977

This is a compute memory trade, not compression vs. turobquant? Lemma 1 is something like, "forward pass is deterministic because it's deterministic" which means the input tokens were always the lower bound...which isn't caching? Smells tautological. What am I missing?

EGreg · 2026-04-21T03:34:09 1776742449

Well yeah, I just wrote it as a lemma, but it's basically close to tautological. Its only job is to formally ground the entropy argument that follows it. The interesting claim is what comes after: because KV vectors are deterministic functions of tokens, and because the model is a near-optimal predictor of its own distribution, the conditional entropy of each new KV vector given all previous ones is bounded by token-level perplexity. TurboQuant compresses against the marginal distribution of each vector in isolation -- that's the gap.

And yes, it's a compute/memory tradeoff, all caching is. The claim is just that the memory floor is much lower than anyone had formally established. Whether the compute cost of getting there is worth it is a fair open question the paper doesn't settle. But what if it is? Caching is the thread running through most of my work, and I intend to find out.

himata4113 · 2026-04-21T03:19:35 1776741575

The reasoning around the 900000x claim isn't sound and violates way too many information density principles.

I was incredibly curious since I had a pet theory in my mind about something extremely similar, but arrived at a conclusion that the time complexity of such cache would end up being extremely slow.

This is like saying that you've achieved single token compression when you're passing a single token into a model and letting it regenerate the entire output since at the end of the day models are probabilistic stateless devices. At that point you don't have a cache and are just replaying the tokens or have a caching algorithm with a complexity similar to that of a model defeating the purpose of such cache.

I've never considered that arXiv had a problem, now I do.

EGreg · 2026-04-21T03:27:50 1776742070

No, the 914,000x in the paper is talking about the ratio between two entropy floors, it's not a claim about practical compression. The point is that per-vector quantization has been chasing the wrong theoretical limit: the sequential entropy bound is just fundamentally lower, by that factor, because KV vectors aren't independent samples!

On complexity, that's fair concern, and the paper doesn't fully resolve it. But the analogy to "replaying tokens through the model" isn't exactly right. The delta coding layer uses the model's own next-token prediction, which is already happening during normal autoregressive inference. You're not adding a forward pass, you're using the one already running and storing only the residual, which is much smaller than the raw vector -- precisely because the model is a good predictor of its own next state.

The trie index lookup is O(sequence length), not O(model forward pass). Whether that's fast enough in practice at scale is actually a legitimate open question and I'd be the first to admit the paper doesn't settle it. But the contribution here is simply establishing that the bound exists and is dramatically lower than what the field has been targeting. That's what I wanted to put out. The engineering question of how close you can get is the natural next step.

Your pet theory about time complexity sounds interesting actually, did you write it up anywhere?

tomrod · 2026-04-22T16:09:25 1776874165

Can you show a working example/implementation of these theoretical improvements? Working code would also go far for replication.

usernametaken29 · 2026-04-21T03:13:07 1776741187

Kahnemans book is considered outdated by modern neuroscience.

stingraycharles · 2026-04-21T03:07:59 1776740879

Dropping a grand theory of animal cognition into a defense of a KV cache compression bound is not something I was anticipating. I don’t think it’s a great argument.

wholinator2 · 2026-04-21T03:19:07 1776741547

At least some random pseudocrackpotery like that is points in the direction of it being a human. There's some strange human tendencies that AI just doesn't usually replicate

Rekindle8090 · 2026-04-21T03:00:50 1776740450

[flagged]

cristoperb · 2026-04-21T03:09:15 1776740955

I can't speak for the person you're replying too, but I use -- for emdash for two reasons: I never remember how to type an actual emdash in linux/X11, and more importantly, I do most of my writing in Asciidoc which converts -- to an emdash automatically. It's nothing to do with bot detection or whatever.

But it does get me confused sometimes because in LaTeX (and other markup languages) -- gets converted to an endash whereas it takes three hyphens --- to make an emdash.

rhet0rica · 2026-04-21T03:21:38 1776741698

you are hereby sentenced by the council of dashers to type "—" ten million times using Windows-1252 alt codes

you have 5 seconds to comply before your planet will be demolished to make room for a giant space-typographer's punctuation case

EGreg · 2026-04-21T03:01:43 1776740503

Haha, yes I always used -- when I typed an em-dash manually. What bot detection extensions? :-P

gaze · 2026-04-21T03:03:43 1776740623

this paper looks AI generated to me... I mean, there's no experiments to go along with it.

tomrod · 2026-04-19T15:53:34 1776614014

I never thought about these as different acquisition models. What others are there to learn about?

phrotoma · 2026-04-19T17:14:14 1776618854

The last time I mentioned this I got downvoted into a crater, so maybe ppl hate it (I'm open to hearing counterpoints!) but there's an army of tech freelancers swapping advice in a slack called "Rands Leadership Slack". My old boss suggested it. I thought it was BS. It was surprisingly informative - case in point it's where I first heard of the above podcast.

tomrod · 2026-04-18T12:56:33 1776516993

Does the cost scale linearly/superlinearly? What does the $300-$400 price data point tell us with relationship to the parameter density?

No gotchas here. I genuinely don't know that 8B parameters is in a zone with significant decreasing marginal returns -- too far out of my knowledge area but genuinely curious.

avidiax · 2026-04-18T13:15:46 1776518146

Die size increases cost exponentially, by decreasing chips per wafer and decreasing yield.

I expect that this kind of burned-in model is also very difficult to verify (how do you know if some of the weights are off), and not amenable to partial disablement to increase yield. For CPUs, you just laser disable bad cores. Can't forego part of a neural net.

robkop · 2026-04-19T12:37:39 1776602259

You can ablate surprisingly large chunks of a model with near to no effect, you can try this easily - download an open weight model in torch.

Obviously it’s not ideal but you could likely have single digit % of all weights affected and still have a useful model (many caveats here: e.g. locality of damaged weights matters, distribution of errors matters, fail high/low matters, …)

hdndjsbbs · 2026-04-18T19:30:31 1776540631

I mean, you probably can just turn off defective parts of the network. You better believe if this becomes popular they would salvage yields by selling "dumber" chips at a discount.

vrighter · 2026-04-19T04:50:05 1776574205

except that if you do, you've just implemented a different model, with no way to tell which part of it is wrong

hdndjsbbs · 2026-04-26T12:53:08 1777207988

Could you tell that the original model was "right"?