If you're parsing JSON or other serialization formats, I expect that can be nontrivial. Yes, it's dominated by the LLM call, but serialization is pure CPU.
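A quick way to sanity-check the cost on your own payloads (numbers are machine-dependent, and the payload here is a made-up stand-in):

    import json
    import time

    # Stand-in payload; substitute a real response from your pipeline
    payload = json.dumps({"tokens": list(range(100_000))})

    start = time.perf_counter()
    for _ in range(100):
        json.loads(payload)
    elapsed = time.perf_counter() - start
    print(f"{elapsed / 100 * 1000:.2f} ms per parse")  # pure CPU, no I/O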
Also, ideally your lightweight client logic can run on a small device/server with bounded memory usage. If OpenAI spins up a server for each Codex query, the size of that server matters at scale (and so does its cost), so shaving off MBs of overhead is worthwhile.
The comparison is: for your use case, which is better? Decompress and matrix sum or grab keys and increment individual values?
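As a toy illustration of the two shapes (Python; the update variables are hypothetical stand-ins):

    import numpy as np
    from collections import Counter

    # Option A: decompress the whole update and add it as a dense vector
    counts = np.zeros(1_000_000, dtype=np.int64)
    dense_update = np.zeros(1_000_000, dtype=np.int64)  # stand-in for a decompressed update
    counts += dense_update

    # Option B: touch only the keys that actually changed
    sparse_counts = Counter()
    changed_keys = ["user:42", "user:7"]  # stand-in for an incoming sparse update
    for key in changed_keys:
        sparse_counts[key] += 1

Option A does work proportional to the whole key space per merge; Option B only proportional to the number of changed keys.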
There's some flexibility in how one would do that second increment, though of course naively it might look like constructing a sparse matrix and summing, so it feels pretty similar. But the flexibility might be nice in some cases, and Bloom filters are an extremely space-efficient representation with an arbitrarily low false positive rate.
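To put a number on the space efficiency, here's a minimal Bloom filter sketch using the standard sizing formulas (class and parameter names are mine, not from any particular library):

    import hashlib
    import math

    class BloomFilter:
        def __init__(self, n_items, fp_rate):
            # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hashes
            self.m = max(1, int(-n_items * math.log(fp_rate) / (math.log(2) ** 2)))
            self.k = max(1, round((self.m / n_items) * math.log(2)))
            self.bits = bytearray((self.m + 7) // 8)

        def _indexes(self, item):
            # Double hashing: derive k indexes from two 64-bit halves of one digest
            digest = hashlib.sha256(item.encode()).digest()
            h1 = int.from_bytes(digest[:8], "big")
            h2 = int.from_bytes(digest[8:16], "big")
            return [(h1 + i * h2) % self.m for i in range(self.k)]

        def add(self, item):
            for idx in self._indexes(item):
                self.bits[idx // 8] |= 1 << (idx % 8)

        def __contains__(self, item):
            return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))

    # ~1.2 KB of bits tracks 1,000 keys at a 1% false positive rate
    bf = BloomFilter(n_items=1000, fp_rate=0.01)
    bf.add("user:42")
    print("user:42" in bf)  # True
    print("user:99" in bf)  # False (with ~1% chance of a false positive)

Halving the false positive rate costs only ~1.44 extra bits per key, which is where the "arbitrarily low" part comes from.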
My understanding for the original OpenAI and Anthropic labels was essentially: GPT-2 was 100x more compute than GPT-1. Same for 2 to 3, and again for 3 to 4. Thus GPT-4.5 was 10x more compute.^
If Anthropic is doing the same thing, then 3.5 would be 10x more compute vs. 3, 3.7 might be 3x more than 3.5, and 4 might be another ~3x.
^ I think this maybe involves words like "effective compute", so it might not be a full pretrain, but it might be! Using 10x more compute could mean doubling the amount spent on pretraining and then putting the other 8x into post-training, or some other split.
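Taking those multipliers at face value, the back-of-envelope ladder would be (illustrative numbers only, not official figures):

    # Relative "effective compute", normalizing GPT-1 = 1
    gpt = {"GPT-1": 1}
    gpt["GPT-2"] = gpt["GPT-1"] * 100
    gpt["GPT-3"] = gpt["GPT-2"] * 100    # 10,000
    gpt["GPT-4"] = gpt["GPT-3"] * 100    # 1,000,000
    gpt["GPT-4.5"] = gpt["GPT-4"] * 10   # half a version step = sqrt(100) = 10x

    # Hypothetical Anthropic ladder under the same convention, Claude 3 = 1
    claude = {"3": 1, "3.5": 10, "3.7": 10 * 3, "4": 10 * 3 * 3}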
All three authors are large contributors to the field (the book _Discrete and Computational Geometry_ by Devadoss & O'Rourke is excellent), Demaine has some origami in the collection at MoMA NYC^, and Mitchell found a PTAS for Euclidean TSP (Google it: the paper is readable, and there's another good write-up comparing his approach with Arora's).
This line of thought was exacerbated by that one paper, which was then parroted (hah!) by every influencer / naysayer in the space. It didn't matter that the paper was badly executed, that its setup was flawed, or that it was rendered moot by the existence of the Llama 3 models. People still quote it, or the "articles" stemming from it.
The junior engineers on my team are just supercharged and their job is different from when I was a junior engineer.
I would say: ten years ago there was a huge shortage of engineers. Today there is still lots of coding to be done, but everyone is writing code much faster, and driven people can learn to code to solve their own problems.
Part of the reason it was so easy to get hired as a junior ten years ago was that there was so much to do. There will still be demand for engineers for a little while; after that, it's possible we will all be creating fairly bespoke apps, and I'm not sure old me would call what future me does "programming".
For all of the skepticism I've seen of Sam Altman, in interviews (e.g., by Ben Thompson) he says he really does not want to create an ad tier for OpenAI.
Even if you take him at his word, incentives are hard to ignore (and advertising is a very powerful business model when your goal is to create something that reaches everyone)
How many times must we repeat that AGI is whatever will sell the project? It means nothing. Even philosophers don't have a good definition of "intelligence".
AGI just refers, roughly, to the intelligence it would take to replace most if not all white-collar workers. There is no precise definition, but it's not meaningless.
Isn't this already the case? Perhaps you mean in a non-transient fashion, i.e. it internalizes the in-context learning into the model weights themselves, a sort of ongoing training that isn't a "hack" like writing notes or adding to a RAG database or whatever.
The flip side of this is Meta having a hack that keeps their GPUs busy so that power draw is more stable during LLM training (e.g., you don't want a huge power drop while synchronizing gradients between batches).
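A hedged sketch of what that kind of padding might look like (hypothetical PyTorch, not Meta's actual code; assumes dist.init_process_group has already been called and filler is a square matrix resident on the GPU):

    import torch
    import torch.distributed as dist

    def allreduce_with_filler(grad: torch.Tensor, filler: torch.Tensor) -> torch.Tensor:
        # Launch the gradient all-reduce asynchronously...
        work = dist.all_reduce(grad, async_op=True)
        # ...and burn cycles on throwaway matmuls until it completes, so the
        # GPU's power draw stays roughly flat during the comms phase instead
        # of dipping while the cards would otherwise sit idle.
        while not work.is_completed():
            torch.mm(filler, filler)  # result is discarded on purpose
        return grad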
I think the issue is exactly the spikiness, because of how AC electricity works (whereas if the data center were DC, e.g. wired through a battery, it wouldn't be an issue).
I expect you're right that GPU data centers are a particularly extreme example
I tried desperately to source this even before seeing your request.
My current guess is that I heard it on a podcast (either a Dwarkesh interview or an episode of something else - maybe transistor radio? - featuring Dylan Patel).
I'll try to re-listen to the top candidates in the next two weeks (I'm a little behind on current episodes because I'm near the end of an audiobook) and will try to ping back if I find it.
If too long has elapsed, update your profile so I can find out how to message you!