
If you're parsing JSON or other serialization formats, I expect that can be nontrivial. Yes, it's dominated by the LLM call, but (de)serialization is pure CPU.

Also, ideally your lightweight client logic can run on a small device/server with bounded memory usage. If OpenAI spins up a server for each Codex query, the size of that server matters (at scale, that's cost), so shaving off MB of overhead is worthwhile.
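
As a rough illustration (made-up payload, nothing from OpenAI's actual stack), a quick Python timing loop shows the pure-CPU cost of deserialization:

    import json
    import time

    # Hypothetical payload: a batch of tool-call results, roughly 1 MB of JSON.
    payload = json.dumps(
        [{"id": i, "text": "x" * 150, "score": i * 0.5} for i in range(5000)]
    )

    start = time.perf_counter()
    for _ in range(100):
        json.loads(payload)  # pure CPU work, entirely separate from LLM latency
    elapsed = time.perf_counter() - start

    print(f"parsed {len(payload) / 1e6:.1f} MB of JSON 100 times in {elapsed:.2f}s")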


Good error messages: great for humans, but also great for LLMs that are trying to debug their initial attempt.

What about this, which says the opposite:

https://news.ycombinator.com/item?id=44149809


The comment you replied to is talking about the Rust compiler's errors.

The comment you linked is talking about an unspecified application's runtime errors.


The comparison is: for your use case, which is better? Decompress and matrix sum or grab keys and increment individual values?

There's some flexibility in how one would do that second increment, but naively it might look like constructing a sparse matrix and summing, so it feels pretty similar. Still, the flexibility might be nice in some cases, and Bloom filters are an extremely space-efficient representation with an arbitrarily low false-positive rate.
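
For a sense of the space efficiency, the standard Bloom filter sizing math (textbook formulas, not tied to any particular library) works out like this:

    import math

    def bloom_parameters(n_items: int, fp_rate: float):
        """Standard Bloom filter sizing: bit count m and hash count k for a
        target false-positive rate. Tightening fp_rate only costs bits
        logarithmically, which is why the structure stays so compact."""
        m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
        k = max(1, round((m / n_items) * math.log(2)))
        return m, k

    # e.g. 1M keys at a 0.1% false-positive rate:
    m, k = bloom_parameters(1_000_000, 0.001)
    print(f"{m / 8 / 1e6:.1f} MB, {k} hash functions")  # ~1.8 MB, 10 hashes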


My understanding of the original OpenAI and Anthropic labels was essentially: GPT-2 was 100x more compute than GPT-1. Same for 2 to 3. Same for 3 to 4. Thus, GPT-4.5 was ~10x more compute than GPT-4.^

If Anthropic is doing the same thing, then 3.5 would be 10x more compute vs 3, 3.7 might be ~3x more than 3.5, and 4 might be another ~3x.

^ I think this maybe involves words like "effective compute", so yeah, it might not be a full pretrain, but it might be! If you used 10x more compute, that could mean doubling the amount used on pretraining and then putting ~8x into post-training, or some other distribution.
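
To make that arithmetic concrete (this is just my reading of the naming convention, nothing official): if a full version bump means ~100x compute, then a version delta d corresponds to roughly 100^d.

    # Back-of-the-envelope for the "100x per full version number" reading
    # (my assumption about the naming scheme, not anything official).
    def compute_multiplier(version_delta: float) -> float:
        return 100 ** version_delta

    print(compute_multiplier(0.5))  # 4 -> 4.5: ~10x
    print(compute_multiplier(0.2))  # 3.5 -> 3.7: ~2.5x, same ballpark as the ~3x above
    print(compute_multiplier(0.3))  # 3.7 -> 4: ~4x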


Beyond 4 that's no longer true; marketing took over from the research.

Oh shoot, I thought that still applied to 4.5, just in a more "effective compute" way (not 100x more parameters, but 100x more compute in training).

But alas, it's not like "3nm" in fab processes means the literal thing either. Marketing always dominates (and not necessarily in a way that adds clarity).


Does anyone know if this is still up-to-date?

All three authors are major contributors to the field (the book _Discrete and Computational Geometry_ by O'Rourke & Devadoss is excellent), Demaine has some origami in the collection at MoMA NYC^, and Mitchell found a PTAS for Euclidean TSP (Google it; the paper is readable, and there is another good write-up comparing his approach with Arora's).

^ https://erikdemaine.org/curved/MoMA/


Definitely out of date; e.g., the 3SUM subquadratic conjecture (probably Problem 11) has been solved and improved on [1].

If it hasn't been already, there's an immediate application, e.g. Problem 41.

[1]: https://link.springer.com/article/10.1007/s00453-015-0079-6


Thanks for the pointer, I just saw the 2nd edition of _Discrete and Computational Geometry_ will be coming out in July: https://www.amazon.com/-/en/Discrete-Computational-Geometry-... (I preordered a copy)


Last update to mark a problem as solved was in December last year: https://github.com/edemaine/topp/pull/10


You just need to file a simple PR to mark a problem as solved, though. It can't get much simpler.


This whole line of thought is sort of funny. Yes, you can try training a model on synthetic data in such a way that it experiences model collapse.

That doesn't mean there aren't ways to train a model incorporating synthetic data without seeing model collapse.


> This whole line of thought is sort of funny.

This line of thought was exacerbated by that one paper that was then parroted (hah!) by every influencer / negativist in the space. It didn't matter that the paper was badly executed, that its setup was flawed, and that it was rendered moot by the existence of the Llama 3 models. People still quote that paper, or the "articles" stemming from it.


The junior engineers on my team are just supercharged, and their job is different from what mine was when I was a junior engineer.

I would say: ten years ago there was a huge shortage of engineers. Today, there is still lots of coding to be done, but everyone is writing code much faster, and driven people learn to code to solve their own problems.

Part of the reason it was so easy to get hired as a junior ten years ago was that there was so much to do. There will still be demand for engineers for a little while, and then it's possible we will all be creating fairly bespoke apps, and I'm not sure the old me would call what the future me does "programming".


For all of the skepticism I've seen of Sam Altman, in interviews (e.g. by Ben Thompson) he says he really does not want to create an ad tier for OpenAI.

Even if you take him at his word, incentives are hard to ignore (and advertising is a very powerful business model when your goal is to create something that reaches everyone)


I don't agree with Tyler on this point (although o3 really is a thing to behold)

But reasonable people could argue that we've achieved AGI (not artificial superintelligence).

https://marginalrevolution.com/marginalrevolution/2025/04/o3...

Fwiw, Sam Altman will have already seen the next models they're planning to release


The goalposts seem to have shifted to a point where the "AGI" label will only be retroactively applied to an AI that was able to develop ASI


How many times must we repeat that AGI is whatever will sell the project? It means nothing. Even philosophers don't have a good definition of "intelligence".


AGI just refers roughly to the intelligence it would take to replace most if not all white collar workers. There is no precise definition, but it's not meaningless.


They still can't reliably do what humans can do across our attributes. That's what AGI was originally about. They have become quite capable, though.


An AGI would be able to learn things while you are talking to it, for example.


Isn't this already the case? Perhaps you mean in a non-transient fashion, i.e. internalizing the in-context learning into the model itself, a sort of ongoing training, rather than a "hack" like writing notes or adding to a RAG database or whatever.


The flip side of this is Meta having a hack that keeps their GPUs busy so that the power draw is more stable during LLM training (e.g. you don't want a huge power drop when synchronizing batches).
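
I don't know what Meta's actual mechanism looks like, but a toy sketch of the general idea (keep the GPU busy with throwaway matmuls while a rank waits at a sync point, so the power draw doesn't crater) might be something like this in PyTorch, with made-up names:

    import time
    import torch

    def burn_gpu_cycles(ms: float, size: int = 4096):
        """Keep the GPU busy with throwaway matmuls for roughly `ms` milliseconds,
        e.g. while this rank waits on slower peers at a gradient-sync barrier.
        Illustrative sketch only, not Meta's actual mechanism."""
        if not torch.cuda.is_available():
            return
        a = torch.randn(size, size, device="cuda")
        b = torch.randn(size, size, device="cuda")
        deadline = time.perf_counter() + ms / 1000.0
        while time.perf_counter() < deadline:
            _ = a @ b                 # result discarded; the point is steady power draw
            torch.cuda.synchronize()  # block so the host-side timer tracks real GPU work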


I thought of this too. IIRC it was a bigger problem because surging spikes in power and cooling were harder and more costly to account for.

I'm not au fait with network data centres though; how similar are they in terms of their demands?


I think the issue is exactly the spikiness, because of how AC electricity works (whereas if the data center were DC, e.g. wired through a battery, it wouldn't be an issue).

I expect you're right that GPU data centers are a particularly extreme example


Have a link to an article talking about them doing this?


I tried desperately to source this even before seeing your request.

My current guess is that I heard it on a podcast (either a Dwarkesh interview or an episode of something else - maybe transistor radio? - featuring Dylan Patel).

I'll try to re-listen to the top candidates in the next two weeks (I'm a little behind on current episodes because I'm near the end of an audiobook) and will try to ping back if I find it.

If too long has elapsed, update your profile so I can find out how to message you!


That goes against separation of concerns. A separate utility must be created for that specific purpose, not hidden in some other part of the system

