Anyway, that doesn't refute my point; it's just PR from a weaselly and dishonest company. I didn't say it was "IMO-specific," but the output strongly suggests specialized tooling and training, and they said this was an experimental LLM that wouldn't be released. I strongly suspect they basically attached their version of AlphaProof to ChatGPT.
Unfortunately we can only go off their word, and they say no formal math, so I assume it's being evaluated by a verifier model instead of a formal system. There are actually some hints of this: geometry in Lean is not that well developed, so unless they also built their own system it's hard to do it formally (though their P2 proof is a coordinate bash, i.e. computation by algebra instead of geometric construction, so it's hard to tell).
In general I agree with you, but I see the point of requiring proof for their statements instead of accepting them at face value. Given previous experience, and considering that they benefit if those statements are believed, shouldn't the burden of proof be on those making the claims rather than on those questioning them?
Those models seem to be special and not part of their normal product line, as is pointed out in the comments here. In that case I would assume they were indeed created with the purpose of passing these tests in mind. Or were they created for something else, and the team discovered by chance that they could be used for the challenge?
You don't need specialized tooling like Lean if you have enough training data with statements written in natural language, I suppose. But the use of AlphaProof/AlphaGeometry-style learning is almost certain. And I'm sure they spent a lot of compute to produce solutions; $10k is not a problem for them.
The bigger question is: why should anyone be excited by this if they don't plan to share anything related to this AI model back with humanity?
Look back at expectations at the beginning, and you'll see that nearly everybody was predicting a massive recession and even hyperinflation. Balaji bet $1M that there would be hyperinflation.
Yet that didn't happen. We dodged a major bullet and survived far better than the rest of the world. We must look back at predictions and outcomes with clear eyes, not through the narratives being sold today.
On one hand, Pichai paid $2.7B to get one guy back. On the other hand, he laid off 200 Core devs and "relocated roles" to India and Mexico [1]. The duality of Pichai-style management.
You are claiming they are statistical parrots, which I don’t think the parent poster meant.
The “statistical parrots” argument might have been compelling with GPT-3, but not with today’s models and the results of mechanistic interpretability research, which show internal representations and rudimentary world models.
The issue here is that even with a lot of VRAM you may be able to run the model, but with a large context it will still be too slow (for example, prompt processing for LLaMA 70B with a 30k+ token context can take minutes).
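A rough back-of-envelope sketch of why prefill gets slow, assuming the common approximation of ~2 * n_params FLOPs per token for a dense transformer forward pass; the throughput figures below are illustrative assumptions, not measurements:

    # Back-of-envelope estimate of prompt-processing (prefill) time.
    # Assumption: a dense transformer needs roughly 2 * n_params FLOPs per token,
    # and "effective_tflops" is whatever throughput your setup actually sustains.

    def prefill_seconds(n_params: float, n_tokens: int, effective_tflops: float) -> float:
        flops = 2 * n_params * n_tokens            # total FLOPs to process the prompt
        return flops / (effective_tflops * 1e12)   # seconds at the given throughput

    # 70B model, 30k-token prompt:
    print(prefill_seconds(70e9, 30_000, 100))  # ~42 s  (hypothetical fast GPU, ~100 effective TFLOP/s)
    print(prefill_seconds(70e9, 30_000, 5))    # ~840 s (~14 min, e.g. with heavy CPU offload)

The point is that the prompt cost scales linearly with context length and model size, so once part of the model is offloaded off the GPU the effective throughput drops and "minutes" for a 30k-token prompt is entirely plausible.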