
Source? I didn’t know EU officials are required to use burner phones in the US


Anonymous sources, the EC didn't confirm anything, and the "proof" is, well, that they didn't deny it either.

Quality reporting as usual from El Reg.



They definitely should. The US is a profoundly untrustworthy country.

I wish it were true. I would gladly use a GPT 5.2 high model equivalent for coding (6 months old) if it were offered cheaper by DeepSeek or Kimi. And I'm sure that's an extremely prevalent opinion among the millions of Claude and Codex users who are bothered by the costs.

However, they just don't perform that well in practice. That's the real issue, and you can actually see it when you move away from open benchmarks. DeepSeek 3.2 is at 4% on ARC-AGI 2 [1], while GPT 5.2 high is at 52% and GPT 5.5 pro high is at 84.6%. That's the real reason nobody is using these models for serious work. It's incredibly frustrating.

In addition, I already feel the pain of the model restrictions myself. I'll ask my Codex 5.5 agent to crawl a website - BOOM, cybersecurity warning on my account. I'll ask it to fix SSH on my local network - another warning. I'm worried about the day my account gets randomly banned and I can't create a new one. OpenAI already asks you to complete full identity verification to eliminate these warnings - probably exactly for that reason, so that if they ban you, it's permanent.

[1] https://arcprize.org/leaderboard


I worked extensively on ARC AGI before, and one thing is SURE as hell: OpenAI and Gemini in particular use it as marketing material. You can correlate benchmark releases with stock price increases. They feed synthetic ARC datasets into their models to boost the numbers. There is no doubt in my mind that Gemini is no better than DeepSeek beyond being specifically fine-tuned for ARC AGI. Heck, they even say so - they say they have paid annotations for ARC. Again, economic incentives. As for whether these models are actually better than the benchmarks suggest: likely not. See ARC 3, where the gap is vanishingly small.

I've also worked extensively on ARC AGI 1/2, and I mainly agree. Marketing and training. Performance of LLMs on ARC is first and foremost a function of training on grid/table-like data. It doesn't have to be specifically synthetic ARC data, though. Training an LLM to better perceive grid-like arrangements of data spatially, like an image, rather than just tabularly, is hugely useful for things outside of ARC benchmarks, though it's a narrow skill. Hence, I'm sure they do it. I want them to do that. I believe the labs when they say they didn't train specifically for ARC-AGI 1/2 (where did Google say otherwise? I don't see it). But it does not mean the models are getting better at general-purpose reasoning. They were already plenty good enough at that. You can describe ARC images in words and reason about them using a level of intelligence LLMs have had for years: the tasks are designed to be easy! LLMs just couldn't reason about image-like grids very well.
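To make that concrete: ARC tasks are just JSON train/test pairs of small integer grids, so "describing the image in words" can be as simple as serializing each grid as rows of digits. A minimal Python sketch - the toy task literal here is made up for illustration, but the real tasks use the same structure:

    import json

    def grid_to_text(grid):
        # Render an ARC grid (lists of ints 0-9) as aligned rows of digits,
        # so a text-only model can "see" the spatial layout.
        return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

    # Toy task in the public ARC JSON format (train/test pairs of input/output grids).
    task = json.loads('{"train": [{"input": [[0,1,0],[1,1,1],[0,1,0]],'
                      ' "output": [[1,0,1],[0,0,0],[1,0,1]]}]}')

    pair = task["train"][0]
    print("input:\n" + grid_to_text(pair["input"]))
    print("output:\n" + grid_to_text(pair["output"]))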

ARC-AGI isn't perfect, but it helps demonstrate the gap. I'm sure all companies optimize their models for this benchmark given its dominance.

What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.

Why do you think DeepSeek isn't also fine tuned on ARC AGI? Maybe they're more fine tuned on ARC AGI but still get worse scores. There's no way to know.

My gut feeling is that ARC doesn't play as big of a role in the Chinese model landscape. It's one byproduct, but China is focusing on resource efficiency (for political reasons and because of limited compute). So unlike for OpenAI, poor performance on ARC doesn't hurt as much if the model works well. OpenAI literally hinges on hype, so that the insane economic bets they make somehow pay off. If you have billions and the future of the company on the line, you ace the exam any way you can. We noticed early on that whenever some ARC dataset was released, GPT would suddenly do well on the classes of problems in that dataset. But it just doesn't generalise. They fine-tune like crazy. I bet they fine-tune for raspberry counting at this point. Again, for OpenAI the perception of a moat is everything! Keep that in mind.

True, ARC is mostly an artificial "human-like AGI" benchmark that doesn't really reflect any plausible workload. Very different from things like Humanity's Last Exam that reflect real-world knowledge and are now getting closer and closer to saturation even with open models.

> DeepSeek 3.2 is at 4% on ARC-AGI 2

Why are you bringing up an outdated Chinese model from 6 months ago to compare against a US model from 6 months ago? Since the Chinese labs trail by roughly half a year, the outdated Chinese model will have performance from ~12 months ago, obviously. But today's Chinese model, DeepSeek 4, has performance not far from the 6-month-old US model: 46% compared to 52% for 5.2.


Because DeepSeek 4.0 is not on the leaderboard yet, but the jump isn't expected to be large. Kimi 2.5 is there and is also scoring low.

Deepseek V4 came out three weeks ago: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

Kimi K2.5 has also been superseded by the more finely tuned Kimi K2.6 three weeks ago. Moonshot's Kimi models appear to be the favored Chinese models, at least for coding, rather than DeepSeek V4. z.AI's GLM 5.1 is also worth mentioning as rather competent for coding, also released in April.

Those models won't be beating the US AI labs by your metrics either (although for coding, Kimi K2.6 might beat the very uneven Gemini depending on the situation), but in your criticism at least consider the state of the art in your comparisons.


I have been using DeepSeek V4 Pro for personal projects and home infra work for the last couple of weeks. Its quality of work is not bad at all, it is fairly fast, and given the fraction of the cost compared to Claude, I can keep going, which makes it a very compelling option. Looking forward to trying out Kimi 2.6, thanks for the recommendation.

Also they have a pretty big token discount running this month: https://api-docs.deepseek.com/quick_start/pricing/

Even without the discount, I'll have to think about whether I need the 100 EUR tier of Anthropic Max, or whether downgrading to Pro and using DeepSeek is good enough. And they're also up on OpenRouter and other places.
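If anyone wants to try it before changing subscriptions: DeepSeek (and OpenRouter) expose OpenAI-compatible endpoints, so the switch is basically a two-line change. A minimal sketch with the openai Python SDK - the model name here is illustrative, check their current catalog:

    from openai import OpenAI  # pip install openai

    # Same SDK, different base URL and key. OpenRouter works the same way
    # with base_url="https://openrouter.ai/api/v1".
    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

    resp = client.chat.completions.create(
        model="deepseek-chat",  # illustrative; pick whatever their docs list
        messages=[{"role": "user", "content": "Explain this stack trace briefly."}],
    )
    print(resp.choices[0].message.content)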

Been using those models; not quite comparable with Opus 4.6/4.7, but with max reasoning they're pretty good for a variety of dev tasks! The only big problem is no ability to process images, so they can't really do browser use for semi-automated testing; I'd have to write Playwright tests even when I don't want to.


I've been using OpenCode Go ($10/month) for personal projects (I have Claude subscription for $DAYJOB) and for the tinkering around that I do for myself the quality of the open weight models and the limits of the OpenCode plan are sufficient. I agree that for a lot of dev tasks they're quite good!

I've been using DeepSeek 4 Pro (instead of Sonnet 4.6) as the developer LLM (Opus is the planner) and it's been great. Not super fast, with all the reasoning, but it has been writing good code, and I think I've paid $5 so far (whereas with Sonnet I'd have been hitting the weekly limits on Max for weeks now).

Definitely recommended, though it's crucial that you have GPT 5.5 review the code afterwards.


Hum, I've been using it [0] with my Ollama Cloud subscription for the last two weeks and I love it. I've never reached the 5-hour usage limits of the $20 plan (on side projects), whereas with Opus I would sometimes hit them in ONE prompt.

[0]: https://ollama.com/library/deepseek-v4-pro
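Scripting it outside the chat app is easy too, via the ollama Python client. A minimal sketch, assuming the local daemon is signed in to Ollama Cloud (the model tag is the one from [0]):

    import ollama  # pip install ollama; talks to the local ollama daemon

    resp = ollama.chat(
        model="deepseek-v4-pro",  # tag from the library link above
        messages=[{"role": "user", "content": "Review this diff for obvious bugs."}],
    )
    print(resp["message"]["content"])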


I 100% agree with you, but I've been convinced over the last year that it's a time and scale issue, not anything fundamental.

The Chinese models right now are in a weird spot. Compared to the frontier labs, both their pre- and post-training are woeful - tiny, resource-constrained in every dimension including the human one, slow. I'd compare it to OpenAI 5 years ago, except I think even then OpenAI had way more!

But they "cheat" quite a lot in distillation and very benchmark-focussed RL and that's where you get this superficial quality in the leaderboards that doesn't match up when you go off-script. Arc is a great example in that it really belies an "inferior soul" at the heart of it all.

What gives me great hope though is that those same scaling laws that Altman and others have been hyping forever will absolutely kick in for the Chinese labs just as they did for the US ones, and I don't think anything can stop that process now. So they will catch up. It won't be tomorrow, but it's not going to be 10 years either. 3-5 would be my reasonably educated guess.

And the final risk, that China itself might try to restrict availability of the tsunami of GPU or other AI hardware it will inevitably produce - well, I just can't really imagine a country that has been configuring itself for the last 40 years as a single purpose export machine deciding that actually, no, it doesn't want to export something.

About the model restrictions - absolutely. I've been trying to do security research on my own software and the frontier models immediately get suspicious. I've been playing with the local ones much more this year basically because of this. They have deficiencies, for sure - they feel very "hollow" compared to the major labs. But I've talked to a lot of people, and the consensus is pretty clear - just a matter of time.


Just an observation: constraints often result in creative solutions. I wouldn't be surprised if a smaller lab makes a big breakthrough because they have to.

> I'd compare it to OpenAI 5 years ago except I think even then OpenAI had way more!

Say what? 5 years ago OpenAI had received around $139 million in funding, and they'd just come out with GPT-3: 175B parameters, a 2,048-token context window, trained on 300B tokens on a 10,000-V100 cluster, which would have cost maybe $4-13 million at the time for their training run.

Meanwhile Deepseek V3’s famously frugal training was $5M, and Chinese AI companies are raising billions in funding. Sure American AI companies are raising tens (and maybe hundreds in the case of OpenAI, if you count their circular funding rounds) of billions but they’re grossly inefficient, and we’ve already hit the limits of the scaling laws where there’s little point in increasing the number of parameters of a model.


> Meanwhile Deepseek V3’s famously frugal training was $5M

And widely derided once the team was unable to provide receipts. It's more likely to be 10x that.


Why make things up? The papers are published in full, and an apples-to-apples comparison puts the $5M final training run against Grok 3.5's $400M final training run.

Oh, it was written in a paper, so it must be correct - no further investigation required, just believe it at face value! No track record of academic dishonesty, and definitely no incentives to fudge the numbers.

Have you tried the latest DeepSeek V4 Pro inside the Claude Code harness? It's not listed on that site.

It definitely 'feels like' it is as good as Claude for many regular web app coding tasks (though I don't have real benchmarks). And it is comically cheap.

I'm not suggesting it is better than the latest Claude or codex models, but it seems 'good enough' for a lot of use cases in my limited real world testing.


I'm starting to feel like a parrot, but people seem to forget that software engineering is actually a very narrow slice of the white-collar pie. You don't need a mega-model that can reason about 100,000 lines of code when you want to create a nice PPT (which used to consume literally hours of your life) to impress your boss. SOTA models will probably be used for frontier research, complex coding tasks, large-scale data analysis, etc. And the average Joe will be able to buy a pre-configured box with a plug-and-play harness and run medium models air-gapped. Or use such models through cloud APIs dirt cheap if privacy is not a concern.

On the same topic but from a slightly different angle - as SOTA models get more capable, the 'quality' and 'feel' of the experience they provide in each domain is heavily dependent on the reinforcement learning the vendor does for that specific domain. After all, many fields have 100 flavors of "good answers," but the model has to pick one answer.

Benchmarks are not very good at capturing this yet. But it could be the case that DeepSeek v4 Pro is 100% as good as Claude Opus 4.7 at scaffolding a basic Rails app, but absolutely terrible at creating a credible business plan that another businessperson would think is real. That's a made-up example, but you get the point.

The end result will be a lot of people arguing about which model is "better," but "better" depends heavily on the task and how that model was trained to interact with the user for that task. Two users may have very different qualitative experiences using the exact same model, despite the benchmarks.


Creating a nice PPT is actually hard because it requires visual capabilities and so-called "computer use" (really, GUI use) of fiddly proprietary software. The nice thing about the coding case compared to a lot of disparate white-collar work is that it's all plain ASCII text. You can already ask a coding model to create a nice TeX/beamer slideshow (or whatever the Typst-based equivalent is) but whether your boss will be duly impressed by that is anyone's guess.

This is a tangent, but I'd also mention sli.dev - slideshow-as-website is really great and fun to make with LLMs.

Tangential, but in our opinion corporate PPTX automation is an unsolved problem, even with Claude for PowerPoint (and it's worse with everything else common out there). Its harness (a) is not tuned very well for corporate use and (b) even if it were, fails to manage the specific business knowledge within each org needed to create effective (i.e. audience tailored) presentations.

I've just written a blog post about this topic this week: https://octigen.com/blog/posts/2026-05-11-ai-presentation-ga...


Also, so many developers I know use LLMs for one-shotting isolated problems, explainers, discussions, and planning. For these, even Kimi is pretty great.

I don't think every dev will be comfortable just releasing Claude on their project.


They're not even that much cheaper (1/2 the price per task according to Artificial Analysis) once you account for the lower token usage of GPT-5.5. I can't justify it when factoring in the extra time wasted, and the cheap Codex usage I get through the monthly plan. Frontier intelligence is not a commodity product ... yet.

The price per task already factors in token usage, so you're double counting if you're also tacking "lower token usage" on as another argument on top.
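Concretely, with made-up numbers (a toy sketch, not either vendor's actual pricing):

    # Per-task cost already bakes in token usage, so adjusting it again
    # for token counts would count the same effect twice.
    price_per_mtok = {"model_a": 2.00, "model_b": 1.00}    # $/million output tokens
    tokens_per_task = {"model_a": 50_000, "model_b": 100_000}

    for m in ("model_a", "model_b"):
        cost = price_per_mtok[m] * tokens_per_task[m] / 1_000_000
        print(m, f"costs ${cost:.2f} per task")
    # Both come out to $0.10: model_a charges 2x per token but uses half the tokens.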

ARC has no predictive power whatsoever. I always use the best models available. So far I haven't found a task that Chinese models cannot solve very quickly and reasonably. Do you have any examples where they failed for you?

And yet Claude six months ago was amazing and good enough for you.

This shows that AI cloud consumption is just conspicuous consumption, a status symbol; nobody knows why they need cloud AI or what problem they are even solving.


Ah, AI is running on the highway model: induced demand. That kind of makes a lot of sense now that I think about it.

If you want something close to Claude, use GLM 5.1 with Claude Code. Their subscription price is no longer 10x cheaper now, though (at best 2x cheaper).

All research points to a "no" answer - weight is regained, and quickly. Which helps explain why obesity is so prevalent - it is something in the brain's chemistry.

Weight would only be regained if you start eating more, no? I would think that would be hard to do if you've already seen what appropriate portions are.

You seem to have a fundamental misunderstanding of what GLP-1 agonists help with. In simple terms, they make you less hungry. If you stop the drugs, it's not surprising you go back to being hungry. It would be a miracle drug if you didn't.

People, on average, eat until they're no longer hungry. Problem is, there's only a loose relationship between your caloric needs and your hunger response. That's how you end up with underweight people who are trying to put on muscle saying they can't possibly eat any more and still can't put on weight, while overweight people eat twice as much as that guy and have to actively choose not to eat more. Both can make a conscious choice to disobey their signals, just like you can choose to hold your hand to a hot stove. But it takes a lot of energy to keep up that willpower. Effective weight-loss drugs solve that problem by treating the actual problem: the hunger.


I don't misunderstand. I understand they stop you from feeling as hungry. That is why it is even more perplexing. Eating until you are no longer hungry isn't how most people eat I'd say. Most people eat a given quantity of food. A plate of food. A bowl of soup. An entree with the provided side perhaps. People don't generally order food, eat it, and order more food. Maybe they do I guess, but I haven't seen it personally. I mean I think a lot of people could shove a dozen hotdogs down their gullet if they wanted to, but that reaction isn't a typical expectation. Plus once you've seen normal portions, surely you'd realize when you are going beyond those.

Speed of eating might also be an underrated factor in all this. Stretching out one's meals and slowing down the pace might let satiety signals arrive before the meal is done, whereas if one scarfs down the plate before that signal happens, well, one has already scarfed down the plate and might be working on the next before those signals hit. This meta-analysis suggests this is a possibility (1).

1. https://pmc.ncbi.nlm.nih.gov/articles/PMC8156274/


It's possible that your internal hunger mechanism is different from other people's. In my experience, hunger leads me to eat, and I stop eating when I feel full. Perhaps I will go back for seconds if I am eating at home, or, if I am eating at a restaurant, I will likely not eat everything on my plate (giant US portions).

> I would think that would be hard to do if you've already seen what appropriate portions are.

This would be true if not knowing what an appropriate portion size is were the one thing keeping most people from losing weight. If that were the case, traditional dieting would have a far better track record with long-term weight loss.


People notoriously don't know what an appropriate portion size is. Usually those failed diets come from failing to appreciate the quantity of calories coming in. Those sugary drinks and snacks add up fast. I've seen it among people I know. I might drink water; they opt to drink 250 calories. Does that make them feel any fuller? Probably not, it's merely sugar water, but it counts toward calories and can kill diets. We order a 750-calorie entree and they get a refill. We walk away from the same meal, but one of us had almost twice as many calories.

Nope, the body will adjust to regain it no matter what.

Australia's health organization did a meta-study of people who had bariatric surgery. They found that every single one regained 70% of their original weight after five years, even though they were physically incapable of eating the way they did before.

This happened to my grandmother, she had a bypass in the 2000s, lost over a hundred pounds, and then regained it again and was back to her original weight when she passed in 2022. The woman couldn't eat more than 4 ounces per meal without throwing up.

I lost 40 pounds in 2017 from gastritis. I kept it off for three years, and then regained 50 pounds despite starting ozempic.


Except if your body is unnaturally screaming at you to eat more. The obesity epidemic is not caused by ignorance or lack of willpower. It's natural differences in how people's bodies work compounded by modernity's changes to physical activity levels and diet.

People do learn to fast and deal with those signals. Once you are aware of what they are, it probably gets easier. I've fasted before, by choice and by circumstance in less fortunate times in my life. Is it uncomfortable? Sure. But it is no punch in the face. It is something you can learn to push aside while you handle the task at hand. One can learn to go to sleep hungry, sadly.

And you mention exercise - that is an excellent point. Hunter-gatherers might forage for 8 miles a day, while many people's daily walking effort can be measured in a few dozen feet. Our bodies are built to be used. It is no surprise that when they are not, systems designed for a certain baseline load no longer function as intended.


Gosh that's so impressive, if only they had a peptide that made everyone exactly like you!

(they do, they're fucking called GLP-1s)


Scroll down to the leaderboard - https://arcprize.org/leaderboard

Spoiler alert - they are all towards the bottom of the leaderboard. People come up with a wide variety of excuses for why they are not used despite being offered at significantly lower cost, but the answer is simply that they don't perform well enough, for now.


DeepSeek V4 isn't even on there.

I'd rather trust the LM Arena leaderboard, which puts it on par with Sonnet.


LM Arena uses human side-by-side voting, which limits its applicability to complex tasks.

The ARC Prize leaderboard does have DeepSeek V3.2, which only scored 4% on ARC-AGI 2 (while the top models score over 80%). It also has Kimi and Qwen, but they didn't perform well either.


Why does everything today have to be "good" or "bad"? Where is the nuance? Where is seeing things as they are - an exciting endeavor built by thousands of people, one of whom has flaws you don't like?

The rise of moralization of everything is really killing online discourse. It's gotten to the point where people will now mostly criticize and support ideas based on who proposed them, and not based on their merits. Tribalism at its worst.


My theory is that tribalism is hard-coded in our brains, strongly selected for by those bad times in the past when the ability to turn off emotion and critical thought meant you, a generally social creature, could murder your fellow man to keep your family/in-group alive and fed.

I think religion helped reduce tribalism at a societal level by making evil/demons/bad acts the "them", while everyone who went to church on Sunday (previously the whole town) was the "us". Now, without religion and the physical/social gathering it brought, that hardware in our brain still tries to segment a clear "us"/"them", but with much less guidance.


People who themselves eschew nuance should not be surprised when they and everything they touch are polarized into "good" and "bad" buckets. I'm pretty neutral to most companies on earth, because their CEOs wisely don't make wild comments every other day on their personal politics.

This isn't a new thing, ideas and actions have always been judged by who says them. If anything, the difference is that in the past, his behavior would have gotten him thrown out both from his companies and out of polite society.

This seems like less of a today thing and more of an ancient human tendency.

A lot of Buddhist practice is basically trying to train against immediately collapsing reality into self/other, right/wrong, craving/aversion.

Practicing this with Elon Musk is effectively ultra hard mode.

--

Though I do think there’s a subtle irony here too — the original commenter may simply be describing their own emotional reaction/disillusionment, while your response risks collapsing them into "part of the problem."

Feels like everybody in the thread is pointing at the same tendency from different angles.


I hoped to get across that I still find this to be a nuanced issue. I like the content, I just dislike the discourse around it, which makes it hard for me to get excited about the content.

I too would like it to just be about the content, but nothing exists in a vacuum.


As a European, my problem is that any additional success for Musk means more support for far-right extremists who want to destroy the EU. Being against that is not moralizing or tribalism.

Well, Musk illegally wrecked half the federal government and killed tens of thousands of Africans in the process. Now he spends his days boosting and funding white nationalists and far-right politicians around the world. Why does everything have to be "good" or "bad"? Because some things are just pure evil and need to be called out as such, as well as thoroughly boycotted if the wheels of justice are too slow to turn.

This is not a nuanced case of "he did a few icky things, but also lots of good things." No. He is a fucked up, deeply racist megalomaniac who is doing his best to reshape the Western world in his fetid image. If he stopped with Tesla and SpaceX, maybe he would be penned differently in the history books, but alas.


If you replace "online" with "modern", then your comment could be an impassioned 1940s-era defense of Nazi Germany for their "merits" in face of their flaws.

The sum of these merits adds up to something. SpaceX is a political venture, and just like the uncomfortable questions that Microsoft/Google/Apple all pose, it's worth asking what the consequences will be in the long term. Lawful intercept sounded like a great plan, before it was leveraged by America's adversaries in Salt Typhoon as a prepackaged surveillance network.


Musk is not just "one of them"; the financial success of SpaceX is extremely unevenly distributed.

Personally I am looking forward to the post-IPO world where a lot of very smart people with hard-won knowledge will have their golden handcuffs off.


>people will now mostly criticize and support ideas based on who proposed them, and not based on their merits.

"People" were always like that and will be so..stupid. Let me quote Agent K from MIB for you.

> A person is smart. People are dumb, panicky, dangerous animals and you know it...

The funny thing is that these are the same people who applauded obvious scams when Musk proposed them, back when they liked him...



This is about successful use of the EU funds, not GDP or any other metric.

Why is AMD not more popular, then, if labs are so flexible about giving up CUDA?


People are trying, especially for inference. For training, the risk of tanking your run is just too high, I think.

TPUs are at least dogfooded by Google DeepMind; no team AFAIK has gotten the AMD stack to train well.


Interesting. Why? My current mental model is that AMD chips are just a bit behind, so, less efficient, but no biggie. Do labs even use CUDA?


This is somewhat out of date (Dec 2024), but gives you some idea of how far behind AMD was then: https://newsletter.semianalysis.com/p/mi300x-vs-h100-vs-h200...

Pull quotes:

> AMD's software experience is riddled with bugs, rendering out-of-the-box training with AMD impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD's weaker-than-expected software Quality Assurance (QA) culture and its challenging out-of-the-box experience.

[snip]

> The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs. To get AMD to a usable state with somewhat reasonable performance, a giant ~60 command Dockerfile that builds dependencies from source, hand crafted by an AMD principal engineer, was specifically provided for us

[snip]

> AMD hipBLASLt/rocBLAS’s heuristic model picks the wrong algorithm for most shapes out of the box, which is why so much time-consuming tuning is required by the end user.

etc etc. The whole thing is worth reading.

I'm sure it has improved since then (and will continue to). I hear good things about the Lemonade team (although I think that is mostly inference?).

But the NVidia stack has improved too.


That's insane. There should be a big team of people at AMD whose whole job is just to dogfood their stuff for training like this. Speaking of which, Amazon is in the same boat; I'm constantly surprised that Amazon is not treating Inferentia/Trainium software improvements as an uber-priority. (I work at Amazon)


> There should be a big team of people at AMD whose whole job is just to dogfood their stuff

if they had this management attitude, they wouldn't have been so far behind so as to need this action in the first place!


I'll just leave this here from 10 years ago:

> “Are we afraid of our competitors? No, we’re completely unafraid of our competitors,” said Taylor. “For the most part, because—in the case of Nvidia—they don’t appear to care that much about VR. And in the case of the dollars spent on R&D, they seem to be very happy doing stuff in the car industry, and long may that continue—good luck to them.

https://arstechnica.com/gadgets/2016/04/amd-focusing-on-vr-m...

"car industry" is linked to the GPU-accelerated self-driving car work, ie, making neural networks run fast on GPUs: https://arstechnica.com/gadgets/2016/01/nvidia-outs-pascal-g...


Where's the scope for an L7 promo in "Fixed a bunch of tiny issues that were making it hard to use Trainium/Inferentia with PyTorch"?

Amazon's compensation strategy, in which you primarily get a raise years in the future for tricking your management chain into promoting you is definitely bearing its rotten fruit.


Hardware companies being terrible at software is the norm. Nvidia is one of the rare companies that can successfully execute both.

Maybe Amazon is an example of how this happens even to hardware divisions within software/logistics companies.


How are their Linux drivers looking these days? Still a PITA to install?


I mean, the fact that there isn't such a team even today may speak to why AMD isn't the contender it should be by this point.


Anecdotal but over several years with an AMD GPU in my desktop I've tried multiple times to do real AI work and given up every time with the AMD stack.


I'm running fine on my AMD 7800 XT 16GB... Yes, memory is a bit limited, but apart from that I have found that it works great using Vulkan in LM Studio, for example.

ROCm works great too; the only issue I have had is that my machine froze a couple of times when it used 100% of the graphics and the OS had nothing left. Since moving to Vulkan I stopped getting these errors, apart from a little UI slowdown when I had 4 models loaded at the same time taking turns.

I'm also on an i7-6700 with 32GB DDR4, so I'm sure that is causing more slowdowns than the graphics card.


Yet another reason to doubt claims that ”software is solved”.

Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware because Claude could one-shot a solution, but that was clearly a whiteboard hypothetical rather than a real system with warts, issues, and nuance.


i'm doing inference on a free mi300x instance from AMD right now. not sure if the software stack is just old or what, but here's what i've observed: stuck on an old version of vllm pre-Transformers 5 support. it lacks MoE support for qwen3 models. oss-120b is faaaar slower than it should be.

int8 quantization seems like it's almost supported, but not quite. speeds drop to a fraction of full precision speed and the server seems like it intermittently hangs. int4 quantization not supported. fp8 quantization not supported.

again, maybe AMD is just being lazy with what they've provided, but it's not a great look.

right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting prompt processing (PP) @ 4500 tokens/sec and token generation (TG) @ 1300 tokens/sec
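for reference, here's roughly how i get those numbers - just an async client hammering vllm's OpenAI-compatible endpoint. a sketch only; the base URL and model name depend on how the server was launched:

    import asyncio, time
    from openai import AsyncOpenAI  # vllm serves an OpenAI-compatible API

    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    async def one_request(i: int) -> int:
        r = await client.chat.completions.create(
            model="Qwen/Qwen3-32B",  # must match what vllm was launched with
            messages=[{"role": "user", "content": f"short answer #{i}: why is the sky blue?"}],
            max_tokens=128,
        )
        return r.usage.completion_tokens

    async def main(n: int = 120) -> None:
        t0 = time.time()
        counts = await asyncio.gather(*(one_request(i) for i in range(n)))
        dt = time.time() - t0
        print(f"{sum(counts)} generated tokens in {dt:.1f}s -> {sum(counts) / dt:.0f} tok/s")

    asyncio.run(main())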


AMD GPUs compete, but they lack the interconnect. NVLink performance is a huge deal for training.


> Do labs even use CUDA?

From the papers I've read and the labs I have worked in personally, I would say that most scientists developing deep learning solutions use CUDA for GPU acceleration.


What I hear is that getting your network to work on AMD is a huge pain.


Yeah, historically it’s been software that’s limited AMD here. Not surprised to hear that may still be the issue. NVidia’s biggest edge was really CUDA.


CUDA is a complete and utter piece of shit software. It's just that it is a tiny bit less of a shitshow than the alternatives.


I don’t know what’s a chicken and what’s an egg here. But ROCm support is often missing or experimental even in very basic foundational libraries. They need someone else to double down on using their chips and just break the software support out of the limbo.


This is what I've heard on the "street". Building a CUDA-compatible stack for AMD's hardware requires highly-paid SWEs. It's a very niche field, and talent is hard to come by.

But AMD does not want to pay these specialized SWEs the market rate. Their existing SWEs would be up in arms saying, basically, "what are we, chopped liver??", or so the thinking goes.

So AMD is stuck with a shitty software stack which cannot compete with CUDA.

If I were making such decisions, I would just cull the number of existing SWEs down by 50%, and double the pay for remaining ones. And then go out and hire some top talent to build a good software stack.


> highly-laid SWEs

Freudian slip?


Ha! You caught it before I did; and I caught it right away.


Political polarization creates tribalism, where people align their views with their tribe and justify increasingly escalatory means to fight the "other side".


Other potential macro-contributing factors may include: breakdown in local community, removal of community forums for discussion, an attention economy and tabloid journalism gravitating toward emotional reaction (TikTok) rather than intellectual dialogue (balanced journalism), social media echo chambers, removal of accessible popular education, defunding of public media, unaffordable public access to medicine, credit culture, increasingly unaffordable costs of living, and abnormally performative political dioramas. The net result is people, unable to reason about the world around them, drawn into an emotional us-and-them with a dialogue of echo-chamber reinforcement, who decide semi-rationally to "chuck it all in" the second things get out of control financially, psychologically, or emotionally. In other words, the modern world has built a perfect breeding ground for recruitment to extremism. <s>Great time to start a cult.</s>

... and in a classic example, apparently the mere mention of concern regarding the rise in US political violence got this thread flagged. Where can you have a discussion anymore?


It got flagged because the people who are pro-violence flag any comments that disagree with them, so they get hidden.


Fair theory but how do you know that?


Especially since America is happier than most European countries [1]. And the ones that are happier are the Nordics and Ireland, which are more suburban and less dense.

[1] https://data.worldhappiness.report/table


I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.

https://postimg.cc/wyxgCgNY


Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)


mmmm yummy OSLS?

