petesergeant's comments | Hacker News

> It begs the question why you assume the parent comment was going to blindly follow the LLMs output.

Many people do


Yeah, I believe all their claims about speed, but I just don’t do anything that my M2 doesn’t seem to find effortless…

> how many developers out there are merely happy with getting something to work and get it out the door

There's a very large number of cases where that's the right choice for the business.


Also for small cli tools and scripts that otherwise wouldn't get written.

> If they’re really so confident on the LLM’s effectiveness, why not just keep it voluntary, why force it on people?

For people who are so confident (which, I'm not), it's an obvious step; developers who don't want to use it must either be luddites or afraid it'll take their jobs. Moving sales people to digital CRMs from paper files, moving accountants to accounting software from paper ledgers and journals, moving weavers to power looms, etc etc -- there would have been enthusiasts and holdouts at every step.

The PE-bro who's currently boasting to his friends that all code at a portfolio company has to be written first with Claude Code, and that developers are just there to catch the very rare error, would previously have been boasting to his friends about replacing his whole development team with a team that cost 1/10 the price in Noida.

Coding agents can't replace developers _right now_, and it's unclear whether scaling the current approach will ever get them there. But at some point (maybe not until we get true AGI) they will be able to replace a substantial chunk of the developer workforce, and a significant chunk of developers will be highly resistant to it when that happens. The people you're complaining about are simply too early.


It tracks with the trend of computing being something you passively consume rather than something you do. Don't learn how anything works! Deskill yourself! Not that LLMs aren't a force multiplier.

> All I want from politicians, and by this I mean literally all I want at this point, is my politicians to be smarter than me

... why? Ted Cruz is almost certainly smarter than almost all of us, and I do not want Ted Cruz to be a politician. Boris Johnson is exceptionally gifted, and Never Again. Rishi Sunak's as sharp a guy as you're likely to meet, but as the Economist noted, rarely met a bad idea he didn't warm to. You're giving a weird halo effect to intelligence.


Ted Cruz said that Galileo was persecuted because he claimed that the earth isn’t flat, and used that as justification about denying climate change. This is a lie at best, but more likely just idiocy because he never paid attention in history classes.

I do not agree that Ted Cruz is smarter than nearly all of us.

I guess I just want politicians who can make the most basic logical inferences and do the most rudimentary reasoning, and importantly it would be great to have politicians who don’t think that they already know everything.


> I do not agree that Ted Cruz is smarter than nearly all of us

One of his Harvard Law Professors called him “off-the-charts brilliant”, and he won several national level debate challenges, so I suspect we’re working off such significantly different world views here as to preclude any reasonable discussion on this point.


Two thoughts:

* Koreans are tall by East Asian standards; 3-4 cm taller than Chinese and Japanese

* Thais don't eat that much, but they will massively over-cater, and there's not really the same taboo around food wastage as there is in Europe. My father, who like me spent a couple of decades in Thailand (although at different times), reckoned it was because historically they've had very few food shortages compared to other countries


Taller than the Chinese average, perhaps, but northern Chinese are generally much taller than southern Chinese. Guess what's next to northeastern China? That's right, Korea.

Thais don't have big meals, but they do snack incessantly, which makes up for it. And overcatering for guests is a pan-Asian or arguably a global phenomenon.


> And overcatering for guests is a pan-Asian or arguably a global phenomenon.

Sure, but Thais will go to a restaurant as a family and order 3x the amount of food needed. Somehow, all the rice will get eaten, and some of the meat will be left.


I'm working on a set of TypeScript libraries to make it really, really easy to spin up an agent, a chatbot, or pretty much anything else you want to prototype. It's based around sensible interfaces, and while batteries are included, they're also meant to be removed when you've got something you want.

The idea is that a beginner should be able to wire up a personally useful agent (like a file-finder for your computer) in ten minutes by writing a simple prompt and some simple tools, and running it. It's easy to plug in any kind of tracing, etc. that you want. I have three or four projects in prod which I'll be switching over to it, just to make sure it fits all those use-cases.

But I want to be able to go from someone saying "can we build an agent to..." to having the PoC done in a few minutes. Everything else I've looked at so far seems limited, or complicated, or insufficiently hackable for niche use-cases. Or, worst of all, in Python.
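
To make that concrete, here's roughly the shape I'm going for. The library isn't published, so everything below (the hand-rolled findFiles tool and runAgent loop against the plain OpenAI chat completions endpoint) is an illustrative stand-in rather than its actual API:

    // Sketch of a "file-finder" agent: one tool plus a small tool-calling loop.
    // Names like findFiles/runAgent are illustrative, not a real library's API.
    import { readdirSync, statSync } from "node:fs";
    import { join } from "node:path";

    // The single tool: recursively list files whose names contain a substring.
    function findFiles(dir: string, query: string, hits: string[] = []): string[] {
      for (const name of readdirSync(dir)) {
        const full = join(dir, name);
        if (statSync(full).isDirectory()) findFiles(full, query, hits);
        else if (name.includes(query)) hits.push(full);
      }
      return hits;
    }

    async function runAgent(userPrompt: string): Promise<string> {
      const messages: any[] = [
        { role: "system", content: "You find files on the user's machine using the find_files tool." },
        { role: "user", content: userPrompt },
      ];
      const tools = [{
        type: "function",
        function: {
          name: "find_files",
          description: "Recursively search a directory for file names containing a substring",
          parameters: {
            type: "object",
            properties: { dir: { type: "string" }, query: { type: "string" } },
            required: ["dir", "query"],
          },
        },
      }];

      for (let turn = 0; turn < 5; turn++) { // cap the loop so it can't run forever
        const res = await fetch("https://api.openai.com/v1/chat/completions", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          },
          body: JSON.stringify({ model: "gpt-4o-mini", messages, tools }),
        });
        const msg = (await res.json()).choices[0].message;
        messages.push(msg);
        if (!msg.tool_calls) return msg.content; // no tool requested: model is done
        for (const call of msg.tool_calls) {     // run each requested tool call
          const args = JSON.parse(call.function.arguments);
          const result = findFiles(args.dir, args.query);
          messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
        }
      }
      return "gave up after 5 turns";
    }

    runAgent("find anything with 'invoice' in the name under /tmp").then(console.log);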


> Built on top of Together Turbo Speculator, ATLAS reaches up to 500 TPS on DeepSeek-V3.1 and up to 460 TPS on Kimi-K2 in a fully adapted scenario — 2.65x faster than standard decoding, outperforming even specialized hardware like Groq

and yet, if you click on: https://openrouter.ai/moonshotai/kimi-k2-0905

You'll see Groq averaging 1,086tps vs Together doing 59tps. Groq and Cerebras often feel like the only games in town. I'd love that to be different (because I'd like more models!), but nobody else is coming close right now.

Comparing how quickly gpt-oss-120b runs gives a broader picture: https://openrouter.ai/openai/gpt-oss-120b -- Vertex (Google) and SambaNova do pretty good on it too, but still, the difference between a top provider and an also-ran is giant.
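
If you'd rather eyeball this yourself than trust the aggregate charts, a crude check is to pin one provider at a time via OpenRouter's provider-routing options and divide completion tokens by wall-clock time. The provider.order / allow_fallbacks fields below are my reading of their docs, and the timing naively includes time-to-first-token, so treat it as a sanity check rather than a benchmark:

    // Crude TPS check against OpenRouter: pin one provider, then divide
    // completion tokens by wall-clock seconds. Sanity check, not a benchmark.
    async function roughTps(providerName: string): Promise<number> {
      const started = Date.now();
      const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
        },
        body: JSON.stringify({
          model: "moonshotai/kimi-k2-0905",
          messages: [{ role: "user", content: "Write about 500 words on power looms." }],
          // Provider routing fields as I understand OpenRouter's request schema.
          provider: { order: [providerName], allow_fallbacks: false },
        }),
      });
      const body = await res.json();
      const seconds = (Date.now() - started) / 1000;
      return body.usage.completion_tokens / seconds; // includes time-to-first-token
    }

    for (const p of ["Groq", "Together"]) {
      roughTps(p).then((tps) => console.log(`${p}: ~${tps.toFixed(0)} tok/s`));
    }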

God I love OpenRouter.


> I'd love that to be different (because I'd like more models!), but nobody else is coming close right now.

I'm currently on the Cerebras Code subscription for like 50 USD a month because it more or less makes the rate limits I used to deal with other platforms disappear (without making me spend upwards of 100 USD paying per token): https://www.cerebras.ai/blog/introducing-cerebras-code

At the same time, their Qwen Coder 480B model is fine, but I still find myself going for Claude or GPT-5 or Gemini 2.5 Pro for more complex issues (or ones where I need good usage of the Latvian language). At least for programming tasks, it'd eventually be super cool if they could offer more models.

Or have some sort of a partnership with Anthropic or whoever, because getting my questions answered at around 500-1500 TPS is really, really pleasant, especially for agentic use cases with code modifications, even if I still bump into the 128k context limits occasionally.


Interesting: if you take a look at the median throughput chart [0], Groq goes insane after 7th Oct. Wonder what happened.

[0] https://openrouter.ai/moonshotai/kimi-k2-0905/performance


2x jump overnight. New LPU hardware? I checked the speed for Groq's gpt-oss-120b, Llama4-maverick, and Llama4-scout; none of them had a noticeable change this month.

Heavy quantization

They claim (or someone on Reddit who claims to be staff claims) that's not accurate: https://www.reddit.com/r/LocalLLaMA/comments/1mk4kt0/comment...

There's another angle to this comparison. Groq and Cerebras use custom chips, but I'm not sure about Together. In this case, Together is sharing results based on the B200 GPU. Another important point is the accuracy of these speed-ups compared to the baseline model. It's known that such tricks reduce accuracy, but by how much? Kimi has already benchmarked several providers. https://x.com/Kimi_Moonshot/status/1976926483319763130

> It's known that such tricks reduce accuracy

AFAIU, speculative decoding (and this fancier version of spec. decoding) does not reduce accuracy.


No, it shouldn't. "All" you're doing is having a small model draft the next few tokens and then having the large model "verify" them. Where the large model diverges from the small one, you throw away the rest of the draft and start the process again from that point.
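
A toy version of the loop shows why greedy speculative decoding gives output identical to running the big model alone: drafted tokens are only kept where they match what the big model would have produced anyway. bigModel and draftModel below are stand-ins, and real implementations verify all k drafted tokens in a single batched forward pass, which is where the speedup actually comes from:

    // Toy accept/reject loop for (greedy) speculative decoding.
    type Model = (context: string) => string; // returns the single next token

    function speculativeDecode(
      bigModel: Model,
      draftModel: Model,
      prompt: string,
      maxNewTokens: number,
      k = 4, // how many tokens the small model drafts per round
    ): string {
      let output = prompt;
      let produced = 0;
      while (produced < maxNewTokens) {
        // 1. The small model cheaply drafts k tokens.
        const draft: string[] = [];
        let ctx = output;
        for (let i = 0; i < k; i++) {
          const t = draftModel(ctx);
          draft.push(t);
          ctx += t;
        }
        // 2. The big model checks each drafted token in turn; accepted tokens
        //    are, by construction, exactly what the big model would have said.
        ctx = output;
        for (const t of draft) {
          const want = bigModel(ctx);
          ctx += want === t ? t : want; // on divergence, take the big model's token...
          produced++;
          if (want !== t) break;        // ...and throw away the rest of the draft
        }
        output = ctx;
      }
      return output;
    }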

It’s quantization which is crippling accuracy…

People all over this subthread are saying that with no evidence provided. The company say they don't (which would be pretty embarrassing to have to walk back), so who's saying they do?

> Groq and Cerebras use custom chips

Not just custom chips, but custom chips which derive much of their performance from enormous amounts of SRAM. There's no denying that approach is fast, but it's also incredibly expensive, and SRAM scaling has slowed to a crawl so it won't get much cheaper any time soon.


This is an "expensive for whom" question. I'd be keen to know if they're burning investor money hosting these right now or if they're able to run these at cost.

> You'll see Groq averaging 1,086tps

What I don't understand is, Groq reporting 200tps for the same model: https://console.groq.com/docs/model/moonshotai/kimi-k2-instr...

OpenRouter numbers look fishy.


Wonder if it's prompt caching? OpenRouter is (I guess) just reporting actual throughput, whereas presumably Groq is reporting a from-scratch figure? Just a guess tho.

Groq is quantizing, even though it's not labeled as such on OpenRouter (super frustrating)

Do you have a source for that? They are pretty close to the ref implementation on Moonshot's ranking.


But Groq/Cerebras are hardware accelerators. It's an unrelated optimization. I wouldn't be surprised if they could also use speculators (today or in the future).

> Groq and Cerebras often feel like the only games in town.

SambaNova should be similar... they've got a similar specialized hardware approach.


Do these numbers compare performance at the same cost?

You can see the cost in the links, and the answer is “pretty much” for the consumer. The backend maths, no idea.

Vite has been a joy to use. Very interested in an all-in-one solution from that team.

While true, I'm not sure I've seen an LLM define a cost function and then try to reduce that cost yet, which I'm guessing is what the OP is referring to.
