HumanOstrich's comments | Hacker News

You can tell it not to do that and it will show inline diffs.

I tried Cerebras with GLM-4.7 (not Flash) yesterday using paid API credits ($10). They have per-minute rate limits that count cached tokens against them, so you hit the limit in the first few seconds of every minute and then have to wait out the rest of it. So they're "fast" at 1000 tok/sec - but not really for practical usage. Between the rate limits and the penalty for cached tokens, you effectively get <50 tok/sec.

They also charge full price for the same cached tokens on every request/response, so I burned through $4 on one relatively simple coding task. It would've cost <$0.50 with GPT-5.2-Codex or any other caching-enabled model besides Opus (and maybe Sonnet), and it would've been much faster.
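
To make the caching point concrete, here's a back-of-the-envelope sketch in Python. All the numbers are made up for illustration (not Cerebras' or any provider's real prices), but they show why re-billing cached context at full price dominates the cost of an agentic session, where the whole conversation gets resent every turn:

    # Illustrative only: hypothetical prices and token counts, not real rates.
    # In an agentic session each turn resends all prior context, so if cached
    # tokens are billed at full price the cost grows roughly quadratically.

    def session_cost(turns, tokens_per_turn, price_per_mtok, cache_discount):
        """Total input cost when every turn resends the entire context so far."""
        total = 0.0
        for t in range(1, turns + 1):
            context = t * tokens_per_turn        # everything sent so far
            fresh = tokens_per_turn              # only the newest part is uncached
            cached = context - fresh
            total += (fresh + cached * cache_discount) * price_per_mtok / 1e6
        return total

    # Hypothetical: 40 turns, 20k tokens of context growth per turn, $2 / Mtok input.
    print(session_cost(40, 20_000, 2.0, cache_discount=1.0))  # cached tokens billed in full
    print(session_cost(40, 20_000, 2.0, cache_discount=0.1))  # 90% discount on cached tokens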


I hope cerebras figures out a way to be worth the premium - seeing two pages of written content output in the literal blink of an eye is magical.

The pay-per-use API sucks. If you end up on the $50/mo plan, it's better, with caveats:

1 million tokens per minute, 24 million tokens per day. BUT: cached tokens count full, so if you have 100,000 tokens of context you can burn a minute of tokens in a few requests.
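
A tiny sketch of the arithmetic, using the quoted limits and an assumed 100,000-token context (the context size is just an example):

    # Quoted plan limits plus an assumed context size.
    TPM = 1_000_000        # tokens per minute on the $50/mo plan (as quoted above)
    context = 100_000      # assumed size of the conversation resent on each request

    requests_per_minute = TPM // context
    print(requests_per_minute)   # -> 10 requests, then you wait out the minute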


It’s wild that cached tokens count full - what’s in it for you to care about caching at all then? Is the processing speed gain significant?

Not really worth it, in general. It does reduce latency a little. In practice, you do have a continuing context, though, so you end up using it whether you care or not.

Try a nano-gpt subscription. It's not going to be as fast as Cerebras obviously, but it's $8/mo for 60,000 requests.

I wonder why they chose per minute? That method of rate limiting would seem to defeat their entire value proposition.

In general, with per minute rate limiting you limit load spikes, and load spikes are what you pay for: they force you to ramp up your capacity, and usually you are then slow to ramp down to avoid paying the ramp up cost too many times. A VM might boot relatively fast, but loading a large model into GPU memory takes time.
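
For what it's worth, here's a minimal sketch (my own illustration in Python, not anything Cerebras has described) of the fixed-window style of limiter this implies. The point is that it caps how much work can land in any single minute, which caps the capacity the provider has to keep warm:

    import time

    class FixedWindowLimiter:
        """Minimal per-minute token budget: bounds the worst-case load in any one window."""

        def __init__(self, tokens_per_minute):
            self.budget = tokens_per_minute
            self.window = int(time.time() // 60)
            self.used = 0

        def allow(self, tokens):
            now = int(time.time() // 60)
            if now != self.window:               # a new minute starts a fresh budget
                self.window, self.used = now, 0
            if self.used + tokens > self.budget:
                return False                     # caller waits for the next window
            self.used += tokens
            return True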

I use GLM 4.7 with DeepInfra.com and it's extremely reasonable, though maybe a bit on the slower side. But faster than DeepSeek 3.2 and about the same quality.

It's even cheaper to just use it through z.ai themselves I think.


I know this might not be the most effective use case, but I ended up using the "try AI" feature on Cerebras, which opens up a window in the browser.

Yes, it has some restrictions as well, but it still works for free. I have a private repository where I created a Puppeteer instance so I can just type something into a CLI and get the output back in the CLI as well.
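
I don't have the code handy, but the shape of the trick is roughly the sketch below. The URL and selectors are placeholders (the real page will differ), and it's written with Playwright's Python API rather than Puppeteer just to keep the example self-contained in one language:

    # Rough shape of the browser-as-CLI trick. Everything here is hypothetical:
    # the URL and CSS selectors are placeholders, not the real Cerebras page.
    from playwright.sync_api import sync_playwright

    CHAT_URL = "https://example.com/try-ai"    # placeholder URL
    PROMPT_BOX = "textarea"                    # placeholder selector
    RESPONSE = ".last-response"                # placeholder selector

    def ask(prompt: str) -> str:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(CHAT_URL)
            page.fill(PROMPT_BOX, prompt)
            page.keyboard.press("Enter")
            page.wait_for_selector(RESPONSE)   # wait for the reply to render
            text = page.inner_text(RESPONSE)
            browser.close()
            return text

    if __name__ == "__main__":
        print(ask(input("> ")))                # CLI in, CLI out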

With current agents, I don't see why I couldn't just expand that with a cheap model (minimax 2.1 is pretty good for agents, I think) and have the agent write the files and do the work in a loop.

I think the repository might have gotten deleted after I reset my old system, but I can look for it if this interests you.

Cerebras is such a good company. I talked to their CEO on Discord once and have been following them for more than a year or two now. I hope they don't get enshittified by the recent OpenAI deal, and that they improve their developer experience, because people want to pay them; instead I resorted to a free shenanigan (though honestly I was mostly curious whether the Puppeteer idea was even possible, and I didn't use it much after building it).


Looking at it through a religious lens is pretty narrow-minded. Secular people have values too. You're limiting your ability to understand the world around you.

I would reckon looking at these kinds of things through religious lenses is actually VERY useful.

I don't follow sportsball, but there are masses of people and massive institutions built on and around sportsball.

So, seeing large changes or shifts within sportsball can be useful in gleaning some sort of trend.

While I don't fully follow the GP comment, I can see the other side of yours.


It's more like following astrology - entirely irrelevant to reality.

Your comment is entirely irrelevant to most of the human beings on this planet.

And yet, you took the time to type it out. And will even spend some time defending it, proposing it.

Narrow worldviews have utility to one, but don't encompass "reality" as such.


Some secular people have values, and I don't think religious people are saints. Secular people, however, don't have a framework to 'force' others with supposed values to adhere to them. I don't believe it's narrow-minded to think changes in religion might have an effect on things; the way people follow their religion is influenced by external factors, so I don't see why it wouldn't work the other way around as well. Atheists are quite new; we'll see what happens.

You are deeply mistaken as to the roots of this cultural difference. There are many highly religious cultures which absolutely lack the "social agreement" framework. The real reason "social agreement" countries exist is feudalism. The feudal structure of power was the second on this planet (after ancient Greece, but that culture had been exterminated) to allow bidirectional agreements between kings and wealthy nobles. The only countries which managed to preserve this tradition unbroken were the European ones, the parts of North America colonized by Europeans, and Japan, which had a societal structure close enough to adopt this culture without big changes and which later transferred it to its own colonies in Korea and Taiwan. And that is about all the countries valuing "social agreement". This is not because of religion or the lack of it; it is because of an accident: not being conquered by a despotic empire in the Middle Ages.

> Atheists are quite new

Whatever you're smoking, I'd like to try. A break from reality sounds nice right now.


Gallup polling says 1% of people in the US didn't believe in God in 1967, versus 17% in 2022. Of those 17%, I'd imagine many believed at some point (or went to church/temple/...); these people don't really behave like a 'pure' atheist would. They're very much still influenced by the religious ideas they grew up with. So yes, it's a rather new thing if you're thinking about society.

I think your problem is you don't seem to be aware of history before 1967 or society outside of the US. Your local community college might offer some courses in history and sociology.

As slow and buggy as Photoshop is, an incredibly fast superhuman AI couldn't get much done in 5 minutes or even 5 hours.

In general I agree with you, just not at the extreme.


Why did you expect info about deploying Astro to Cloudflare Pages? It's been supported for a long time already.

Seems like an obvious thing to call out for people (like me) who don't know.

So something like "now that we own Astro, all of you using Netlify should start migrating to Cloudflare Pages"?

"Hey don't forget it's easy to deploy Astro on Cloudflare pages" with a link to the docs? I saw them mention deployments to Cloudflare (and continuing to support other platforms) but had to go look up what Cloudflare's platform is even called myself. Seems like a missed marketing opportunity.

The same way Vercel makes it harder to deploy Next.js sites to competitors or to self-host.

Vercel does not make Next.js hard to deploy elsewhere. Next.js runs fine on serverful platforms like Railway, Render, and Heroku. I have run a production Next.js SaaS on Railway for years with no issues.

What Vercel really did was make Next.js work well in serverless environments, which involves a lot of custom infrastructure [0]. Cloudflare wanted that same behavior on CF Workers, but Vercel never open-sourced how they do it, and that is not really their responsibility.

Next.js is not locked to Vercel. The friction shows up when trying to run it in a serverless model without building the same kind of platform Vercel has.

0. https://www.youtube.com/watch?v=sIVL4JMqRfc


So it is vendor locked by Vercel. That's why there is OpenNext - https://opennext.js.org/

How is it vendor locked?

Recent features are more dependent on Vercel and it's OpenNext that makes it platform independent with adapters.

Can you describe what you mean here? Because I have heard this about 100 times and never understood what people mean when they say this. I am hosting a NextJS site without Vercel and I had no special consideration for it.

Next.js isn't just a static site generator.

Astro isn't just a static site generator either. Not sure what your point is.

[flagged]


Did YOU even bother to look at their site? They support more than static generation, including SSR and even API endpoints. That means Astro has a server that can run server-side (or serverless) to do more than static site generation, so it's not just a static site generator either.

And yes I can see you're posting the same lie all over the comments here.

Stop being a potty mouth.


What do you mean by "not all"? They aren't obligated to block everything that uses the private API, from big tools/projects all the way down to a lone coder making their own closed-source tool. That's just not feasible. Or did you have a way to do that?

That's already pretty common, but the goal isn't storing less data for its own sake.

> the goal isn't storing less data for its own sake.

Isn't it? I was under the impression that the problem is the cost of storing all this stuff.


Nope, you can't just look at cost of storage and try to minimize it. There are a lot of other things that matter.

What I am asking is: what are the other concerns besides literally the cost? I'm interested in this area, and I keep seeing people say that observability companies are overcharging their customers.

We're currently discussing the cost of _storage_, and you can bet the providers already are deduplicating it. You just don't get those savings - they get increased margins.
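
As a toy illustration of what's on the table (my own sketch, not any vendor's actual pipeline): a store that keeps each unique payload once collapses repeated lines to a single copy, which matters for telemetry because so much of it is the same line over and over:

    def dedup_ratio(log_lines):
        """Toy estimate: fraction of raw bytes kept when each unique payload is stored once."""
        raw = sum(len(line) for line in log_lines)
        unique = sum(len(line) for line in set(log_lines))
        return unique / raw if raw else 1.0

    # e.g. the same health-check line emitted 10,000 times stores as one copy
    lines = ["GET /healthz 200 2ms"] * 10_000 + ["user 42 checkout failed: card declined"]
    print(dedup_ratio(lines))   # a tiny fraction of the raw bytes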

I'm not going to quote the article or other threads here to you about why reducing storage just for the sake of cost isn't the answer.


Well, that's a weirdly confrontational reply. But thanks

So.. He has no backups?

yes.. like most end users?

Claude Code is smart enough to search its session traces and give you the real info.

Naive question, but isn’t every output token generated in roughly the same, non-deterministic, way? Even if it uses its actual history as context, couldn’t the output still be incorrect?

Not trolling, asking as a regular user


Have you ever seen those posts where AI image generation tools completely fail to generate an image of the leaning tower of Pisa straightened out? Every single time, they generate the leaning tower, well… leaning. (With the exception of some more recent advanced models, of course)

From my understanding, this is because modern AI models are basically pattern extrapolation machines. Humans are too, by the way. If every time you eat a particular kind of berry, you crap your guts out, you’re probably going to avoid that berry.

That is to say, LLMs are trained to give you the most likely text (their response) which follows some preceding text (the context). From my experience, if the LLM agent loads a history of commands run into context, and one of those commands is a deletion command, the subsequent text is almost always “there was a deletion.” Which makes sense!

So while yes, it is theoretically possible for things to go sideways and for it to hallucinate in some weird way (which grows increasingly likely if there’s a lot of junk clogging the context window), in this case I get the impression it’s close to impossible to get a faulty response. But close to impossible ≠ impossible, so precautions are still essential.


Yes, but Claude Cowork isn't just an LLM. It's a sophisticated harness wrapped around the LLM (Opus 4.5, for example). The harness does a ton of work to keep the number of tokens sent and received low, and to keep the context preserved between calls small. This applies to other coding agents to varying extents as well.

Asking for the trace is likely to involve the LLM just telling the harness to call some tools, such as the Bash tool with grep to find the line numbers of the command in the trace file. It can do this repeatedly until the LLM thinks it has found the right block. Then those line numbers are passed to the Read tool (by the harness) to get the command(s), and finally the output of that read is added to the response by the harness.
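
Mechanically it's something like the sketch below (not Claude Code's actual internals; the trace path and the search pattern are hypothetical, just the grep-then-read shape described above):

    import os
    import subprocess

    # Hypothetical trace location - not Claude Code's real on-disk layout.
    TRACE = os.path.expanduser("~/.claude/session-trace.jsonl")

    # "Bash tool" step: grep the trace with line numbers for suspicious commands.
    hits = subprocess.run(
        ["grep", "-n", "rm -rf", TRACE],
        capture_output=True, text=True,
    ).stdout.splitlines()

    # "Read tool" step: print a window of lines around each hit so the model is
    # quoting what was actually logged instead of reconstructing it from memory.
    for hit in hits:
        lineno = int(hit.split(":", 1)[0])
        subprocess.run(["sed", "-n", f"{max(1, lineno - 2)},{lineno + 5}p", TRACE])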

The LLM doesn't get a chance to reinterpret or hallucinate until it says it is very sorry for what happened. Also, the moment it originally wrote (hallucinated?) the commands was when it made its oopsy.

