I tried Cerebras with GLM-4.7 (not Flash) yesterday using paid API credits ($10). They have per-minute rate limits, and cached tokens count against them, so you get rate-limited within the first few seconds of every minute and then have to wait out the rest of it. So they're "fast" at 1000 tok/sec, but not for practical usage: with the rate limits and the cached-token penalty you effectively get <50 tok/sec.
They also charge full price for the same cached tokens on every request/response, so I burned through $4 on one relatively simple coding task that would've cost <$0.50 with GPT-5.2-Codex or any other model that supports caching (besides Opus and maybe Sonnet). And it would've been much faster.
The pay-per-use API sucks. If you end up on the $50/mo plan, it's better, with caveats:
1 million tokens per minute, 24 million tokens per day. BUT: cached tokens count at full weight, so with 100,000 tokens of context you can burn a whole minute's allowance in a handful of requests.
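Back-of-the-envelope, using the numbers above (a sketch; 2k fresh tokens per request is my assumption):

    // Rough arithmetic on the $50/mo limits quoted above.
    const tokensPerMinute = 1_000_000;
    const contextTokens = 100_000;       // cached context, re-counted every request
    const freshTokensPerRequest = 2_000; // assumed new output per request

    const requestsPerMinute = Math.floor(
      tokensPerMinute / (contextTokens + freshTokensPerRequest),
    );
    // => 9 requests and you're done for the rest of the minute,
    // even though only ~18k of the 1M counted tokens were actually new.
    console.log(requestsPerMinute);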
Not really worth it, in general. It does reduce latency a little. In practice, you do have a continuing context, though, so you end up using it whether you care or not.
In general, per-minute rate limiting limits load spikes, and load spikes are what you pay for: they force you to ramp up capacity, and you are then usually slow to ramp back down so you don't pay the ramp-up cost too many times. A VM might boot relatively fast, but loading a large model into GPU memory takes time.
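That per-minute shape is essentially a fixed-window counter; a minimal sketch (hypothetical, not any provider's actual code):

    // Fixed-window limiter: every token in the window counts against one
    // bucket, so a burst early in the minute locks you out until it resets.
    class MinuteWindowLimiter {
      private windowStart = 0;
      private used = 0;
      constructor(private limit: number) {}

      tryConsume(tokens: number, now = Date.now()): boolean {
        if (now - this.windowStart >= 60_000) { // new minute: reset the bucket
          this.windowStart = now;
          this.used = 0;
        }
        if (this.used + tokens > this.limit) return false; // wait for reset
        this.used += tokens;
        return true;
      }
    }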
I use GLM 4.7 with DeepInfra.com and it's extremely reasonable, though maybe a bit on the slower side. But faster than DeepSeek 3.2 and about the same quality.
It's even cheaper to just use it through z.ai themselves I think.
I know this might not be the most effective use case, but I ended up using the "try AI" feature on Cerebras, which opens a chat window in the browser.
Yes, it has some restrictions as well, but it still works for free. I have a private repository where I set up a Puppeteer instance so I can type something into a CLI and get the output back in the CLI (rough sketch below).
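It was roughly shaped like this (from memory; the URL and selectors are placeholders, since I don't have the repo in front of me):

    import puppeteer from 'puppeteer';
    import readline from 'node:readline';

    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(process.env.TRY_AI_URL!); // placeholder: the "try AI" page

    const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
    rl.on('line', async (prompt) => {
      await page.type('#chat-input', prompt);             // hypothetical selector
      await page.keyboard.press('Enter');
      await page.waitForSelector('.response:last-child'); // hypothetical selector;
      // naive: a real version should wait for streaming to finish
      console.log(await page.$eval('.response:last-child', el => el.textContent));
    });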
With current agents, I don't see why I couldn't just expand that with a cheap model (I think MiniMax 2.1 is pretty good for agents) and get the agent to write the files, do the things, and run in a loop.
I think the repository might have gotten deleted after I reset my old system or something, but I can look for it if this interests you.
Cerebras is such a good company. I talked to their CEO on Discord once and have been following them for a year or two now. I hope they don't get enshittified after the recent OpenAI deal, and that they improve their developer experience, because people want to pay them; instead I had to pull a shenanigan to get it for free. (Though honestly I was mostly curious whether the Puppeteer idea was even possible, and I didn't really use it much after building it.)
Looking at it through a religious lens is pretty narrow-minded. Secular people have values too. You're limiting your ability to understand the world around you.
Some secular people have values, and I don't think religious people are saints. Secular people, however, don't have a framework to 'force' others with supposed values to adhere to them. I don't believe it's narrow-minded to think changes in religion might have an effect on things; the way people follow their religion is influenced by external factors, and I don't see why it wouldn't work the other way around as well. Widespread atheism is quite new; we'll see what happens.
You are deeply mistaken as to the roots of this cultural difference. There are many highly religious cultures which absolutely lack the "social agreement" framework. The real reason "social agreement" countries exist is feudalism. The feudal power structure was the second on this planet (after ancient Greece, but that culture had been exterminated) to allow bidirectional agreements between kings and wealthy nobles. The only countries which managed to preserve this tradition unbroken were the European ones, the parts of North America colonized by Europeans, and Japan, whose societal structure was close enough to adopt this culture without big changes and which later transferred it to its colonies in Korea and Taiwan. And that is about all of the countries valuing "social agreement". This is not because of religion or the lack of it; it is because of an accident: not being conquered by a despotic empire in the Middle Ages.
Gallup polling says 1% of people in the US didn't believe in God in 1967, versus 17% in 2022. Of those 17%, I'd imagine many believed at some point (or went to church/temple/...), and these people don't really behave the way a 'pure' atheist would; they're very much still influenced by the religious ideas they grew up with. So yes, it's a rather new thing if you're thinking at the level of society.
I think your problem is you don't seem to be aware of history before 1967 or society outside of the US. Your local community college might offer some courses in history and sociology.
"Hey don't forget it's easy to deploy Astro on Cloudflare pages" with a link to the docs? I saw them mention deployments to Cloudflare (and continuing to support other platforms) but had to go look up what Cloudflare's platform is even called myself. Seems like a missed marketing opportunity.
Vercel does not make Next.js hard to deploy elsewhere. Next.js runs fine on serverful platforms like Railway, Render, and Heroku. I have run a production Next.js SaaS on Railway for years with no issues.
What Vercel really did was make Next.js work well in serverless environments, which involves a lot of custom infrastructure [0]. Cloudflare wanted that same behavior on CF Workers, but Vercel never open-sourced how they do it, and that is not really their responsibility.
Next.js is not locked to Vercel. The friction shows up when trying to run it in a serverless model without building the same kind of platform Vercel has.
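For the serverful case it really is just a Node process; the documented custom-server pattern (or plain "next start") is all a platform like Railway needs. A minimal sketch:

    // server.ts: Next.js behind a plain Node HTTP server
    // (the custom-server pattern from the Next.js docs).
    import { createServer } from 'node:http';
    import next from 'next';

    const app = next({ dev: false });
    const handle = app.getRequestHandler();

    app.prepare().then(() => {
      createServer((req, res) => handle(req, res))
        .listen(Number(process.env.PORT ?? 3000));
    });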
Can you describe what you mean here? Because I have heard this about 100 times and never understood what people mean when they say it. I am hosting a Next.js site without Vercel and didn't have to do anything special for it.
Did YOU even bother to look at their site? They support more than static generation, including SSR and even API endpoints (quick example below). That means Astro has a server that can run server-side (or serverless), so it's not just a static site generator either.
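For reference, an Astro API endpoint is just a file under src/pages (per Astro's endpoints docs; assumes an SSR adapter is configured):

    // src/pages/api/hello.ts - a server endpoint, not a static page
    export const prerender = false; // opt out of static generation

    export async function GET() {
      return new Response(JSON.stringify({ hello: 'world' }), {
        headers: { 'Content-Type': 'application/json' },
      });
    }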
And yes I can see you're posting the same lie all over the comments here.
What do you mean by "not all"? They aren't obligated to block every tool/project trying to use the private API, all the way down to a lone coder making their own closed-source tool. That's just not feasible. Or did you have a way to do that?
What I am asking is: what are the concerns other than, literally, the cost? I have an interest in this area, and I keep seeing everyone saying that observability companies are overcharging their customers.
We're currently discussing the cost of _storage_, and you can bet the providers are already deduplicating it. You just don't get those savings - they become increased margins.
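For flavor, "deduplicating it" can be as simple as content-addressing chunks before they hit object storage (a toy sketch, not any vendor's actual pipeline):

    import { createHash } from 'node:crypto';

    // Identical telemetry chunks hash to the same key and are stored once;
    // the customer is still billed on ingested bytes, not stored bytes.
    const stored = new Map<string, Uint8Array>();

    function putChunk(chunk: Uint8Array): string {
      const key = createHash('sha256').update(chunk).digest('hex');
      if (!stored.has(key)) stored.set(key, chunk);
      return key; // the index keeps a reference instead of the bytes
    }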
I'm not going to quote the article or other threads here to you about why reducing storage just for the sake of cost isn't the answer.
Naive question, but isn’t every output token generated in roughly the same non-deterministic way? Even if it uses its actual history as context, couldn’t the output still be incorrect?
Have you ever seen those posts where AI image generation tools completely fail to generate an image of the leaning tower of Pisa straightened out? Every single time, they generate the leaning tower, well… leaning. (With the exception of some more recent advanced models, of course)
From my understanding, this is because modern AI models are basically pattern extrapolation machines. Humans are too, by the way. If every time you eat a particular kind of berry, you crap your guts out, you’re probably going to avoid that berry.
That is to say, LLMs are trained to give you the most likely text (their response) which follows some preceding text (the context). From my experience, if the LLM agent loads a history of commands run into context, and one of those commands is a deletion command, the subsequent text is almost always “there was a deletion.” Which makes sense!
So while yes, it is theoretically possible for things to go sideways and for it to hallucinate in some weird way (which grows increasingly likely if there’s a lot of junk clogging the context window), in this case I get the impression it’s close to impossible to get a faulty response. But close to impossible ≠ impossible, so precautions are still essential.
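To make the "most likely text" point concrete: decoding is typically a weighted draw from a softmax over next-token scores, so it's stochastic but heavily biased toward the patterns in context (a toy sketch; real decoders add top-p, repetition penalties, etc.):

    // Toy next-token sampler: softmax with temperature, then a weighted draw.
    function sampleNextToken(logits: number[], temperature = 1.0): number {
      const scaled = logits.map(l => l / temperature);
      const max = Math.max(...scaled);                // for numeric stability
      const exps = scaled.map(l => Math.exp(l - max));
      const sum = exps.reduce((a, b) => a + b, 0);
      let r = Math.random() * sum;
      for (let i = 0; i < exps.length; i++) {
        r -= exps[i];
        if (r <= 0) return i;
      }
      return exps.length - 1; // fallback for floating-point rounding
    }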
Yes, but Claude Cowork isn't just an LLM. It's a sophisticated harness wrapped around the LLM (Opus 4.5, for example). The harness does a ton of work to keep the number of tokens sent and received low, and to keep the context carried between calls small. This applies to other coding agents to varying extents as well.
Asking for the trace is likely to involve the LLM just telling the harness to call some tools: for example, calling the Bash tool with grep to find the line numbers of the command in the trace file. It can do this repeatedly until the LLM thinks it has found the right block. Then those line numbers are passed to the Read tool (by the harness) to get the command(s), and finally the output of that read is added to the response by the harness.
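Mechanically that's something like this (a sketch of the tool side; the real Bash/Read tools are Anthropic's, these are stand-ins):

    import { execFileSync } from 'node:child_process';
    import { readFileSync } from 'node:fs';

    // Stand-in for the Bash tool: grep -n prints "lineNumber:match" pairs.
    function grepLineNumbers(pattern: string, file: string): number[] {
      const out = execFileSync('grep', ['-n', pattern, file], { encoding: 'utf8' });
      return out.trim().split('\n').map(l => Number(l.split(':')[0]));
    }

    // Stand-in for the Read tool: return the exact lines, verbatim.
    function readLines(file: string, start: number, end: number): string {
      return readFileSync(file, 'utf8').split('\n').slice(start - 1, end).join('\n');
    }

The point is that the bytes come back verbatim from disk; the model only chooses where to look.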
The LLM doesn't get a chance to reinterpret or hallucinate until it says it is very sorry for what happened. Also, the moment it originally wrote (hallucinated?) the commands was when it made its oopsy.