
The real news is that non-thinking output is now 4x more expensive, which they of course carefully avoid mentioning in the blog, only comparing the thinking prices.

How cute they are with their phrasing:

> $2.50 / 1M output tokens (*down from $3.50 output)

Which should be "up from $0.60 (non-thinking)/down from $3.50 (thinking)"
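To make the comparison concrete, here's a quick sketch of the arithmetic using only the prices quoted in this thread (dollars per 1M output tokens):

```python
# Price change for Flash output tokens, using the figures quoted in this thread.
old_non_thinking = 0.60   # previous non-thinking output price, $/1M tokens
old_thinking = 3.50       # previous thinking output price, $/1M tokens
new_unified = 2.50        # new single output price, $/1M tokens

non_thinking_ratio = new_unified / old_non_thinking  # non-thinking got pricier
thinking_ratio = new_unified / old_thinking          # thinking got cheaper

print(f"non-thinking: {non_thinking_ratio:.2f}x")  # ~4.17x
print(f"thinking: {thinking_ratio:.2f}x")          # ~0.71x
```

So "4x more expensive" is, if anything, slightly understating it for non-thinking output.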




I have LLM fatigue, so I'm not paying attention to headlines... but LLMs are thinking now? That used to be a goal post. "AI can't do {x} because it's not thinking." Now it's part of a pricing chart?

How did I miss this?


"Thinking" means spamming a bunch of stream-of-consciousness bs before it actually generates the final answer. It's kind of like the old trick of prompting to "think step by step". Seeding the context full of relevant questions and concepts improves the quality of the final generation, even though it's rarely a direct conclusion of the so-called thinking before it.


"Thinking" really just means "write on some scratch paper" for llms.


Is it even possible to get non-thinking output now, though? If not, why would the price change matter, since it's irrelevant?


Yes, by setting the thinking budget to 0. Which is very common when a task doesn't need thinking.
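For reference, disabling thinking is just a request-level setting. Here's a minimal sketch of a request body, assuming the Gemini REST field names `thinkingConfig` / `thinkingBudget` (check the current API reference before relying on them):

```python
import json

# Hypothetical Gemini generateContent request body with thinking disabled.
# Field names (thinkingConfig, thinkingBudget) are assumptions based on the
# API as discussed here; verify against the current docs.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Translate 'hello' to French."}]}
    ],
    "generationConfig": {
        # A budget of 0 thinking tokens turns reasoning off entirely,
        # which keeps latency low for simple tasks like this one.
        "thinkingConfig": {"thinkingBudget": 0}
    },
}

print(json.dumps(request_body, indent=2))
```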

It's also relevant because people have spent the last 3 months building things on top of it.


To be fair, the point of preview models and stable releases is so you know what is stable to build on.


The moment you start charging for preview stuff, I think you're tacitly agreeing that the price won't jump by a factor of 4.


that’s a somewhat naïve viewpoint.


I think the fact that everyone is like ‘wtf’ now kind of reinforces my viewpoint?

Doesn’t mean you can’t do it, but people won’t be happy.


Who cares about happy (in the short term), as long as they continue to pay.


Gmail was in beta for what, half a decade? Did you never use it during that time? They've been using these "Preview" models in their non-technical, user-facing Gemini app for months now. Google themselves have been running them in production, on their main apps. And gemini-1.5-pro is 2 months from deprecation with no production alternative.

They told everyone to build their stuff on top of it, and then jacked up the price by 4x. Just pointing to some fine print doesn't change that.


I'd be more worried about Google just discontinuing another product. For example Stadia was similarly high profile, but it's gone now.

More examples here: https://killedbygoogle.com/


Interesting: why wouldn't you use dynamic thinking? And yeah, it sucks when the price changes.


It makes responses much slower with zero benefit for many tasks. Flash with thinking off is very fast.


One case where non-thinking matters is latency-sensitive workflows, for example voice AI.


Correct, though pretty much anything end-user facing is latency-sensitive, voice is a tiny percentage. No one likes waiting, the involvement of an LLM doesn't change this from a user PoV.


I wonder if you can hide the latency, especially for voice?

What I have in mind is to start the voice response with a non-thinking model, say a sentence or two in a fraction of a second. That will take the voice model a few seconds to read out. In that time, you use a thinking model to start working on the next part of the response?

In a sense, very similar to how everyone knows to stall in an interview by starting with 'this is a very good question...', and using that time to think some more.
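The idea above can be sketched as a tiny pipeline: a cheap non-thinking call produces the opener immediately while a slower thinking call runs concurrently, so it's done by the time the opener has been spoken. Both model calls are stubbed out here; the sleeps are stand-in latencies, not real numbers.

```python
import asyncio

async def fast_opener(question: str) -> str:
    # Stand-in for a low-latency, non-thinking model call.
    await asyncio.sleep(0.01)
    return "That's a very good question. Let me walk through it."

async def thinking_answer(question: str) -> str:
    # Stand-in for a slower reasoning call; in a real voice pipeline this
    # runs while TTS is reading the opener aloud.
    await asyncio.sleep(0.05)
    return "Here is the considered answer."

async def respond(question: str) -> list[str]:
    slow = asyncio.create_task(thinking_answer(question))  # start thinking now
    chunks = [await fast_opener(question)]  # speak this immediately
    chunks.append(await slow)               # ready by the time it's needed
    return chunks

print(asyncio.run(respond("Why is the sky blue?")))
```

The user-perceived latency is just the opener's latency, as long as the thinking call finishes before the opener finishes playing.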


They seem to have just rebranded the non-thinking model as Flash-Lite, so it's less expensive than before


Not at all. Non-thinking Flash is... Flash with the thinking budget set to 0 (which you can still run that way, just at 2x input / 4x output pricing). Flash-Lite is far weaker, unusable for the overwhelming majority of Flash's use cases. A quick glance at the benchmarks shows this.


Yeah, so basically their announcement is "good news, we quadrupled the price of non-thinking output, and will deprecate Gemini Flash 2.0 asap"


The OP says Flash-Lite has thinking and non-thinking, so it’s not that simple.



