The real news is that non-thinking output is now roughly 4x more expensive, which they of course carefully avoid mentioning in the blog post, comparing only the thinking prices.
How cute they are with their phrasing:
> $2.50 / 1M output tokens (*down from $3.50 output)
Which should really read: "up from $0.60 (non-thinking) / down from $3.50 (thinking)".
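(For the arithmetic behind the "roughly 4x": $2.50 / $0.60 ≈ 4.2x on output.)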
I have LLM fatigue, so I'm not paying attention to headlines... but LLMs are thinking now? That used to be a goalpost: "AI can't do {x} because it's not thinking." Now it's part of a pricing chart?
"Thinking" means spamming a bunch of stream-of-consciousness bs before it actually generates the final answer. It's kind of like the old trick of prompting to "think step by step". Seeding the context full of relevant questions and concepts improves the quality of the final generation, even though it's rarely a direct conclusion of the so-called thinking before it.
Gmail was in beta for what, half a decade? Did you never use it during that time? They've been using these "Preview" models in their non-technical, user-facing Gemini app and products for months now. Google itself has been using them in production, on its main apps. And gemini-1.5-pro is two months from deprecation, with no production alternative.
They told everyone to build their stuff on top of it, and then jacked up the price by 4x. Just pointing to some fine print doesn't change that.
Correct, though pretty much anything end-user-facing is latency-sensitive; voice is a tiny percentage of that. No one likes waiting, and the involvement of an LLM doesn't change that from a user's PoV.
I wonder if you can hide the latency, especially for voice?
What I have in mind is to start the voice response with a non-thinking model, which can produce a sentence or two in a fraction of a second. That will take the voice model a few seconds to read out; in that time, you use a thinking model to start working on the rest of the response.
In a sense, very similar to how everyone knows to stall in an interview by starting with 'this is a very good question...', and using that time to think some more.
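Something like this, as a minimal sketch assuming the google-genai Python SDK (the opener prompt and model names are illustrative, and the actual TTS plumbing is left out):

```python
import asyncio

from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

NO_THINKING = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=0)
)

async def answer(question: str) -> str:
    # Start the slow, thinking-enabled request immediately...
    deep = asyncio.create_task(
        client.aio.models.generate_content(
            model="gemini-2.5-flash", contents=question
        )
    )
    # ...while a thinking_budget=0 call produces the quick opener,
    # the "this is a very good question..." of the machine world.
    opener = await client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=(
            "In one short spoken sentence, acknowledge the following "
            f"question without answering it yet: {question}"
        ),
        config=NO_THINKING,
    )
    # Hand opener.text to the TTS engine here; by the time it has been
    # read out, the thinking response is (hopefully) done, so most of
    # its latency is hidden from the listener.
    full = await deep
    return f"{opener.text} {full.text}"

print(asyncio.run(answer("Why is the sky blue?")))
```

The gamble is that speaking the opener takes longer than the thinking call; if it doesn't, you're back to an audible pause, just a later one.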
Not at all. Non-thinking flash is... flash with the thinking budget set to 0 (which you can still run that way, just at the new 2x input / 4x output pricing). Flash-lite is far weaker, unusable for the overwhelming majority of flash's use cases; a quick glance at the benchmarks shows this.
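For the record, the thinking budget is a literal request parameter; a quick sketch with the google-genai Python SDK (the prompt is illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# "Non-thinking flash" is just 2.5 Flash with its thinking budget forced
# to 0; it still bills at the new 2x input / 4x output rates.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this support ticket in one sentence: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)
```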