You can also see this difference in OpenRouter. But why is there only thinking ...

Tiberium · 2025-06-17T17:37:12 1750181832

It might be a bit confusing, but there's no "only thinking flash" - it's a single model, and you can turn thinking off by setting the thinking budget to 0 in the API request. Previously, 2.5 Flash Preview was much cheaper with the thinking budget set to 0; now the price is the same either way. Of course, with thinking enabled the model will still use far more output tokens than in non-thinking mode.
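
For illustration, here is a minimal sketch of what "thinking budget set to 0" could look like as a REST request body. It only builds the URL and JSON and sends nothing. The endpoint path, the model name `gemini-2.5-flash`, and the `generationConfig.thinkingConfig.thinkingBudget` field reflect my understanding of the public Gemini API and should be checked against the current reference; `build_request` is a hypothetical helper name.

```python
import json

# Assumed endpoint template; verify against the current Gemini API reference.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/{model}:generateContent"
)


def build_request(prompt, model="gemini-2.5-flash", thinking_budget=0):
    """Return (url, json_body) for a generateContent call.

    thinking_budget=0 asks the model not to think at all; a positive
    value caps the number of thinking tokens instead.
    """
    url = API_URL.format(model=model)
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_request("Summarize this thread in one sentence.")
    print(url)
    print(body)
```

To actually send it you would POST the body to that URL with your API key. The same model name is used whether thinking is on or off; only the budget changes.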

davedx · 2025-06-18T11:08:41 1750244921

Interesting design choice, and makes me think of "Thinking, Fast and Slow" by Kahneman.

(I thought of it quickly, not slowly, so the comparison may only be surface deep.)

hnuser123456 · 2025-06-17T17:36:06 1750181766

Apparently, you can ask 2.5 Flash not to use thinking, but it will still sometimes do it anyway. This has been an issue for months and hasn't been fixed by model updates: https://github.com/google-gemini/cookbook/issues/722
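
If you want to detect this on your side, one rough approach is to look at the usage metadata in each response. The sketch below assumes the REST response carries a `usageMetadata.thoughtsTokenCount` field, which is my understanding of the API and not something taken from the linked issue. The helper names and the sample response dicts are hypothetical.

```python
def thinking_tokens_used(response):
    """Return the number of thinking tokens reported in a parsed response.

    Assumes the REST response shape {"usageMetadata": {"thoughtsTokenCount": N}};
    a missing or null field is treated as zero.
    """
    usage = response.get("usageMetadata") or {}
    return usage.get("thoughtsTokenCount") or 0


def budget_was_ignored(response, requested_budget=0):
    """True if thinking was disabled in the request but tokens were still spent."""
    return requested_budget == 0 and thinking_tokens_used(response) > 0


if __name__ == "__main__":
    # Hypothetical responses, made up for illustration only.
    quiet = {"usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 40}}
    leaky = {"usageMetadata": {"promptTokenCount": 12, "thoughtsTokenCount": 310}}
    print(budget_was_ignored(quiet))
    print(budget_was_ignored(leaky))
```

Logging this per request would at least tell you how often the budget is ignored and what it costs you in tokens.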