Hacker News new | past | comments | ask | show | jobs | submit login

You can also see this difference in open router.

But why is there only thinking flash now?






It might be a bit confusing, but there's no "only thinking flash" - it's a single model, and you can turn off thinking if you set thinking budget to 0 in the API request. Previously 2.5 Flash Preview was much cheaper with the thinking budget set to 0, now the price is the same. Of course, with thinking enabled the model will still use far more output tokens than the non-thinking mode.

Interesting design choice, and makes me think of "Thinking, Fast and Slow" by Kahneman.

(I thought of it quickly, not slowly, so the comparison may only be surface deep.)


Apparently, you can make a request to 2.5 flash to not use thinking, but it will still sometimes do it anyways, this has been an issue for months, and hasn't been fixed by model updates: https://github.com/google-gemini/cookbook/issues/722



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: