
Weird thing is, in Google AI Studio all their models—from the state-of-the-art Gemini 2.5 Pro to the lightweight Gemma 2—gave a roughly correct answer. Most even recognised the packing efficiency of spheres.

But Google search gave me the exact same slop you mentioned. So whatever Search is using, they must be using their crappiest, cheapest model. It's nowhere near state of the art.



Makes sense that search has a small, fast, dumb model designed to summarize and not to solve problems. Nearly 14 billion Google searches per day. Way too much compute needed to use a bigger model.
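A rough back-of-envelope, taking the 14 billion/day figure at face value (the 200-token answer length is just an assumption):

    # Why search-scale inference pushes toward a small model.
    # 14B/day comes from the comment above; tokens_per_answer is an assumed value.
    QUERIES_PER_DAY = 14e9
    SECONDS_PER_DAY = 86_400

    qps = QUERIES_PER_DAY / SECONDS_PER_DAY        # ~162,000 queries/second
    tokens_per_answer = 200                        # assumed average overview length
    tokens_per_second = qps * tokens_per_answer    # ~32 million output tokens/second

    print(f"{qps:,.0f} queries/s, {tokens_per_second:,.0f} output tokens/s")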


Massive search overlap though - and some questions (like the golf ball puzzle) can be cached for a long time.


AFAIK about 15% of their queries each day have never been seen before, so it might not be simple to design an effective cache layer on top of that. Semantic-aware clustering of natural language queries and projecting them into a cacheable low-rank space is a non-trivial problem. Of course, an LLM can solve that effectively, but then what's the point of a cache when you need an LLM just to cluster the queries...
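For illustration, a toy sketch of the kind of semantic cache being described, with a placeholder embed() standing in for a real sentence-embedding model; a production system would use an approximate-nearest-neighbour index and a carefully tuned threshold:

    import numpy as np

    # Toy semantic cache: return a previously generated answer if a cached
    # query's embedding is "close enough". The 0.92 threshold is arbitrary.
    SIMILARITY_THRESHOLD = 0.92

    class SemanticCache:
        def __init__(self):
            self.embeddings = []   # unit-normalized query vectors
            self.answers = []      # parallel list of cached answers

        def insert(self, query_vec, answer):
            self.embeddings.append(query_vec)
            self.answers.append(answer)

        def lookup(self, query_vec):
            if not self.embeddings:
                return None
            sims = np.stack(self.embeddings) @ query_vec   # cosine sim (unit vectors)
            best = int(np.argmax(sims))
            return self.answers[best] if sims[best] >= SIMILARITY_THRESHOLD else None

    def embed(text: str) -> np.ndarray:
        # Placeholder: deterministic random vector per string. A real system
        # would call an embedding model so that paraphrases land close together.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(256)
        return v / np.linalg.norm(v)

    cache = SemanticCache()
    cache.insert(embed("how many golf balls fit in a school bus"), "roughly 500,000")
    print(cache.lookup(embed("how many golf balls fit in a school bus")))  # hit
    print(cache.lookup(embed("latest news on the election")))              # miss -> None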


Not a search engineer, but wouldn't a cache lookup for a previous LLM result be faster than a conventional free-text search over the indexed websites? Seems like this could save money whilst delivering better results?


Yes, that's what Google is doing for AI Overviews, IIUC. In my experience it works okay and is improving over time, but it's nowhere near perfect. Results are stale for developing stories, some bad results stick around for a long time, effectively identical queries return different cached answers, etc.


I have a strong suspicion that for all the low-barrier APIs/services, before the real model sees my prompt, it gets evaluated by a quick model to decide whether it's worth bothering the big model with. If not, I get something shaken out of the sleeve of a bottom-of-the-barrel model.
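Something along the lines of this hypothetical router sketch, where a cheap scorer decides whether the prompt ever reaches the big model (all functions and thresholds here are made up):

    # Hypothetical router/cascade: a cheap scorer gates access to the expensive
    # model. small_model() and big_model() are placeholders, not real API calls.

    def small_model(prompt: str) -> str:
        return f"[cheap answer to: {prompt}]"

    def big_model(prompt: str) -> str:
        return f"[expensive answer to: {prompt}]"

    def difficulty_score(prompt: str) -> float:
        # Stand-in for a small classifier model; here a crude heuristic.
        hard_words = ("prove", "derive", "estimate", "step by step")
        score = min(len(prompt) / 500, 1.0)
        score += 0.5 * any(w in prompt.lower() for w in hard_words)
        return score

    def route(prompt: str, threshold: float = 0.6) -> str:
        if difficulty_score(prompt) >= threshold:
            return big_model(prompt)
        return small_model(prompt)

    print(route("capital of France?"))                                                # cheap model
    print(route("Estimate how many golf balls fit in a school bus, step by step."))   # big model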


Google is shooting themselves in the foot with whatever model they use for search. It's probably a 2B or 4B model to keep up with demand, and man is it doing way more harm than good.


It's most likely one giant ["close-enough question hash of the input tokens"] = answer_with_params_replay lookup? It doesn't misunderstand the question, it tries to squeeze the input into something close enough?
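A toy version of that "close-enough hash" idea, with query normalization plus parameter replay (everything here, from the stop-word list to the fallback number, is illustrative, not how Google actually does it):

    import re

    STOP_WORDS = {"a", "an", "the", "of", "in", "how", "many", "much", "do", "does", "can"}

    def close_enough_key(query: str) -> str:
        # Normalize: lowercase, drop stop words, sort tokens, so near-identical
        # phrasings collapse onto the same cache key.
        tokens = re.findall(r"[a-z]+", query.lower())
        return " ".join(sorted(t for t in tokens if t not in STOP_WORDS))

    answer_templates = {
        close_enough_key("how many golf balls fit in a school bus"):
            "Roughly {n} golf balls, assuming ~64% sphere packing efficiency.",
    }

    def lookup(query: str) -> str | None:
        template = answer_templates.get(close_enough_key(query))
        if template is None:
            return None
        nums = re.findall(r"\d[\d,]*", query)
        return template.format(n=nums[0] if nums else "500,000")  # fallback is made up

    print(lookup("How many golf balls fit in a school bus?"))
    print(lookup("Golf balls: how many fit in the school bus"))  # same key, same cached answer
    print(lookup("What is the capital of France?"))              # None -> cache miss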



