Hacker News

I usually ask a simple question that ALL the models get wrong: list the mayors of my city [Londrina]. ALL the models (offline) get it wrong. And I mean all of them. The best I got was from o3, I believe, which said it couldn't give a good answer and told me to check the city website.

Gemini 3 somehow is able to give a list of mayors, including details on who got impeached, etc.

This should be a simple answer, because all the data is on Wikipedia, which the models are certainly trained on, but somehow most models can't get it right, because... it's just an irrelevant city in a huge dataset.

But somehow, Gemini 3 did it.

Edit: Just asked "Cool places to visit in Londrina" (in Portuguese), and it was also 99% right, unlike other models, which just make things up. The only thing it got wrong was mentioning sakuras by a lake... Maybe it confused them with Brazilian ipês, which are similar, and the city is indeed full of them.

It seems to have a visual understanding, imo.



Ha, I just did the same with my hometown (Guaiba, RS), a city 1/6th the size of Londrina, whose English Wikipedia page hasn't been updated in years and still lists the wrong mayor (!).

Gemini 3 nailed it on the first try, included political affiliations, and added some context on who they ran against and beat in each of the last three elections. And I just built a fun application with AI Studio, and it worked on the first shot. Pretty impressive.

(disclaimer: Googler, but no affiliation with Gemini team)


Pure fact-based, niche questions like that aren't really the focus of most providers any more from what I've heard, since they can be solved more reliably by integrating search tools (and all providers now have search).

I wouldn't be surprised if the smallest models answer fewer such fact-only questions offline over time, as they're distilled and focused more heavily on reasoning etc.


Funny, I just asked "Ask Brave", which uses a cheap LLM connected directly to its search engine, and it got it right without any issues.

It shows once again that for common searches, (indexed) data is king, and that's where I'd expect even a simple LLM wired directly to a huge indexed dataset to beat much more sophisticated LLMs that have to use agents for searching.
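A minimal sketch of what "an LLM wired directly to an index" means in practice: retrieve first, then have the model answer only from the retrieved text. This is a toy keyword-overlap index standing in for a real search engine, and the final prompt would go to whatever model you like; all names and the sample documents are illustrative, not any provider's actual API.

```python
def retrieve(query, index, k=2):
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        index,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, docs):
    """Ground the model in retrieved text instead of its weights."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

# Stand-in for a search engine's indexed documents.
index = [
    "Londrina mayors list includes Antonio Belinati, who was impeached.",
    "Londrina is a city in Parana, Brazil.",
    "Guaiba is a city in Rio Grande do Sul, Brazil.",
]

prompt = build_prompt(
    "Who were the mayors of Londrina?",
    retrieve("mayors of Londrina", index),
)
print(prompt)
```

The point is that the hard factual work happens in `retrieve`; even a cheap LLM can summarize the context it's handed, which is why a search-backed small model can beat a large model answering from memory.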


I asked Claude, and had no issues with the answer including mentioning the impeached Antonio Belinati...


thanks for sharing, very interesting example



