A recent paper shows that letting a language model search a text corpus (or the internet) for additional information can improve parameter efficiency by roughly 25x [1]. In other words, you only need a small model, because it can look trivia up in the text instead of memorising it in its weights.
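To make the mechanism concrete, here is a minimal retrieve-then-prepend sketch. Everything in it is illustrative: the toy corpus and the `retrieve`/`build_prompt` helpers are made up, it uses TF-IDF similarity where real retrieval-augmented systems use learned dense embeddings, and the paper's actual architecture integrates retrieved chunks inside the model rather than in the prompt. The core idea is the same, though: fetch relevant text, then condition generation on it.

```python
# Toy retrieve-then-prepend loop: look facts up in a corpus
# instead of storing them in the model's parameters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus standing in for the retrieval database.
corpus = [
    "The Eiffel Tower was completed in 1889 and is 330 metres tall.",
    "Mount Everest, at 8,849 metres, is Earth's highest mountain.",
    "Python was created by Guido van Rossum and released in 1991.",
]

vectorizer = TfidfVectorizer()
corpus_vectors = vectorizer.fit_transform(corpus)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, corpus_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(question: str) -> str:
    """Prepend retrieved passages so a small model needn't memorise trivia."""
    context = "\n".join(retrieve(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How tall is the Eiffel Tower?"))
```

The prompt this prints would go to the (small) language model, which now only has to read the answer out of the context rather than recall it from its weights.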
That's 25x in one go. Maybe we'll get the chance to run GPT-4-class models ourselves, without 20 GPUs and a $1M machine.