People in regions where Internet access is a problem to begin with don't download Wikipedia themselves, because they lack the bandwidth or hardware to do so. That's why Kiwix has side projects such as a Raspberry Pi based content server that can be set up in e.g. a school and used to serve content to all students using whatever cheap phones they might have, or a reader that runs on WinXP. And, yes, that stuff is actually deployed in various places in Africa.
Now imagine the hardware that you'll need to run that high-quality future LLM. How many Kiwix hotspots could you set up for the same money?
Mind you, I'm not saying that there aren't compelling use cases for LLMs. It's just that it's a thing that's nice to have after you have Wikipedia etc. Indeed, ideally you'd want to have Wikipedia indexed so that the LLM can query it as needed to construct its responses - otherwise you're going to have a lot of fun trying to figure out which parts are hallucinations and which aren't.
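To make that last idea concrete, here's a toy sketch of "LLM queries a local Wikipedia index" in Python. Everything here is illustrative: the two-article corpus, the term-overlap scoring, and the prompt wording are my own assumptions; a real deployment would search Kiwix's ZIM archives with a proper full-text index rather than an in-memory dict.

```python
# Toy retrieval-grounded prompting: look up relevant local text first,
# then tell the model to answer only from that text. The corpus and
# scoring below are placeholders, not how Kiwix actually stores data.

articles = {
    "Rabies": "Rabies is a viral disease causing brain inflammation, "
              "usually spread through animal bites",
    "Malaria": "Malaria is a mosquito-borne infectious disease spread "
               "to humans by mosquitoes",
}

def retrieve(query, corpus, k=1):
    """Rank articles by naive term overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, corpus):
    """Ground the answer in retrieved text rather than the model's weights."""
    context = "\n".join(text for _, text in retrieve(query, corpus))
    return f"Answer using ONLY this source:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("what disease do mosquitoes carry", articles)
```

The point of the pattern is that the model's claims can be checked against the retrieved passage, which is exactly the hallucination-spotting problem mentioned above.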
The situation I’m considering is not people sitting in regions where Internet connectivity is fundamentally poor because there is no infrastructure. The situation I’m considering is one where Internet connectivity exists but is being deliberately throttled or filtered: either through periodic Internet shutdowns (as is happening in Pakistan right now, as described in TFA) or through ubiquitous content and site filtering (as happens routinely in China and increasingly in other nations).
There are places where Internet access and compute resources are fundamentally limited by lack of infrastructure. Talking about LLMs in those places doesn’t make sense. But on the trajectory we’re currently on, those places will likely become rarer while political Internet filtering becomes more common.