Depends on your definition of knowing. Sure, we know these models are predicting next tokens, but do we understand why they output the things they do? I am not well versed in LLMs, but I assume even for smaller models interpretability is a big challenge.
The answer is simple: the set of weights and biases comprises a mathematical function that has been specifically built to approximate the training set. The methods for building such a function are very old and well known (from calculus).
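As a minimal sketch of what "building such a function" means in practice, here is a toy example in plain Python. The data, learning rate, and step count are made up for illustration; it fits a one-parameter-pair function to a tiny training set by gradient descent, the same calculus-based idea that training a large model scales up.

```python
# Toy illustration: fit y = w*x + b to made-up data by gradient descent.
# The "model" is just a parameterized function; calculus (derivatives of the
# squared error with respect to w and b) tells us how to adjust the parameters.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # hypothetical training set

w, b = 0.0, 0.0          # the "weights and biases" start out arbitrary
learning_rate = 0.01

for step in range(5000):
    grad_w, grad_b = 0.0, 0.0
    for x, y in data:
        error = (w * x + b) - y      # prediction minus target
        grad_w += 2 * error * x      # d(error^2)/dw
        grad_b += 2 * error          # d(error^2)/db
    w -= learning_rate * grad_w      # step downhill on the loss surface
    b -= learning_rate * grad_b

print(w, b)  # approaches w=2, b=1, the function that best fits the data
```

An LLM does the same thing with billions of parameters and a loss over next-token predictions, but the mechanism is still "adjust the function to better approximate the training data."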
There's no magic here. Most people's awestruck reactions are due to our brain's own pattern recognition abilities and our association of language use with intelligence. But there's really no intelligence here at all, just like the "face on Mars" is just a random feature of a desert planet's landscape, not an intelligent life form.