advael's answer was fine, but since people seem to be hung up on the wording, a more direct response:
We have human institutions dedicated at least nominally to finding and publishing truth (I hate having to qualify this, but Hacker News is so cynical and post-modernist at this point that I don't know what else to do). These include, for instance, court systems, which operate with a notion of evidentiary standards: eyewitnesses are treated as more reliable than hearsay, written or taped records as more reliable than both, and multiple witnesses who agree as more reliable than one. Another example is science, which uses peer review along with its own hierarchy of evidence, similar to but separate from the courts': interventional trials are better evidence than observational studies, randomization and statistical testing are used to try to tease out effects from noise, and results that replicate are more reliable than a single study. Journalism is yet another example. This is probably the arena in which Hacker News is most cynical and will declare all of it useless trash, but reputable news organizations nonetheless have methods they use to try to be correct more often than not. They employ their own fact checkers. They seek out multiple expert sources. They send journalists directly to the scene to bear witness themselves as events unfold.
You're free to think this isn't sufficient, but this is how we deal with humans making stuff up, and it's gotten us modern civilization at least: full of warts, but also full of wonders, seemingly because we're actually right about a lot of stuff.
At some point, something analogous will presumably be the answer for how LLMs deal with this, too. The training will have to be changed to make the system aware of the quality of evidence: place greater trust in direct sensor output than in something read online, place greater trust in a reputable academic journal than in a Tweet, and so on. As it stands now, unlike human learners, the objective of an LLM is just to produce a string in which each piece falls in some reasonably high-density region of the distribution of possible next pieces, as estimated from historical recorded text. Luckily, producing strings in this way happens to generate a whole lot of true statements, but it does not have truth as an explicit goal and, until it does, we shouldn't forget that. Treat it as you would some human savant with perfect recall who had never left a dark room to experience the outside world, but had read everything ever written, unfortunately without any understanding of the difference between reading a textbook and reading 4chan.
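To make that objective concrete, here is a deliberately toy sketch (plain Python, a made-up three-line corpus, word-level "pieces", and a simple count-based bigram model standing in for a far larger learned distribution; none of this is how any particular production model is implemented). The point it illustrates is the one above: the procedure only ever asks "how often did this piece follow that one in the text I saw?", never "is this true?" or "was the source reliable?"

    import random
    from collections import Counter, defaultdict

    # Tiny "training corpus". A textbook-ish line and a forum-ish line
    # contribute counts in exactly the same way; the objective has no
    # notion of source quality.
    corpus = [
        "the earth orbits the sun",
        "the earth is flat trust me",
        "the sun is a star",
    ]

    # Count how often each next word followed the previous one.
    counts = defaultdict(Counter)
    for line in corpus:
        words = ["<s>"] + line.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1

    def sample_next(prev):
        # Pick a next word with probability proportional to how often it
        # followed `prev` in the recorded text -- the only signal available.
        options = counts[prev]
        choices, freqs = zip(*options.items())
        return random.choices(choices, weights=freqs)[0]

    def generate(max_len=8):
        out, prev = [], "<s>"
        for _ in range(max_len):
            if prev not in counts:
                break
            prev = sample_next(prev)
            out.append(prev)
        return " ".join(out)

    print(generate())

A real model replaces the counting with a trained network over much longer contexts, but the training signal is still "match the recorded text," which is why the quality of the evidence never enters into it unless someone deliberately builds that in.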