
>A LLM is a generator of misinformation

This is a strange statement. No one is training LLMs to generate “misinformation”. It’s the opposite - it’s trained to generate the most likely next word, given the preceding 2000 words, using billions of examples from a real-world training corpus. So it will try to generate as much information as is present in the corpus. Maybe even more, but that’s debatable.
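
To make that objective concrete, here is a minimal sketch using a toy bigram counter in place of a neural network. The corpus, the one-word context, and the greedy pick are all stand-ins for illustration; real LLMs predict subword tokens over far longer contexts with learned probabilities, but the training signal is the same in spirit: match what follows in the data.

    from collections import Counter, defaultdict

    # Toy stand-in for next-word prediction: count which word follows
    # which in a tiny corpus, then always emit the most common successor.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def most_likely_next(word):
        # The single most frequent successor seen in the training data.
        return follows[word].most_common(1)[0][0]

    print(most_likely_next("the"))  # 'cat' - 2 of the 4 bigrams after 'the'

Note that the objective only rewards matching the corpus; nothing in it distinguishes true continuations from false ones.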




>No one is training LLMs to generate “misinformation”.

That is phrased as though it were stating a fact about the training process, but it's a statement about the intent of the training, isn't it? So I don't see it as rebutting my comment.

>It’s the opposite - it’s trained to generate the most likely next word

Sure, of course, what else? But if you take any correct statement about something and modify it slightly, it's not very likely it will still be correct.

It seems intuitive to me that, next to anything correct in the inputs, there are going to be a million billion (an understatement) wrong things. It's a combinatorial, mathematical point: in principle, you just count all the ways of being wrong that are similar to being right.
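
As a back-of-the-envelope count (the 20-word statement and 50,000-word vocabulary here are arbitrary assumptions, chosen only to show the scale):

    # Count the one-word-substitution neighbours of a correct statement.
    words_in_statement = 20    # assumed length, for illustration
    vocabulary_size = 50_000   # assumed vocabulary, for illustration

    # Swap any one of the 20 words for any of the other 49,999 words:
    one_word_variants = words_in_statement * (vocabulary_size - 1)
    print(one_word_variants)   # 999,980 nearby variants, nearly all wrong

And that is only single-word swaps; allowing two or more edits blows the count up combinatorially.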

Nobody trained it to get anything right! It doesn't matter what people expect if they don't have a procedure for achieving it.

If a statement is adjacent to things that are also "correct", that almost implies a lack of information in the original statement. It seems borne out in the impressive BS'ing - the key to BS'ing is saying things that can't really be wrong.



