Saying "I don't know" doesn't require too much of a change. This isn't a different mode of operation where it's introspecting about its own knowledge - it's just the best continuation prediction in a context where the person/entity being questioned is not equipped to answer.
LLMs build quite deep representations of the input on which they base their next-word prediction (text continuation), and it has been shown that they sometimes already do "know" when something they are generating is low-confidence or false. With appropriate training data, they could perhaps learn to attend to this signal and predict "I don't know" or "I'm not sure".
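To make that internal-signal idea concrete, here is a minimal sketch of training a linear probe on hidden states to separate true from false statements. The model name, the toy labelled statements, and the choice of layer/token are all illustrative assumptions, not a reference to any particular paper's setup.

```python
# Minimal sketch: probe last-layer hidden states for a truth/falsehood signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # assumption: any small causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Tiny illustrative dataset of (statement, is_true) pairs.
data = [
    ("The capital of France is Paris.", 1),
    ("The capital of France is Rome.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]

def last_token_rep(text: str) -> torch.Tensor:
    """Return the final-layer hidden state of the last token."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

X = torch.stack([last_token_rep(s) for s, _ in data]).numpy()
y = [label for _, label in data]

# If a linear probe like this separates true from false statements better than
# chance on held-out data, the representations carry some confidence/truth
# signal that fine-tuning could teach the model to surface as "I'm not sure".
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict(X))
```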
Improving the ability of LLMs to answer like this requires them to have a better idea of what is and isn't true. Humans do this by remembering where they learnt something: was it first-hand experience, a textbook, a trusted friend, or a less trustworthy source? LLMs' ability to discern the truth could be boosted by giving them the sources of their training data, perhaps together with a trustworthiness rating (although they may be able to learn that for themselves).
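One way that source information could be attached is by prepending provenance metadata to each training document. The record format, the header syntax, and the trust scores below are my own assumptions for illustration, not an existing pipeline.

```python
# Sketch: tag pretraining text with its source and a trustworthiness rating
# so the model can condition on provenance during training.
from dataclasses import dataclass

@dataclass
class TrainingDoc:
    text: str
    source: str   # e.g. corpus or domain name
    trust: float  # 0.0 (unreliable) .. 1.0 (highly trusted); could also be learned

def to_training_example(doc: TrainingDoc) -> str:
    # Prepend a metadata header the model sees during pretraining, so that
    # low-trust styles of text are distinguishable from textbook-grade sources.
    header = f"<source={doc.source} trust={doc.trust:.1f}>"
    return f"{header}\n{doc.text}"

docs = [
    TrainingDoc("Paris is the capital of France.", source="encyclopedia", trust=0.9),
    TrainingDoc("u wont believe what cures everything...", source="forum-post", trust=0.2),
]

for d in docs:
    print(to_training_example(d))
```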