From the article, it seems like this is exclusively (or mainly?) a problem when the LLMs are hooked up to real-time search. When they answer from their training data alone, they know that Pravda is unreliable.
So it seems like an easy fix in this particular case, fortunately -- either filter the search results in a separate evaluation pass (quick fix), or do (more) reinforcement training around this specific scenario (long-term fix).
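To make the "quick fix" concrete, here's a minimal sketch of what that separate filtering pass could look like, assuming a maintained blocklist of known disinformation domains; the domain names, data shapes, and helper functions here are illustrative, not any vendor's actual API:

```python
from urllib.parse import urlparse

# Illustrative blocklist of disinformation-network domains; a real deployment
# would pull this from a curated, regularly updated source.
BLOCKED_DOMAINS = {"news-pravda.com", "pravda-en.com", "pravda-fr.com"}

def is_blocked(url: str) -> bool:
    """True if the result's host matches, or is a subdomain of, a blocked domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def filter_results(results: list[dict]) -> list[dict]:
    """Drop search results from blocked domains before they ever reach the LLM."""
    return [r for r in results if not is_blocked(r.get("url", ""))]

# Example: only the first result survives the pre-filter.
results = [
    {"url": "https://en.wikipedia.org/wiki/Example", "snippet": "..."},
    {"url": "https://news-pravda.com/some-story", "snippet": "..."},
]
print(filter_results(results))
```

A static blocklist is obviously the crudest version of this; the same hook could instead call a classifier or a reputation-scoring pass, which is where it starts to blur into the longer-term training fix.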
Obviously this is going to be a cat-and-mouse game. But this looks like it was a simple oversight in this case, not some kind of fundamental flaw in LLMs, fortunately.