One approach to fixing factual errors is to use two rounds of LLM interaction. I forgot the name of the paper.
Say you ask "What is the height of Everest?"
1. generate an answer with the LLM in closed-book mode: "The height of Everest is 8723m" = candidate_answer
2. search your references with candidate_answer, find: "At 8,849 meters (29,032 feet), Everest is considered the tallest point on Earth" = search_snippet
3. do a second pass to rewrite the answer with the LLM using search_snippet in the prompt
Basically, even the incorrect phrase candidate_answer is very good at matching the correct answer in a search engine. It acts like a template tuned to extract the desired fact. The search step can also flag cases where the claimed fact has no supporting references at all.
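The three steps above can be sketched as a small pipeline. The LLM and search calls (llm_generate, llm_rewrite, search_references) are hypothetical stand-ins, stubbed here with canned strings so the flow runs end to end:

```python
def llm_generate(prompt: str) -> str:
    # Pass 1: hypothetical closed-book LLM call.
    # Stubbed with the (wrong) draft answer from the example.
    return "The height of Everest is 8723m"

def search_references(query: str) -> str:
    # Hypothetical search over trusted references, queried with the draft.
    # Stubbed with the snippet from the example.
    return ("At 8,849 meters (29,032 feet), Everest is considered "
            "the tallest point on Earth")

def llm_rewrite(prompt: str) -> str:
    # Pass 2: hypothetical LLM call that rewrites the draft against
    # the retrieved snippet. Stubbed with the corrected answer.
    return "The height of Everest is 8,849 meters (29,032 feet)."

def answer_with_grounding(question: str) -> str:
    candidate_answer = llm_generate(question)             # step 1: closed book
    search_snippet = search_references(candidate_answer)  # step 2: retrieve
    rewrite_prompt = (                                    # step 3: rewrite
        f"Question: {question}\n"
        f"Draft answer: {candidate_answer}\n"
        f"Reference: {search_snippet}\n"
        "Rewrite the draft answer so it agrees with the reference."
    )
    return llm_rewrite(rewrite_prompt)

print(answer_with_grounding("What is the height of Everest?"))
```

A real implementation would also need the "no hits" branch: if search_references returns nothing for candidate_answer, the claim is flagged as unverifiable rather than rewritten.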
How would you apply this approach to the article's example, the response to the question "Who is Daragh O Brien from Castlebridge", where I count at least 15 separate statements of fact?
Should we research all of them and try again with a big table of hits and misses from the first attempt? Seems like a lot of work.
Also: is the generated response really "very good at matching the correct answer"? I suppose it works because the search engine's language processing cancels out the useless parts the AI generated (sort of a "human ABI", analogous to the C ABI?), but a more direct query (e.g. "height of Everest") would likely be just as effective.
Yes, this is the essential issue. Any system competent to fully ground and check the factual statements in a stream of arbitrary text will be phenomenally more complex than the original LLM, and will usually be able to answer queries directly, at which point one wonders what the LLM is adding. At best, if we can somehow identify all the factual statements that need cross-referencing and offload them to a knowledge base (dubious), we are left with the mad-libs connective flow the LLM has created, which approximates the essay style of a human writer. I'm not certain that has much practical value beyond allowing a form of undetectable plagiarism to be published as though it were free-form writing.