I'm most excited about what this is going to look like not when we abandon RAG, but when we pair it with these massive context windows.
If you can parse an entire book to identify relevant chunks using RAG and can fit an entire book into a context window, that means you can fit relevant chunks from an entire reference library into the context window too.
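Concretely, the pairing could be as simple as ranking chunks across the whole library and then packing the top ones into a much larger token budget. A minimal sketch, using a placeholder overlap scorer and a crude character-count token estimate instead of a real embedding model or tokenizer:

```python
# Sketch: retrieve the most relevant chunks from a whole reference library,
# then pack as many as fit under a large context budget. The scorer and the
# token counter here are crude placeholders, not a real retriever/tokenizer.

from dataclasses import dataclass

@dataclass
class Chunk:
    source: str   # which book/document the chunk came from
    text: str

def relevance(query: str, chunk: Chunk) -> float:
    # Placeholder scorer: term overlap. A real system would use embeddings.
    q = set(query.lower().split())
    c = set(chunk.text.lower().split())
    return len(q & c) / (len(q) or 1)

def rough_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token.
    return len(text) // 4

def build_context(query: str, library: list[Chunk], budget_tokens: int = 900_000) -> str:
    # Rank every chunk in the library, then greedily pack the best ones
    # until the (large) context budget is exhausted.
    ranked = sorted(library, key=lambda ch: relevance(query, ch), reverse=True)
    parts, used = [], 0
    for ch in ranked:
        cost = rough_tokens(ch.text)
        if used + cost > budget_tokens:
            break
        parts.append(f"[{ch.source}]\n{ch.text}")
        used += cost
    return "\n\n".join(parts)
```

The only real change from classic RAG is the budget: instead of squeezing a handful of passages into a few thousand tokens, the same retrieval step can feed hundreds of chunks from many different works.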
What I'd like to know is whether that just leads you back to hallucinations. I.e.: is the avoidance of hallucinations intrinsically due to forcing the LLM to consider a limited context, rather than due to directing it to specific, on-topic context? I'm not sure how well this has been established for large context windows.
Having details in context seems to reduce hallucinations, which makes sense if we switch to using the more accurate term: confabulations.
LLM confabulations generally occur when the model doesn't have the information to answer, so it makes something up, similar to what you see in split-brain studies, where one hemisphere is shown something that triggers a reaction and the other hemisphere explains it with BS.
So yes, RAG is always going to potentially produce confabulations if it cuts off the relevant data. But large contexts themselves shouldn't cause them.
> you can fit relevant chunks from an entire reference library into the context window too
I'm curious: if a large language model uses an extensive context that includes multiple works, copyrighted or not, to produce text that differs significantly from the source material, would that constitute infringement? Given that the model is performing a novel process, relating numerous pieces of text, comparing and contrasting their information, and then generating the output of that analysis, could the output be considered usable as training data?
I would set such a model to make a list of concepts and then generate a Wikipedia-like article on each of them based on source materials obtained with a search engine. The model can tell whether the topic is controversial or settled, what the distribution of human responses looks like, and whether they are consistent or contradictory; it can report on the controversy in general, and also report on the common elements that everyone agrees upon. See the sketch below.
It would be like writing a report or an analysis. It could help reduce hallucinations and bias, while sidestepping copyright infringement because it adds a new purpose and a layer of analysis on top of the source materials, and carefully avoids replicating original expression.
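A rough sketch of that pipeline, with the search engine and model call left as stand-in parameters since they depend on whichever services you actually use; everything else is plain prompt construction:

```python
# Sketch of the report-style pipeline described above. The search and model
# calls are passed in as functions because they depend on your search engine
# and LLM API; the rest is just string work.

from typing import Callable

ARTICLE_PROMPT = """Using only the numbered sources below, write an
encyclopedia-style article on "{concept}".
- Summarize the points on which the sources agree.
- If the sources contradict each other, say the topic is contested, summarize
  each position, and note roughly how many sources take each side.
- Paraphrase rather than copying sentences; cite sources by number.

Sources:
{sources}
"""

def write_article(
    concept: str,
    search: Callable[[str], list[str]],   # concept -> list of source texts
    complete: Callable[[str], str],       # prompt -> model output
) -> str:
    sources = search(concept)
    numbered = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return complete(ARTICLE_PROMPT.format(concept=concept, sources=numbered))

def build_encyclopedia(
    concepts: list[str],
    search: Callable[[str], list[str]],
    complete: Callable[[str], str],
) -> dict[str, str]:
    # One article per concept; the prompt pushes the model toward analysis
    # and comparison rather than reproduction of the source text.
    return {c: write_article(c, search, complete) for c in concepts}
```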
I'm not sure; it depends on the cost. If they charge per token, a large context will mostly be irrelevant in practice. For some reason, the article did not mention pricing.
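As a rough illustration of why per-token pricing matters here (the numbers below are invented purely for the example):

```python
# Back-of-the-envelope cost per query. The price is made up for illustration;
# real per-token pricing varies by provider and changes often.
price_per_million_input_tokens = 1.00     # hypothetical, in dollars
full_library_context = 1_000_000          # tokens
rag_trimmed_context = 8_000               # tokens

for label, tokens in [("full-context", full_library_context),
                      ("RAG-trimmed", rag_trimmed_context)]:
    cost = tokens / 1_000_000 * price_per_million_input_tokens
    print(f"{label}: ${cost:.4f} per query for input tokens alone")
```

Whatever the actual rates, the gap between sending a whole library and sending a trimmed set of chunks is a couple of orders of magnitude per query.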
> If you can parse an entire book to identify relevant chunks using RAG and can fit an entire book into a context window, that means you can fit relevant chunks from an entire reference library into the context window too.
And that is very promising.