
Why is RAG a horrible hack? LLMs can draw from only 2 sources of data: their parametric knowledge and the prompt. The prompt seems like a pretty good place to put new information they need to reason with.



RAG is a hack for lots of reasons, but the reason I'm focused on at the moment is the pipeline.

Say you are trying to do RAG in a chat-type application. You do the following:

1) Summarize the context of chat into some text that is suitable for a search (lossy).

2) Turn this into a vector embedded in a particular vector space.

3) Use this vector to query a vector database, which returns references to documents or document fragments (which have themselves been indexed as lossy vectors).

4) Take the text of these fragments and put them in the context of the LLM as input.

5) Modify the prompt to explain what these fragments are.

6) Then the prompt is sent to the LLM, which turns it into its own vector representation.
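The six steps above can be sketched end to end. Everything here is a toy stand-in (the `embed` function is a hypothetical hash-seeded vector, not a real encoder, and the "vector DB" is an in-memory list), but the shape of the pipeline — and where the lossy text/vector round-trips happen — matches the list:

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for an embedding model: a deterministic hash-seeded
    random vector. A real pipeline would call a learned encoder here."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
index = [(doc, embed(doc)) for doc in corpus]  # step 3's "vector DB", in memory

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)                               # steps 1-2: text -> vector (lossy)
    ranked = sorted(index, key=lambda p: -float(q @ p[1]))
    return [doc for doc, _ in ranked[:k]]          # step 3: nearest fragments

def build_prompt(chat_summary: str) -> str:
    fragments = retrieve(chat_summary)             # step 4: fragments go back as text
    notes = "\n".join(f"* {f}" for f in fragments)
    # step 5: wrap the fragments with an explanation; step 6 sends this string
    # to the LLM, which re-encodes the whole thing into its own representation.
    return f"Use these retrieved notes to answer:\n{notes}\n\nQuestion: {chat_summary}"

print(build_prompt("Where is the Eiffel Tower?"))
```

Note that the toy `embed` assigns no real semantics, so the ranking is arbitrary; the point is only the data flow: vector, back to text, back to vector.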

An obvious improvement is for the VectorDB and the LLM to share an internal representation, with the VectorDB understanding it. The LLM would then take this vector as a second input alongside the text context and combine the two, the same way a multi-modal model combines text and image inputs.
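A minimal sketch of that idea, assuming the retriever's vectors already live in (or can be projected into) the LLM's embedding dimension: retrieved vectors are prepended to the token-embedding sequence instead of being round-tripped through text, just as image patch embeddings are prepended in a multi-modal model. The adapter matrix here is a placeholder identity; in practice it would be a small learned projection.

```python
import numpy as np

D_MODEL = 16
rng = np.random.default_rng(0)

# Embedded prompt tokens (what the LLM's tokenizer + embedding layer produce).
token_embeddings = rng.standard_normal((5, D_MODEL))

# Vectors straight from the vector DB -- no detour through text fragments.
retrieved_vectors = rng.standard_normal((3, D_MODEL))

# Hypothetical learned adapter mapping DB space -> LLM space (identity here).
W_adapter = np.eye(D_MODEL)

# Prepend retrieved vectors to the token sequence, multi-modal style.
llm_input = np.concatenate([retrieved_vectors @ W_adapter, token_embeddings], axis=0)
print(llm_input.shape)  # (3 retrieved + 5 token) rows of width D_MODEL
```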


I guess op may be envisioning an end-to-end solution that can train a model in the context of an external document store.

I.e. One day we want to be able to backprop through the database.

Search systems face equivalent problems. Each stage in the hierarchy of an ML retrieval system is optimized (trained) separately. Maybe this helps regularize things, but, given enough compute and complexity, it is theoretically possible to differentiate through more of the stack.
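One standard way to make the retrieval step differentiable (a sketch, not anyone's production system): replace the hard top-k lookup, which has no useful gradient, with a softmax-weighted mixture over document embeddings. Every operation below is smooth, so a loss on the output could in principle backprop into both the query encoder and the document index:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

# Learnable document embeddings (the "database") and a learnable query vector.
docs = np.random.default_rng(1).standard_normal((4, 8))
query = np.random.default_rng(2).standard_normal(8)

scores = docs @ query        # similarity scores, differentiable in both inputs
weights = softmax(scores)    # soft "retrieval" distribution over documents
retrieved = weights @ docs   # differentiable mixture fed downstream

print(weights)               # a proper distribution: non-negative, sums to 1
```

In practice an autodiff framework would compute the gradients; the point is that nothing in this forward pass is a discrete lookup, which is what "backprop through the database" requires.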





