Could you elaborate a bit more about how that would work in practice?


Sure. Say you're running a customer service chatbot. You ask the customer what the problem is, then kick off RAG asynchronously to populate a proper context for a smart LLM. While that runs in the background, the chatbot keeps asking clarifying questions, which buys the RAG process time to fetch data and produce a quick summary. Then the chatbot gives some indication it's thinking, you run the full-context query on the smart LLM, generate a summary answer, feed it back to the chat LLM with something like "I may have found a solution to your problem," and switch over to the smart LLM's response.
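
A minimal sketch of that flow with Python's asyncio. Everything here is hypothetical: fast_llm, smart_llm, and retrieve are stubs standing in for whatever chat model, stronger model, and retrieval backend you actually use.

    import asyncio

    # Hypothetical stubs -- replace with real model / retrieval calls.
    async def fast_llm(prompt: str) -> str:
        await asyncio.sleep(0.1)   # cheap, fast chat model
        return f"[small-model reply to: {prompt[:40]}]"

    async def smart_llm(prompt: str) -> str:
        await asyncio.sleep(2.0)   # slower, higher-quality model
        return f"[big-model reply to: {prompt[:40]}]"

    async def retrieve(query: str) -> str:
        await asyncio.sleep(1.0)   # vector-store / keyword lookup
        return f"[documents matching: {query[:40]}]"

    async def retrieve_and_summarize(problem: str) -> str:
        # The background RAG step: fetch documents, then pre-digest them.
        docs = await retrieve(problem)
        return await smart_llm(f"Summarize these docs as support context:\n{docs}")

    async def handle_ticket(problem: str) -> str:
        # Kick off retrieval + summarization immediately, in the background.
        context_task = asyncio.create_task(retrieve_and_summarize(problem))

        # Meanwhile the cheap model asks clarifying questions, buying time.
        while not context_task.done():
            question = await fast_llm(f"Ask one clarifying question about:\n{problem}")
            print("bot:", question)
            # A real system would await the user's reply here and append
            # it to `problem`; this sketch just pauses until context is ready.
            await asyncio.sleep(0.5)

        print("bot: I may have found a solution to your problem...")
        context = context_task.result()
        return await smart_llm(f"Context:\n{context}\n\nIssue:\n{problem}")

    if __name__ == "__main__":
        print(asyncio.run(handle_ticket("My invoice shows a double charge")))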


I see what you're saying, but you're assuming that consumer products are always chatbots (and that a small language model can buy time interacting with the user while possibly gathering additional context). That said, I'd be interested to see such a system in practice: any examples you can point me to? My more general point wasn't chat-specific; much of the research around RAG uses LLMs to parse or route the user's query, improve retrieval, and so on, which often doesn't work well in practice.


This is where the opportunity for creativity comes in. You could allow chat-based refinement of search queries, or provide popup refinement buttons that narrow the search space, and build the search results iteratively rather than following the old "search" -> "results" paradigm.
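
As a toy sketch of that iterative loop (everything here is made up: SearchSession, the facet names, and the substring matching all stand in for a real search backend like Elasticsearch or a vector index):

    from dataclasses import dataclass, field

    @dataclass
    class SearchSession:
        query: str
        filters: dict = field(default_factory=dict)

        def refine(self, facet: str, value: str) -> None:
            # Called when the user clicks a refinement button.
            self.filters[facet] = value

        def results(self, corpus: list[dict]) -> list[dict]:
            # Re-run the search with whatever filters have accumulated.
            hits = [d for d in corpus if self.query.lower() in d["title"].lower()]
            for facet, value in self.filters.items():
                hits = [d for d in hits if d.get(facet) == value]
            return hits

    corpus = [
        {"title": "Reset your password", "product": "web"},
        {"title": "Reset your password on mobile", "product": "mobile"},
        {"title": "Password policy for admins", "product": "web"},
    ]

    session = SearchSession(query="password")
    print(len(session.results(corpus)))   # 3 hits -> show refinement buttons

    session.refine("product", "mobile")   # user taps the "mobile" button
    print(session.results(corpus))        # narrowed to 1 hit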



