From a consumer perspective, this is a super interesting paper because it touches on one of the fundamental issues with most RAG systems beyond the toy case: you need to do different things depending on what the user is asking for. You also (usually) can't just ask, because most users don't know that LLMs are bad at math or that semantic search won't be sufficient to answer questions involving enumeration or totality. And while you can always add more steps to your RAG pipeline, some of those steps may be computationally expensive or not particularly relevant to the question at hand.
That being said, it is a bit frustrating that so much RAG research focuses on multi-hop approaches with LLMs. IME, multiple round trips to an LLM are essentially a non-starter for any serious consumer product because they're far too slow. Smaller models can struggle to follow instructions, so they often aren't an adequate replacement even for simpler tasks. Curious to hear if other folks working in this space have had any success thinking critically about these types of problems!
That depends on the model; you can run things in parallel and sometimes keep everything timely. You shouldn't wait until the last second to start running RAG: you can pre-emptively build context based on the current chat (like a human does), so you've already got things summarized and ready to fire off when the final prompt does come.
Think about how a human will draw out a conversation around answering a question, using delaying words and phrases to buy time while they haven't fully formulated the solution. LLMs can use the same tactic.
Sure. If you're running a customer service chatbot, you can ask customers what the problem is, then start running RAG async to populate a proper context for a smart LLM. Have the chatbot keep asking clarifying questions, which gives the background RAG process time to fetch data and run a quick summary. Then have the chatbot give some indication that it's thinking, run the full-context query on the smart LLM, generate a summary answer, feed it back to the chat LLM with something like "I may have found a solution to your problem", and switch to the response from the smart LLM.
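Roughly what I mean, as a minimal sketch - all of the function names here (retrieve_context, small_model_clarify, big_model_answer) are placeholders standing in for real model and retrieval calls, not any actual API:

```python
# Sketch of "buy time with a cheap chat model while RAG runs in the background".
import asyncio

async def retrieve_context(problem: str) -> str:
    """Hypothetical async RAG step: embed, search, summarize."""
    await asyncio.sleep(2.0)  # stand-in for vector search + summarization
    return f"summarized context for: {problem}"

async def small_model_clarify(problem: str) -> str:
    """Hypothetical cheap/fast model asking a clarifying question."""
    await asyncio.sleep(0.3)
    return "Could you tell me roughly when the issue started?"

async def big_model_answer(problem: str, context: str, details: str) -> str:
    """Hypothetical call to the smarter model once the context is ready."""
    await asyncio.sleep(1.0)
    return f"Answer grounded in: {context} / {details}"

async def handle_chat(problem: str) -> str:
    # Kick off retrieval immediately instead of waiting for the final prompt.
    retrieval = asyncio.create_task(retrieve_context(problem))

    # Meanwhile, the chat model keeps the user engaged and gathers details.
    question = await small_model_clarify(problem)
    details = "Order #1234, placed last Tuesday"  # in practice, the user's next turn

    # By now retrieval has (hopefully) finished; await it and answer.
    context = await retrieval
    return await big_model_answer(problem, context, details)

if __name__ == "__main__":
    print(asyncio.run(handle_chat("My order never arrived")))
```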
I see what you're saying, but you're assuming that consumer products are always chatbots (and that a small language model can buy time interacting with the user while possibly providing additional context). That being said, I would be interested to see such a system in practice - any examples you can point me to? My more general point was not chat-related; much of the research around RAG seems to use LLMs to parse or route the user's query, improve retrieval, etc. which doesn't often work in practice.
This is where the opportunity for creativity comes in. You could allow chat-based refinement of search queries, or provide popup refinement buttons that narrow the search space, and build the search results iteratively rather than following the old paradigm of "search" -> "results".
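A toy sketch of that iterative flow - the session and filter names are purely illustrative, not any real library:

```python
# Each refinement (typed or from a popup button) narrows a filter set and the
# candidate pool shrinks in place, instead of one-shot "search -> results".
from dataclasses import dataclass, field

@dataclass
class SearchSession:
    query: str
    filters: dict = field(default_factory=dict)
    candidates: list = field(default_factory=list)

    def refine(self, **new_filters):
        """Apply a refinement (e.g. from a popup button) and narrow results."""
        self.filters.update(new_filters)
        self.candidates = [
            doc for doc in self.candidates
            if all(doc.get(k) == v for k, v in self.filters.items())
        ]
        return self.candidates

# Usage: seed with a broad initial retrieval, then narrow interactively.
session = SearchSession(
    query="reset password",
    candidates=[
        {"id": 1, "product": "mobile", "topic": "auth"},
        {"id": 2, "product": "web", "topic": "auth"},
        {"id": 3, "product": "web", "topic": "billing"},
    ],
)
session.refine(product="web")   # user taps the "web app" button
session.refine(topic="auth")    # user taps the "login issues" button
print(session.candidates)       # -> [{'id': 2, 'product': 'web', 'topic': 'auth'}]
```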
I'm still not sold on recall at such large context window sizes. It's easy for an LLM to find a needle in a haystack, but most RAG use cases are more like finding a needle in a stack of needles, and the benchmarks don't really reflect that. There are also the speed and cost implications of dumping millions of tokens into a prompt - it's prohibitively slow and expensive right now.
It's still much cheaper to run RAG in production (at least if you are using closed models). I'd love to use the entire context of GPT-4, but if I do that in production it'll cost much more than some RAG-based implementation.
yes - private data, real-time data, curated data, citations with no hallucinations, RAG on tabular data, RAG on video, RAG on hierarchical mixed data, RAG over a graph
This seems similar to building a RAG router (1) to perform dynamic retrieval/querying over data.
After getting hundreds of questions on my Interactive Resume AI chatbot (2), I've found the user queries can be categorized as: greetings, professional skills questions, professional experience questions, personal/hobby questions, and common interview questions.
I am currently working on building a RAG router to help improve the quality of Q&A responses. I currently use gpt-3.5-turbo without any special RAG techniques, and the quality is lacking when performing Q&A over my resume and Q&A CSV file. GPT-4 works well but is too expensive.
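A rough sketch of what I mean by a router - the categories come from above, but the route targets and the placeholder classify() call are assumptions, not a working implementation:

```python
# Classify the incoming question into one of a few intents, then route each
# intent to a different retrieval strategy (resume chunks, the Q&A CSV, or a
# canned reply) before handing context to the answering model.
ROUTES = {
    "greeting":                lambda q: "canned greeting, no retrieval needed",
    "professional_skills":     lambda q: f"vector search over resume skills for: {q}",
    "professional_experience": lambda q: f"vector search over resume experience for: {q}",
    "personal_hobby":          lambda q: f"lookup in the personal/hobby Q&A CSV for: {q}",
    "common_interview":        lambda q: f"lookup in the interview Q&A CSV for: {q}",
}

CLASSIFIER_PROMPT = (
    "Classify the user question into exactly one of: "
    + ", ".join(ROUTES) + ".\nQuestion: {question}\nCategory:"
)

def classify(question: str) -> str:
    """Placeholder for a cheap LLM call (or a small fine-tuned classifier)
    that returns one of the category names above."""
    # e.g. send CLASSIFIER_PROMPT.format(question=question) to gpt-3.5-turbo
    # and validate the reply against ROUTES; hard-coded here for illustration.
    return "professional_skills"

def answer(question: str) -> str:
    category = classify(question)
    context = ROUTES.get(category, ROUTES["greeting"])(question)
    # The retrieved context would then be fed to the answering model.
    return context

print(answer("What experience do you have with Python?"))
```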
Teaching LLMs how to search is probably going to be key to making them hallucinate far less. Most RAG approaches currently use simple vector searches to pull out information. ChatGPT is actually able to run Bing searches, and presumably Gemini uses Google's search. It's fairly clunky and unsophisticated currently.
These searches are still relatively dumb. With LLMs not being half bad at remembering a lot of things, programming simple solutions to problems, etc., a next step could be to have them come up with a query plan for retrieving the information they need to answer a question - something more sophisticated than calculating a vector for the input, fetching n results, adding those to the context, and calling it a day.
Our ability to Google solutions to problems is inferior to that of an LLM that can generate far more sophisticated, comprehensive, and exhaustive queries against a wide range of databases and sources and then filter through the massive amount of information that comes back. We could do it manually, but it would take ages. We don't actually need LLMs to know everything there is to know. We just need them to be able to know where to look and to evaluate what they find in context. Sticking to what they find rather than what they know means their answers are only as good as their ability to extract, filter, and rank information that is factual and reputable. That means hallucination becomes less of a problem, because everything can be traced back to what they found. We can train them to ask better questions rather than hallucinate better answers.
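A rough sketch of the query-plan idea - the plan format, source names, and helper functions are invented for illustration, not any existing system:

```python
# Instead of one embedding lookup, ask the model to emit a small structured
# plan (sub-queries against named sources), execute each step, and keep
# provenance so every claim can be traced back to what was actually found.
import json

def plan_queries(question: str) -> list[dict]:
    """Placeholder for an LLM call that returns a JSON query plan."""
    return json.loads("""[
        {"source": "internal_docs", "query": "refund policy for digital goods"},
        {"source": "web_search",    "query": "EU consumer law digital refunds"}
    ]""")

def execute(step: dict) -> list[dict]:
    """Placeholder retriever: returns passages tagged with their origin."""
    return [{"text": f"result for {step['query']}", "source": step["source"]}]

def answer_with_citations(question: str) -> dict:
    evidence = []
    for step in plan_queries(question):
        evidence.extend(execute(step))
    # The answering model is then constrained to this evidence, and each
    # sentence can cite the 'source' field rather than relying on recall.
    return {"question": question, "evidence": evidence}

print(answer_with_citations("Can I get a refund on an e-book?"))
```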
Having done a lot of traditional search-related work over the past 20 years, I got really excited about RAG when I first read about it because I realized two things: most people don't actually know a lot, but they can learn how to find out (e.g. by Googling), and learning how to find stuff isn't actually that hard.
Most people that use Google don't have a clue how it works. LLMs are actually well equipped to come up with solid plans for finding stuff. They can program, they know about different sources of information and how to access them. They can actually pick apart documentation written for humans and use that to write programs, etc. In other words, giving LLMs better search, which is something I know a bit about, is going to enable them to give better, more balanced answers. We've seen nothing yet.
What I like about this is that it doesn't require a lot of mystical stuff from people who arguably barely understand the emergent properties of LLMs even today. It just requires more systems thinking. Smaller LLMs trained to search rather than to know might be better than a bloated know-it-all blob of neurons with the collective knowledge of the world compressed into it. The combination might be really good, of course: it would be able to hallucinate theories and then conduct the research needed to validate them.
One big problem is that we've built search for humans - more specifically, to advertise to them.
AI doesn't need a human search engine; it needs a "fact database" that can pull short factoids with a truth value, which could be a distribution based on human input. For example, you might have the factoid "Donald Trump incited insurrection on January 6th" with a score of 0.8 (out of 1) and a variance of 0.3, since people tend to either absolutely believe it or disbelieve it, with more people on the believing side.
Beyond that AI needs a "logical tools" database with short examples of their use that it can pull from for any given problem.
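A toy sketch of what such a factoid store might look like - the schema, thresholds, and example claims are all made up:

```python
# Short factoids stored with a belief score and a variance reflecting how
# contested they are, rather than full documents written for human readers.
from dataclasses import dataclass

@dataclass
class Factoid:
    claim: str
    score: float     # 0.0 = broadly disbelieved, 1.0 = broadly believed
    variance: float  # high variance = polarized / contested claim

FACTS = [
    Factoid("Water boils at 100 C at sea level", score=0.99, variance=0.01),
    Factoid("Coffee is healthier than tea",      score=0.55, variance=0.30),
]

def lookup(query: str, min_score: float = 0.8, max_variance: float = 0.1):
    """Return only factoids that are both well-supported and uncontested,
    so a downstream model can treat them as ground truth."""
    return [
        f for f in FACTS
        if query.lower() in f.claim.lower()
        and f.score >= min_score and f.variance <= max_variance
    ]

print(lookup("water boils"))
```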
Given that the GitHub account itself is valid, and that it has some other repositories related to ML, I suspect the link will be working "soon". It's likely a private repo while the paper goes through whatever steps the authors need before they can fully publish. I've seen this a lot with pre-print papers in this space, where the paper goes out before the code or other resources are published.
I do find myself reading papers often for my work, and I share the ones I find interesting or feel might have an impact on the future of my chosen domain. This is not an advertisement; I don't know the authors or anyone related to the paper.
My father was a PhD psychologist and family therapist. He was on the witness stand during a custody case explaining a theory of personality when the cross-examining lawyer said scornfully "I'll bet you got that out of some book." To which my dad replied: "Why yes, in fact. In my profession, in order to learn things, we often read books."