I see what you're saying, but you're assuming that consumer products are always chatbots (and that a small language model can buy time by interacting with the user while possibly providing additional context). That said, I would be interested to see such a system in practice - any examples you can point me to? My more general point wasn't chat-related; much of the research around RAG seems to use LLMs to parse or route the user's query, improve retrieval, and so on, which often doesn't work in practice.
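For concreteness, the pattern I mean looks roughly like the sketch below: an LLM call sits in front of retrieval to route the query and rewrite it before anything is fetched. This is just an illustration - `llm` is any text-in/text-out callable and `retrievers` is a hypothetical dict of retrieval backends, not any specific paper's or library's setup.

```python
def routed_retrieval(user_query, llm, retrievers, default="general"):
    """Ask the model which corpus the query belongs to, then retrieve
    from that corpus with a cleaned-up version of the query."""
    route_prompt = (
        "Classify this query into one of: " + ", ".join(retrievers)
        + f"\nQuery: {user_query}\nAnswer with one word:"
    )
    route = llm(route_prompt).strip().lower()
    if route not in retrievers:
        route = default  # fall back if the model rambles

    rewrite_prompt = f"Rewrite this as a concise search query: {user_query}"
    rewritten = llm(rewrite_prompt).strip()

    return retrievers[route](rewritten)
```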
This is where the opportunity for creativity comes in. You could allow chat-based refinement of search queries, or provide popup refinement buttons that narrow the search space, and build the search results iteratively rather than following the old "search" -> "results" paradigm.
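As a rough sketch of that loop: search, show results, offer a few narrowing options (as a chat turn or popup buttons), fold the user's choice back into the query state, and search again. All the names here (`SearchState`, `propose_refinements`, the `search_fn` and `get_user_choice` callables) are made up for illustration; the point is the control flow, not any particular backend.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SearchState:
    query: str
    filters: dict = field(default_factory=dict)

def propose_refinements(results, facet="topic", top_n=4):
    """Derive a few narrowing options from the current result set by
    counting a facet field. A small model could generate these instead."""
    counts = Counter(doc.get(facet) for doc in results if doc.get(facet))
    return [{facet: value} for value, _ in counts.most_common(top_n)]

def iterative_search(search_fn, initial_query, get_user_choice):
    """search_fn(query, filters) -> list[dict] is whatever backend you have;
    get_user_choice(results, options) -> dict | None is a chat turn or a
    row of popup buttons in the UI."""
    state = SearchState(query=initial_query)
    while True:
        results = search_fn(state.query, state.filters)
        options = propose_refinements(results)
        choice = get_user_choice(results, options)
        if choice is None:                # user is satisfied, stop refining
            return results
        state.filters.update(choice)      # narrow the space and search again
```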