LLMs are, at their core, search tools: training is indexing, and prompting is querying that index. The big difference is that the granularity is at the n-gram level rather than the document level.
Properly using them requires understanding that. And just as we understand that not every query will find what we want, neither will every prompt. Iterative refinement is virtually required for nontrivial cases. Automating that process, as e.g. Cursor's agent does, is very promising.
Fundamentally, no, they're not. That's why you get cases like the Air Canada chatbot that told a user about a refund policy that didn't exist, or the lawyer in Mata v. Avianca who cited a case that didn't exist. If you ask an LLM to search for something that doesn't exist, there's a decent chance it will hallucinate something into existence for you.
What LLMs are good at is effectively turning fuzzy search terms into non-fuzzy ones; they're also pretty good at taking some text and recasting it into an extremely formulaic paradigm. In other words, turning unstructured text into something structured. The problem is that they don't have enough understanding of the world to do something useful with that structured representation when it needs to be accurate.
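To make the unstructured-to-structured point concrete, here's a rough sketch using the OpenAI Python client. The model name, the field names, and the example sentence are all just illustrative, not a recommendation:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works
        response_format={"type": "json_object"},  # force well-formed JSON
        messages=[
            {"role": "system",
             "content": "Extract name, date, and amount from the user's text. Reply with JSON only."},
            {"role": "user",
             "content": "Paid Alice forty bucks on March 3rd for the ski jacket."},
        ],
    )

    record = json.loads(resp.choices[0].message.content)
    print(record)  # something like {"name": "Alice", "date": "March 3rd", "amount": 40}

The model will happily emit a tidy record every time; whether the values in it are correct is exactly the accuracy problem described above.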
This is the wrong take. Search tools are deterministic unless you purposely inject random weights into the ranking. With search tools, the same query will always yield the same results, provided they are designed to and the underlying data has not changed.
With LLMs, I can ask the exact same question and get a different response, even if the data has not changed.
The randomness comes from sampling. With local LLMs, you can fix the random seed, or even disable sampling altogether; both will get you determinism.
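A minimal sketch with Hugging Face transformers showing both options ("gpt2" is just a stand-in for whatever model you run locally):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("The capital of France is", return_tensors="pt")

    # 1) Disable sampling: greedy decoding always takes the argmax token,
    #    so the output is identical on every run, no seed required.
    greedy = model.generate(**inputs, do_sample=False, max_new_tokens=20)

    # 2) Keep sampling but pin the seed: the "random" draws now repeat.
    torch.manual_seed(42)
    sampled = model.generate(**inputs, do_sample=True, max_new_tokens=20)

    print(tok.decode(greedy[0]))
    print(tok.decode(sampled[0]))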
I agree that LLMs are not search tools, but for very different reasons.
Thanks for the info on local LLMs. Based on my chats with multiple LLMs, the biggest issue appears to be hardware.
Non-deterministic hardware: all of them mentioned that modern computing hardware, such as GPUs or TPUs, can introduce non-determinism through parallel processing, caching, or numerical instability. That can make determinism hard to achieve even with fixed random seeds and deterministic algorithms.
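For what it's worth, frameworks expose knobs to fight this. A sketch of the usual PyTorch settings (these flags are real, but bit-exact runs still depend on your kernels and hardware; some ops simply have no deterministic variant and will raise an error):

    import os
    import torch

    # Must be set before the first cuBLAS call for deterministic matmuls.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
    torch.backends.cudnn.benchmark = False    # autotuning can pick different kernels per run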
Semantics. It may be possible to make them deterministic, but they're unstable with respect to unrelated changes in the training data, no? If I add a page about sausages to a search index, the results for "ski jacket" will be unaffected. In a practical sense, LLMs are non-deterministic. I mean, ChatGPT even has a "regenerate" button to expose this "turbulence" as a feature.
LLMs are, at their core, fucking Dissociated Press. That's what makes them fun and interesting, and that's the problem with using them for real production work.
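For anyone who hasn't met it: Dissociated Press is the old Emacs toy that babbles by stitching together n-gram continuations drawn from a corpus. A toy word-level version in Python, just to show the family resemblance (the corpus is obviously made up):

    import random
    from collections import defaultdict

    def build_chain(text):
        # Map each word to the list of words observed to follow it.
        words = text.split()
        chain = defaultdict(list)
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def babble(chain, start, n=20):
        out = [start]
        for _ in range(n):
            followers = chain.get(out[-1])
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the log"
    print(babble(build_chain(corpus), "the"))

Every continuation is locally plausible because it was seen somewhere in the training text; nothing guarantees the whole is true.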