Before sharing how it works, I want to highlight some of the challenges of a system like this. Unlike deep research over the internet, LLMs can't easily leverage the built-in search of these SaaS applications. Each app searches differently, many have weak search capabilities, and some rely on their own internal query language. There are also a ton of signals that web search engines use that aren't available natively in these tools, such as backlinks and clickthrough rates. Additionally, many teams rely on internal terminology that is unique to them and hard for the LLM to search for. There's also the challenge of unifying the objects across all of the apps into a plaintext representation that works for the LLM.
The best way we've found to do this is to build a document index instead of relying on application-native searches at query time. The document index is a hybrid index of keyword frequencies and vectors. The keyword component addresses issues like team-specific terminology, and the vector component allows for natural language queries and non-exact matching. Since all of the documents across the sources are processed prior to query time, inference is fast and every document has already been mapped to an LLM-friendly representation.
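To make the hybrid idea concrete, here is a minimal, self-contained sketch of an index that combines a BM25-style keyword score with embedding similarity. This is an illustration of the general technique, not the actual implementation; the `embed()` function is a placeholder for whatever embedding model is used at indexing and query time, and the weights are arbitrary.

```python
# Minimal sketch of a hybrid keyword + vector index (not the production system).
import math
from collections import Counter

def embed(text: str) -> list[float]:
    # Placeholder: in practice this would call an embedding model at indexing
    # time for documents and at query time for the query.
    vec = [float(ord(c) % 7) for c in text[:32]]
    return vec + [0.0] * (32 - len(vec))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

class HybridIndex:
    """Keyword (BM25-style) + vector index built ahead of query time."""

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs, self.vectors, self.tfs, self.doc_lens = [], [], [], []
        self.df = Counter()  # document frequency per term

    def add(self, text: str) -> None:
        tokens = text.lower().split()
        self.docs.append(text)
        self.vectors.append(embed(text))   # vector component
        self.tfs.append(Counter(tokens))   # keyword component
        self.doc_lens.append(len(tokens))
        self.df.update(set(tokens))

    def _bm25(self, query_tokens: list[str], i: int) -> float:
        n, avgdl = len(self.docs), sum(self.doc_lens) / len(self.doc_lens)
        score = 0.0
        for t in query_tokens:
            tf = self.tfs[i].get(t, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - self.df[t] + 0.5) / (self.df[t] + 0.5))
            score += idf * tf * (self.k1 + 1) / (
                tf + self.k1 * (1 - self.b + self.b * self.doc_lens[i] / avgdl))
        return score

    def search(self, query: str, keyword_weight=0.5, vector_weight=0.5, k=5):
        q_tokens, q_vec = query.lower().split(), embed(query)
        scored = [
            (keyword_weight * self._bm25(q_tokens, i)
             + vector_weight * cosine(q_vec, self.vectors[i]), doc)
            for i, doc in enumerate(self.docs)
        ]
        return sorted(scored, reverse=True)[:k]
```

The keyword half catches exact matches on internal jargon, while the vector half handles paraphrased or natural-language queries; the two scores are simply blended at query time.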
There are also other signals we can take into account that apply across all of the sources. For example, the time a document was last updated is used to prioritize more recent documents. We also have models that run at indexing time to label documents and models that run at inference time to dynamically change the weights of the search function parameters.
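As a rough illustration of how such signals could be layered on top of the hybrid scores, here is a hedged sketch with an exponential recency decay and a stand-in `classify_query` function for an inference-time model that rebalances the keyword/vector weights. The half-life value and the weight heuristics are made-up numbers, not the real ranking function.

```python
# Sketch of source-agnostic ranking signals on top of hybrid scores (illustrative only).
import math
import time

def recency_boost(last_updated_ts: float, half_life_days: float = 90.0) -> float:
    """Exponential decay so recently updated documents rank higher."""
    age_days = (time.time() - last_updated_ts) / 86400
    return math.exp(-math.log(2) * age_days / half_life_days)

def classify_query(query: str) -> dict:
    # Placeholder for an inference-time model that adjusts the search weights,
    # e.g. lean on keywords for short, jargon-heavy queries and on vectors
    # for longer natural-language questions.
    if len(query.split()) <= 2:
        return {"keyword_weight": 0.7, "vector_weight": 0.3}
    return {"keyword_weight": 0.4, "vector_weight": 0.6}

def rank(query: str, candidates: list[tuple[float, float, float, str]]):
    """candidates: (keyword_score, vector_score, last_updated_ts, doc) tuples."""
    w = classify_query(query)
    scored = (
        ((w["keyword_weight"] * ks + w["vector_weight"] * vs)
         * recency_boost(ts), doc)
        for ks, vs, ts, doc in candidates
    )
    return sorted(scored, reverse=True)
```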
When you talked about the "document index is a hybrid index of keyword frequencies and vectors", I'm a bit curious how you get them. In pre-processing, do you have to use an LLM or other models to go through the documents and extract keywords? What about the vectors? Are you using an embedding model to generate them? Does that imply preprocessing has to be done whenever there's a new doc or a modification to an existing doc? Would that be costly in time? Any tricks to make the preprocessing more efficient?