Current vector-based RAG systems rely on semantic similarity to retrieve content — but similarity ≠ relevance.
In domains like finance or law, the answer isn't just in the paragraphs that look similar to the query — it's in the sections that human experts would look at first. Embedding models don't know that, and fine-tuning them to encode this domain logic is expensive and inflexible.
We built PageIndex to solve this.
It turns long documents into a tree-based index — like a searchable, LLM-friendly table of contents. Instead of splitting documents into flat chunks, it gives LLMs a way to reason and retrieve through the content like a human — navigating by structure and guided by expert rules.
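To make the idea concrete, here is a minimal sketch of what a tree-based index could look like (the node fields and rendering are illustrative assumptions, not PageIndex's actual data model):

```python
from dataclasses import dataclass, field

# Hypothetical node structure for a tree-based document index.
# Field names (title, summary, page_range) are assumptions for illustration.
@dataclass
class SectionNode:
    title: str
    summary: str                      # short summary of the section
    page_range: tuple                 # (start_page, end_page) in the source doc
    children: list = field(default_factory=list)

def render_toc(node: SectionNode, depth: int = 0) -> str:
    """Render the tree as an indented, LLM-readable table of contents."""
    lines = [f"{'  ' * depth}- {node.title} (pp. {node.page_range[0]}-{node.page_range[1]})"]
    for child in node.children:
        lines.append(render_toc(child, depth + 1))
    return "\n".join(lines)

report = SectionNode("Annual Report", "Full annual filing", (1, 120), [
    SectionNode("Business Overview", "What the company does", (1, 20)),
    SectionNode("Risk Factors", "Key disclosed risks", (21, 45)),
    SectionNode("Management's Discussion and Analysis",
                "Explains changes in financial performance", (46, 80)),
])

print(render_toc(report))
```

An LLM given this rendered table of contents can pick which section to open next, descending the tree instead of scanning flat chunks.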
Example: If someone asks "why did revenue go down last year?", experienced analysts would go directly to the "Management's Discussion and Analysis" section in a company's annual financial report, where changes in performance are explained.
These rules can be injected as prompts into the LLM to guide PageIndex's traversal. No model retraining is needed.
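As a rough sketch of what "injecting rules as prompts" might look like (the prompt format and rule wording here are assumptions for illustration, not PageIndex's actual prompts):

```python
# Hypothetical expert rules for financial documents (illustrative only).
EXPERT_RULES = [
    "For questions about changes in revenue or performance, prefer the "
    "Management's Discussion and Analysis section.",
    "For questions about litigation exposure, prefer the Risk Factors section.",
]

def traversal_prompt(question: str, toc: str, rules: list) -> str:
    """Build a prompt asking the LLM to pick the next section to open."""
    rule_text = "\n".join(f"- {r}" for r in rules)
    return (
        "You are navigating a document via its table of contents.\n"
        f"Expert rules:\n{rule_text}\n\n"
        f"Table of contents:\n{toc}\n\n"
        f"Question: {question}\n"
        "Reply with the title of the single most relevant section."
    )

toc = (
    "- Business Overview\n"
    "- Risk Factors\n"
    "- Management's Discussion and Analysis"
)
print(traversal_prompt("Why did revenue go down last year?", toc, EXPERT_RULES))
```

Because the domain logic lives in the prompt rather than in embedding weights, swapping in rules for a new domain is just a text change.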
It's reasoning-based RAG — not similarity search, but navigation guided by structured reasoning and domain logic.
Would love feedback and suggestions, especially thoughts on reasoning-based RAG or other potential applications of PageIndex.