Have there been significant improvements this year?
The simple flow we landed on in 2024 was:
1. Chunk and embed docs with an embedding model
2. Embed query (maybe using an LLM to reformulate first)
3. Retrieve N1 docs using cosine similarity
4. Narrow to N2 using a reranking model
5. Inject these docs into context to generate answer
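
For anyone who wants the flow above spelled out, here is a minimal sketch of the five steps in plain Python. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `rerank_score` is a token-overlap stand-in for a real reranking model, and the function names, chunk size, and prompt wording are all hypothetical. The optional LLM query reformulation and the final generation call are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(doc, max_words=50):
    """Step 1a: split a document into fixed-size word windows."""
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Steps 1b and 2: toy stand-in for an embedding model.

    Returns an L2-normalized sparse bag-of-words vector as a dict.
    """
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity; both vectors are already unit length."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rerank_score(query, passage):
    """Step 4: toy stand-in for a reranking model.

    Scores a passage by the fraction of query tokens it contains.
    """
    q = set(tokenize(query))
    p = set(tokenize(passage))
    return len(q & p) / len(q) if q else 0.0


def retrieve(query, docs, n1=3, n2=2):
    """Steps 1 to 4: chunk, embed, take top N1 by cosine, rerank to N2."""
    chunks = [c for d in docs for c in chunk(d)]
    index = [(c, embed(c)) for c in chunks]
    q_vec = embed(query)
    candidates = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:n1]
    reranked = sorted((c for c, _ in candidates),
                      key=lambda c: rerank_score(query, c), reverse=True)
    return reranked[:n2]


def build_prompt(query, passages):
    """Step 5: inject the surviving passages into the generation prompt."""
    context = "\n\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Cats sleep most of the day.",
    "Neo4j is a graph database.",
    "Rerankers score query passage pairs.",
]
query = "What is Neo4j?"
top = retrieve(query, docs, n1=2, n2=1)
prompt = build_prompt(query, top)
```

In a real setup the two stand-ins would be replaced by calls to whatever embedding and reranking models you use, and the index would live in a vector store rather than a Python list.
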
Have there been significant advancements? Has anyone seen improvements from using graph structures (e.g. a graph database like Neo4j) for more sophisticated retrieval?