What is SOTA for retrieval in RAG systems now?

Have there been significant improvements this year?

The simple flow we landed on in 2024 was:

1. Chunk and embed docs with an embedding model
2. Embed the query (maybe using an LLM to reformulate it first)
3. Retrieve N1 docs using cosine similarity
4. Narrow to N2 using a reranking model
5. Inject these docs into context to generate the answer
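For concreteness, here is a minimal sketch of the flow above in plain Python. It is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `rerank_score` is a toy term-overlap stand-in for a real reranking model, and all function names are hypothetical. The optional LLM query reformulation and the final generation call are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(doc, size=20):
    # Step 1a: split a document into chunks of at most `size` words.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    # ASSUMPTION: toy stand-in for an embedding model (sparse term counts).
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_score(query, passage):
    # ASSUMPTION: toy stand-in for a reranking model.
    # Scores a passage by the fraction of distinct query terms it contains.
    q_terms = set(tokenize(query))
    p_terms = set(tokenize(passage))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def retrieve(query, docs, n1=4, n2=2):
    # Step 1: chunk and embed the docs.
    index = [(c, embed(c)) for d in docs for c in chunk(d)]
    # Step 2: embed the query.
    q_vec = embed(query)
    # Step 3: take the top N1 chunks by cosine similarity.
    candidates = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:n1]
    # Step 4: narrow to N2 with the reranker.
    reranked = sorted((c for c, _ in candidates), key=lambda c: rerank_score(query, c), reverse=True)
    return reranked[:n2]


def build_prompt(query, passages):
    # Step 5: inject the retrieved passages into the context for generation.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In a real system the two stand-ins would be swapped for an actual embedding model and reranker, and the prompt would be sent to an LLM. The two-stage shape (cheap similarity search over everything, expensive reranking over a short list) is the part the sketch is meant to show.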

Have there been significant advancements beyond this? Has anyone seen improvements from using graph structures (e.g., a graph database like Neo4j) for more sophisticated retrieval?