I'm still not sold on recall at such large context window sizes. It's easy for an LLM to find a needle in a haystack, but most RAG use cases are more like finding a needle in a stack of needles, and the benchmarks don't really reflect that. There are also speed and cost implications to dumping millions of tokens into a prompt - it's prohibitively slow and expensive right now.