Using the full context window each time is a great way to ensure your app is slow, expensive, and inaccurate. RAG is crucial, as anyone that's built an AI app knows.
As the article shows, there is some evidence that OpenAI may be using a new embeddings model under the hood of assistants retrieval. If they are, and if it's substantially better than the competition, then open-source RAG may lag for a while.
--
But if they're just using ada v2 (or if the embeddings improvement is in cost, rather than performance), there should be tremendous potential for open-source models in this space.
First of all, ada v2 is an aging model that has solid open-source competition.
But more importantly, the key seems to be an LLM agent loop that makes the best use of the RAG primitives. Intuitively, I'd expect open-source models to be smart enough for very good results in this domain.
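To make "RAG primitives" concrete: the core of any such loop is embed-then-retrieve, with the top-k results handed to the model as context. Here's a minimal sketch — the bag-of-words `embed` is a toy stand-in for a real embeddings model (ada v2 or an open-source alternative), and the function names are my own, not any particular library's API:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real system
    # would call an embeddings model and get back a dense vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query and keep the top k;
    # only these (not the whole corpus) go into the LLM's context.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

corpus = [
    "RAG retrieves only the passages relevant to a query.",
    "Embedding models map text to vectors for similarity search.",
    "Stuffing the full context window is slow and expensive.",
]
top = retrieve("why not use the whole context window?", corpus)
```

An agent loop would wrap this: the model decides when to call `retrieve`, inspects the results, and retrieves again with a refined query if needed — which is why the quality of the loop, not just the embeddings model, drives end results.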