Using the full context window each time is a great way to ensure your app is slow, expensive, and inaccurate. RAG is crucial, as anyone that's built an AI app knows.
As the article shows, there is some evidence that OpenAI may be using a new embeddings model under the hood of assistants retrieval. If they are, and if it's substantially better than the competition, then open-source RAG may lag for a while.
--
But if they're just using ada v2 (or if the embeddings improvement is in cost, rather than performance), there should be tremendous potential for open-source models in this space.
First of all, ada v2 is an aging model that has solid open-source competition.
But more importantly, the key seems to be an LLM agent loop that makes the best use of the RAG primitives. Intuitively, I'd expect open-source models to be smart enough for very good results in this domain.
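To make "RAG primitives" concrete: the core of any such loop is embed-then-retrieve, with the top-k results handed to the model as context. Here's a minimal sketch — the bag-of-words `embed` is a toy stand-in for a real embeddings model (ada v2 or an open-source alternative), and the function names are my own, not any particular library's API:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real system
    # would call an embeddings model and get back a dense vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query and keep the top k;
    # only these (not the whole corpus) go into the LLM's context.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

corpus = [
    "RAG retrieves only the passages relevant to a query.",
    "Embedding models map text to vectors for similarity search.",
    "Stuffing the full context window is slow and expensive.",
]
top = retrieve("why not use the whole context window?", corpus)
```

An agent loop would wrap this: the model decides when to call `retrieve`, inspects the results, and retrieves again with a refined query if needed — which is why the quality of the loop, not just the embeddings model, drives end results.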