RAG vs. Context-Window in GPT-4: accuracy, cost, & latency (ai88.substack.com)
18 points by swiftlyTyped on Dec 5, 2023 | hide | past | favorite | 8 comments


Using the full context window each time is a great way to ensure your app is slow, expensive, and inaccurate. RAG is crucial; anyone who's built an AI app knows it.


Great article, but what about open-source LLMs? Would the results be the same?


It'd be interesting to test and find out.

As the article shows, there is some evidence that OpenAI may be using a new embeddings model under the hood of Assistants retrieval. If they are, and if it's substantially better than the competition, then open-source RAG may lag for a while.

--

But if they're just using ada v2 (or if the embeddings improvement is in cost, rather than performance), there should be tremendous potential for open-source models in this space.

First of all, ada v2 is an aging model that has solid open-source competition.

But more importantly, it seems the key is an LLM agent loop that can best make use of the RAG primitives. Intuitively, I'd expect open-source models to be smart enough for very good results in this domain.
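To make the idea concrete, here's a minimal sketch of such a loop: retrieve the top-k chunks by embedding similarity, then pack only those into the prompt instead of the full context window. All names are hypothetical, and the toy bag-of-words "embedding" is a stand-in for a real model like ada v2 or an open-source alternative.

```python
# Toy RAG loop: embed, rank by cosine similarity, build a compact prompt.
# The bag-of-words embed() below is a placeholder for a real embeddings model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in "embedding": a term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; keep only the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Only the retrieved chunks enter the context, not the whole corpus --
    # this is where the cost and latency savings over full-context come from.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves relevant chunks before generation.",
    "The context window caps how many tokens a model can read.",
    "Bananas are yellow.",
]
print(build_prompt("How does RAG use the context window?", docs))
```

Swapping `embed()` for a real model and `build_prompt()`'s output into an LLM call turns this into the agent loop described above; the retrieval-then-generate shape stays the same.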


This is really interesting!


Great insights Atai


Great article!


Great analysis


TL;DR: (RAG + GPT-4) delivers superior performance, at 4% of the cost.




