I enjoy the explosion of tools. Only time will tell which ones stand the test of time. This is my day job, so I never get tired of new tools, but I can see how non-industry folks can find it overwhelming.
Can you expand on that?
Where do big enterprise orgs' products fit in, e.g. Microsoft, Google?
What are the leading providers as you see them?
As an outsider it is bewildering. First I hear that llama_index is good, then I hear that it's overcomplicated slop. What sources or resources are reliable on this? How can we develop anything that will still stand in 12 months time?
It may help to think of these tools as sitting at opposite ends of a spectrum. As an analogy:
1. langchain, llamaindex, etc. are the equivalent of jquery or ORMs for calling third-party LLMs. They're thin adapter layers that add a bit of consistency and cover common tasks. Arguably they're like React, in that they are thin composition layers. So complaints about them being leaky abstractions are in the same sense as an ORM getting in the way vs helping.
2. KG/graph RAG libraries are the LLM equivalent of graduating to a full-blown lucene/solr engine when regex + SQL LIKE statements aren't enough. These are intelligence engines that work at index time, query time, or, most likely, both. Thin libraries and those lacking standard benchmarks are a sign of experiments rather than production-relevant tools: unless you're just talking to 1 PDF, they're not likely what you want. IMO, no 'winners' here yet: llamaindex was part of an early wave of preprocessors that feed PDFs etc. to the KG, but it is not winning the actual 'smart' KG/RAG. In contrast, MSR Graph RAG is popular and benchmarks well, but if you read the GitHub & paper, it's not intended for use -- ex: it addresses 1 family of infrequent queries you'd do in a RAG system ("n-hop"), but not the primary kinds like mixing semantic+keyword search with query rewriting, and it struggles with basics like updates.
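To make "mixing semantic+keyword search with query rewriting" concrete, here is a minimal sketch, not how any of the libraries above do it. Everything in it is an assumption for illustration: the toy documents, the hypothetical REWRITES table (a real system might ask an LLM to rewrite the query), character trigrams standing in for a real embedding model, and reciprocal rank fusion as the way to merge the two rankings.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration).
DOCS = {
    "d1": "reset your password from the account settings page",
    "d2": "billing invoices are emailed monthly",
    "d3": "recover login credentials when locked out",
}

# Hypothetical rewrite table; stands in for LLM-based query rewriting.
REWRITES = {"pw": "password", "creds": "credentials"}


def rewrite(query):
    """Lowercase the query and expand known abbreviations."""
    return " ".join(REWRITES.get(tok, tok) for tok in query.lower().split())


def keyword_score(query, text):
    """Count query tokens that also appear in the document."""
    q, t = Counter(query.split()), Counter(text.split())
    return sum(min(q[w], t[w]) for w in q)


def trigram_vector(text):
    """Stand-in for an embedding: character trigram counts."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(scores):
    """Doc ids sorted by descending score, ties broken by id."""
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def hybrid_search(query, docs=DOCS, k=60):
    """Fuse keyword and 'semantic' rankings with reciprocal rank fusion."""
    q = rewrite(query)
    q_vec = trigram_vector(q)
    keyword_rank = rank({d: keyword_score(q, t) for d, t in docs.items()})
    semantic_rank = rank({d: cosine(q_vec, trigram_vector(t)) for d, t in docs.items()})
    fused = Counter()
    for ranking in (keyword_rank, semantic_rank):
        for pos, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + pos)
    return rank(fused)
```

Calling hybrid_search("reset pw") first rewrites the query to "reset password" and then ranks the password-reset document first. The quality work mentioned above (evals, updates, scaling) is everything this sketch leaves out.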
Most VC infra/DB $ goes to a layer below the KG. For example, vector databases -- but vector DBs are relatively dumb black boxes, you can think of them more like S3 or a DB index, while the LLM KG/AI quality work is generally a layer above. (We do train & tune our embedding models, but that's a tiny % of the ultimate win, mostly for smarter compression for handling scaling costs, not the bigger smarts.)
+1 to presentation being confusing! There's VC $ on agents, vector DB co's, etc., and well-meaning LLM enthusiasts are cranking out articles on small uses of LLMs, but in reality, these end up being pretty crappy in quality if you actually ship them. So once quality matters, you get into things like the KG/graph RAG work & evals, which is a lot more effort & grinding => smaller % of the infotainment & marketing going around.
(We do this stuff at real-time & data-intensive scales as part of Louie.AI, and are always looking for design partners, esp on graph rag, so happy to chat.)
imo, none. Unfortunately, the landscape is changing too fast. Maybe things will stabilize, but for now I find experimentation a time-consuming but essential part of maintaining any ML stack.
But it's okay not to experiment with every new tool (it can be overwhelming to do this). The key is in understanding one's own stack and filtering out anything that doesn't fit into it.
> How can we develop anything that will still stand in 12 months time?
At the pace things are moving, likely nothing. You will have to keep making changes as and when you see newer things. One thing in your favor (arguably) is that every technique is very dependent on the dataset and problem you are solving. So if you do not have the latest one implemented, you will be okay, as long as your evals and metrics are good. If this helps: skip the details, understand the basics, and go for your own implementation. One thing to look out for is new SOTA LLM releases and the jumps in capability. E.g., it was not announced, but 4o started doing very well on vision. (GPT-4 was okay, 4o is empirically noticeably better.) These things help when you update your pipeline.
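As a rough illustration of "as long as your evals and metrics are good", here is a minimal sketch of a fixed eval set used to decide whether a new technique or model is worth adopting. All names here (EVAL_SET, accuracy, should_upgrade, the two stand-in pipelines) and the substring-match metric are assumptions for the example; a real pipeline would call an LLM and use metrics suited to its own dataset.

```python
from typing import Callable, List, Tuple

# Made-up eval set: (question, substring expected in the answer).
EVAL_SET: List[Tuple[str, str]] = [
    ("capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("color of the sky?", "blue"),
]


def accuracy(pipeline: Callable[[str], str], eval_set=EVAL_SET) -> float:
    """Fraction of questions whose answer contains the expected substring."""
    hits = sum(expected in pipeline(q).lower() for q, expected in eval_set)
    return hits / len(eval_set)


def should_upgrade(current, candidate, min_gain: float = 0.05) -> bool:
    """Adopt the candidate only if it beats the current pipeline by min_gain."""
    return accuracy(candidate) - accuracy(current) >= min_gain


# Stand-in pipelines; real ones would wrap retrieval plus an LLM call.
def current_pipeline(question: str) -> str:
    answers = {"capital of France?": "Paris", "2 + 2?": "5"}
    return answers.get(question, "unknown")


def candidate_pipeline(question: str) -> str:
    answers = {
        "capital of France?": "Paris",
        "2 + 2?": "4",
        "color of the sky?": "Blue",
    }
    return answers.get(question, "unknown")
```

With a harness like this, "do I need the latest technique?" becomes a question you answer by running should_upgrade(current_pipeline, candidate_pipeline) on your own data rather than by reading announcements.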
Well, new LLMs keep coming out, but since they’re all trying to model language, they should all be fairly interchangeable and will potentially converge.
It’s not hard for a product to swap the underlying LLM for a given task.
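A minimal sketch of what that swap can look like, assuming the product only depends on a small interface. TextModel, LastLineModel, ShoutingModel, and summarize are hypothetical names, and the two classes are local stand-ins rather than real vendor clients.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only thing the product code needs from any LLM backend."""

    def complete(self, prompt: str) -> str:
        ...


class LastLineModel:
    """Stand-in backend: returns the last line of the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1]


class ShoutingModel:
    """Second stand-in backend with different behavior."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1].upper()


def summarize(model: TextModel, text: str) -> str:
    """Task code is written against the interface, not a vendor SDK."""
    prompt = f"Summarize in one line:\n{text}"
    return model.complete(prompt)
```

Swapping the backend is then a one-argument change: summarize(LastLineModel(), text) vs summarize(ShoutingModel(), text). What this does not cover is prompts or evals that were tuned to one model's quirks.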
I didn't mean a jump in text generation ability, but more like adding a completely new modality and the like. With 4o, you can have a multimodal embedding space and provide more relevant context to a model for fewer tokens (and higher accuracy). Ideally everyone would get there, but upgrading your pipeline is more about getting the latest functionality faster rather than just slightly better generation.
The issue is that this technology has no moat (other than the cost to create models and datasets).
There’s not a lot of secret sauce you can use that someone else can’t trivially replicate, given the resources.
It’s going to come down to good ol' product design and engineering.
The issue is OpenAI doesn’t seem to care about what their users want. (I don’t think their users know what they want either, but that’s another discussion.)
They want more money to make bigger models in the hope that nobody else can or will.
They want to achieve regulatory capture as their moat.
For all their technical abilities at scaling LLM training and inference, I don’t get the feeling that they have great product direction.
haha I had heard that langchain was overcomplicated, self-contradictory slop and that llama index was better. I don't doubt it's bad as well.
Both are cut from the same cloth: typical inexperienced devs who made something cool in a new space and posted it on GitHub, but then immediately morphed into companies trying to trap users etc., without going through an organic lifecycle of growing, improving, and refactoring with the community.
But unfortunately it's like a game of musical chairs: we may get stuck with whoever is pushing their wares the hardest rather than the actual best solution.
In fact, I'm wondering if that's what happened in the early noughties, when we had the misfortune of Java, and still have the misfortune of JavaScript.