There are plenty of options for running your own local vector database; txtai is one of them. It ultimately depends on whether you have a sizable development team or not. But saying it is impossible is a step too far.
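A minimal sketch of the single-node case with txtai, following its documented quickstart pattern (the model name here is just a common default, not a recommendation):

```python
from txtai.embeddings import Embeddings

# Build a local vector index over a handful of documents.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
docs = ["LevelDB is a key-value store", "ScaNN does nearest-neighbor search"]
embeddings.index([(i, text, None) for i, text in enumerate(docs)])

# Semantic search returns (id, score) pairs.
print(embeddings.search("vector similarity library", 1))
```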
Even in that article, with much smaller vectors than what GPT puts out (1536 dimensions), QPS drops below 100 once recall@1 exceeds 0.4. That's to say nothing of the cost of regenerating the index with incremental updates. I don't get why people on HN are so adamant that no one ever needs to scale beyond one machine.
If you have a billion vectors, is “yourself” a large tech company that rolls its own browsers and programming languages and invents Kubernetes? Then it probably could roll this! And indeed sell it.
The last time I had to deal with vector representations of documents was more than 10 years ago, so I'm a bit rusty, but billion-vector scale sounds relatively trivial.
With retrieval time in the milliseconds? The entries may be ads, or something else user-facing. Your users are not going to sit around while you leisurely retrieve them.
You do realize you have to query an index of all of that data for every single query your user makes, right? Computing that index is not entirely trivial, nor is partitioning the data so it fits in RAM across a pool of nodes.
Sure, roll your own, but don't act like making a highly scalable database is a weekend project.
Consumer hardware can still handle that with 1TB RAM + ThreadRipper Pro.
> You do realize you have to query an index of all of that data for every single query your user makes, right? Computing that index is not entirely trivial, nor is partitioning the data so it fits in RAM across a pool of nodes.
I don't know what any of this means -- and it sounds like you're slapping a bunch of terminology together, rather than communicating a well-thought-out idea.
Yes, in the general case you're going to have to use an index. But computing the index, or computing a key into that index? Computing the index is a solved problem with no hard real-time component -- you can do it outside of normal query execution. Computing the key into the index on each query is also a solved problem.
Store the dimensions in columnar format, generate a sparse primary index on those columns, and then use binary search to quickly find the blocks of interest for a sequential scan with your distance function. Or you could just use regular old SS-trees, SR-trees, or M-trees for high-dimensional indexing -- they're not expensive to use at all.
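For what it's worth, here's a toy sketch of that layout, assuming one dimension as the sort key and a fixed block size; whether the block pruning actually preserves recall depends entirely on your data distribution:

```python
import numpy as np

# Toy scale: 1M vectors x 128 dims (the idea, not the billion-row reality).
# A real columnar store would also keep each dimension contiguous on disk.
rng = np.random.default_rng(0)
data = rng.random((1_000_000, 128), dtype=np.float32)

# Sort rows by one chosen dimension and keep a sparse primary index:
# one key per 4096-row block (the block's first value on that dimension).
SORT_DIM, BLOCK = 0, 4096
data = data[np.argsort(data[:, SORT_DIM])]
sparse_index = data[::BLOCK, SORT_DIM]

def query(q, n_blocks=8, k=10):
    # Binary search the sparse index for blocks near the query's value
    # on SORT_DIM, then sequentially scan them with the distance function.
    b = int(np.searchsorted(sparse_index, q[SORT_DIM]))
    start = min(max(b - n_blocks // 2, 0), max(len(sparse_index) - n_blocks, 0))
    lo = start * BLOCK
    cand = data[lo:lo + n_blocks * BLOCK]
    dist = np.linalg.norm(cand - q, axis=1)
    top = np.argsort(dist)[:k]
    return lo + top, dist[top]

ids, dists = query(rng.random(128, dtype=np.float32))
```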
There, you can easily run a query on a single dimension (1 billion entries) in under a second. You want 300 dimensions? OK, parallelize it: 128 threads, easy. At most this will take 3 seconds if everything is configured properly (a big IF that few seem to get right).
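And a sketch of the "parallelize it" part, sharding a brute-force scan across a thread pool (NumPy releases the GIL inside the heavy array ops, so threads can actually help here):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(1)
data = rng.random((500_000, 300), dtype=np.float32)  # scaled-down stand-in
q = rng.random(300, dtype=np.float32)

def scan(args):
    offset, shard = args
    # Each thread scans its shard; NumPy drops the GIL during the math.
    d = np.linalg.norm(shard - q, axis=1)
    i = int(np.argmin(d))
    return offset + i, float(d[i])

shards = np.array_split(data, 128)
offsets = np.cumsum([0] + [len(s) for s in shards[:-1]])
with ThreadPoolExecutor(max_workers=128) as pool:
    best_id, best_dist = min(pool.map(scan, zip(offsets, shards)),
                             key=lambda r: r[1])
```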
This is literally a weekend project. Anyone can build something like this, but not everyone has the integrity to be upfront about how they're reinventing the wheel, rather than spinning it like they've just broken ground in database R&D.
What kind of QPS are you looking at? How are you handling 1536 dimensions? How long does an incremental index update take? These are the problems you run into in building such a system.
I'm not familiar with the index part, but you can get at least 2TB of RAM on a single CPU socket these days. You shouldn't need multiple machines just to fit in RAM. And depending on the QPS you need to handle, you might be fine without the whole thing fitting in RAM at all.
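Back-of-envelope on whether it actually fits, assuming the 1B vectors and 1536 dimensions from upthread: raw float32 overshoots a 2TB socket, but quantizing brings it back into single-socket range.

```python
# Rough RAM footprint: n vectors x dims x bytes per element.
n, dims = 1_000_000_000, 1536
for dtype, bytes_per in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: {n * dims * bytes_per / 1e12:.1f} TB")
# float32: 6.1 TB, float16: 3.1 TB, int8: 1.5 TB
```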
lol, not true. Even for huge vectors (1000-page docs), today you can do this on a single node with enough disk storage using something like LevelDB, and in memory with something like ScaNN for nearest-neighbor search.
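A minimal in-memory sketch with ScaNN, following its published Python quickstart (the tree and quantization parameters here are illustrative, not tuned):

```python
import numpy as np
import scann

# Stand-in dataset; real use would load your own embeddings.
dataset = np.random.rand(100_000, 128).astype(np.float32)
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)

# Partition into leaves, score with asymmetric hashing, rescore the top 100.
searcher = (scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
            .tree(num_leaves=1000, num_leaves_to_search=100,
                  training_sample_size=50_000)
            .score_ah(2, anisotropic_quantization_threshold=0.2)
            .reorder(100)
            .build())

neighbors, distances = searcher.search(dataset[0])
```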