Benchmarking for this project is a bit weird, since 1) only linear scans are supported, and 2) it's an "embeddable" vector search tool, so it doesn't make a lot of sense to benchmark against "server" vector databases like qdrant or pinecone.
That being said, ~generally~ I'd say it's faster than using numpy and tools like txtai/chromadb. Faiss and hnswlib (in brute-force mode) are faster because they keep everything in memory and use multiple threads, but for smaller vector indexes I don't think you'd notice much of a difference. sqlite-vec has some SIMD support, which speeds things up quite a bit, but Faiss still takes the cake.
Author of txtai here - great work with this extension.
I wouldn't consider it a "this or that" decision. While txtai does combine Faiss and SQLite, it could also utilize this extension. The same was just done for Postgres + pgvector. txtai is not tied to any particular backend components.
Ya I worded this part awkwardly - I was hinting that querying a vector index and joining with metadata in sqlite + sqlite-vec (a single SQL join) will probably be faster than approaches like txtai that do the joining step at a higher level, in Python. That isn't a fair comparison, especially since txtai can switch to much faster vector stores, but I think it's fair for most embedded use-cases.
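To make that concrete, here's a rough sketch of the pattern I mean - the table names and tiny 4-dimensional vectors are made up for illustration, and it assumes the sqlite-vec Python bindings (`sqlite_vec.load()`) and JSON-text vector input:

```python
import sqlite3

import sqlite_vec  # pip install sqlite-vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)  # registers the vec0 virtual table and distance functions
db.enable_load_extension(False)

# A regular table for metadata, plus a vec0 virtual table for the vectors.
# The vec0 rowid doubles as the key back into the metadata table.
db.executescript("""
  CREATE TABLE articles(id INTEGER PRIMARY KEY, title TEXT);
  CREATE VIRTUAL TABLE vec_articles USING vec0(embedding float[4]);
""")

rows = [
    (1, "intro to sqlite",       [0.1, 0.1, 0.1, 0.1]),
    (2, "vector search basics",  [0.2, 0.3, 0.1, 0.9]),
    (3, "faiss vs brute force",  [0.9, 0.8, 0.7, 0.1]),
]
for id_, title, emb in rows:
    db.execute("INSERT INTO articles(id, title) VALUES (?, ?)", (id_, title))
    # Vectors can be passed as JSON text, e.g. "[0.1, 0.1, 0.1, 0.1]"
    db.execute(
        "INSERT INTO vec_articles(rowid, embedding) VALUES (?, ?)",
        (id_, str(emb)),
    )

# KNN search and metadata lookup in one statement: the MATCH clause does the
# linear scan over vec_articles, the JOIN pulls the title - no Python-side merge.
query = str([0.2, 0.3, 0.1, 0.8])
results = db.execute(
    """
    SELECT a.title, v.distance
    FROM vec_articles AS v
    JOIN articles AS a ON a.id = v.rowid
    WHERE v.embedding MATCH ?
      AND k = 2
    ORDER BY v.distance
    """,
    (query,),
).fetchall()
print(results)
```

The point is just that the join happens inside SQLite's query engine rather than in application code, which is where I'd expect the embedded setup to come out ahead.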
That being said, txtai offers way more than sqlite-vec, like builtin embedding models and other nice LLM features, so it's definitely apples to oranges.
With this, what DuckDB just added, and pgvector, we're seeing a blurring of the lines. Back in 2021, there wasn't an RDBMS that had native vector support. But native vector integration makes it possible for txtai to just run SQL-driven vector queries...exciting times.
I think systems that bet on existing databases eventually catching up (as txtai does), rather than trying to reinvent the entire database stack, will win out.