
You might be interested in https://datasette.io/plugins/datasette-faiss, which I'm using alongside openai-to-sqlite for similarity search of embeddings, following @simonw's excellent instructions at https://simonwillison.net/2023/Jan/13/semantic-search-answer...



Thanks, but the index being in-memory makes it unsuitable for large data sets :/


There is a way of running disk-backed FAISS indexes that don't all fit in memory, but I've not quite figured out how to do that yet: https://github.com/facebookresearch/faiss/issues/2675


The OpenSearch k-NN plugin supports FAISS and it's disk-based:

https://opensearch.org/docs/latest/search-plugins/knn/index/


OpenSearch looks like the best so far, all my requirements combined!


Can you say more? Usually projects that gravitate to SQLite are not those that require massive scale, and a FAISS index of a few GB covers a lot of documents.


My dataset is going to be around 10M documents. With OpenAI embeddings, that will be around 62GB. AFAIK SQLite should be able to handle that size, but I haven't tried.
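For anyone checking the arithmetic, here is a quick sketch of where a figure like that comes from. It assumes 1536-dimensional vectors (the output size of OpenAI's text-embedding-ada-002) stored as raw float32 values; both are assumptions, since the comment doesn't name the model or the storage format, and index or database overhead is not counted.

```python
# Back-of-the-envelope storage estimate for raw embedding vectors.
NUM_DOCS = 10_000_000   # document count from the comment above
DIMS = 1536             # assumption: text-embedding-ada-002 vector length
BYTES_PER_VALUE = 4     # assumption: each value stored as float32


def embedding_storage_bytes(num_docs: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone, with no index or row overhead."""
    return num_docs * dims * bytes_per_value


total = embedding_storage_bytes(NUM_DOCS, DIMS, BYTES_PER_VALUE)
print(f"{total / 1e9:.1f} GB")  # prints "61.4 GB"
```

That lands close to the ~62GB quoted above. Each vector is 6,144 bytes under these assumptions, so the total scales linearly with document count.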

This is not going to be my primary DB. I would update this maybe once a day and the update doesn't have to be super fast.
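A minimal sketch of what keeping embeddings in SQLite could look like, using only the Python standard library. The table layout, the helper names (`encode`, `decode`, `most_similar`) and the tiny 3-dimensional vectors are all hypothetical and chosen for illustration; this is not necessarily the schema openai-to-sqlite uses. The search is a brute-force scan, which is fine for a demo but is exactly the part an index such as FAISS would replace at 10M rows.

```python
import math
import sqlite3
import struct


def encode(vector):
    """Pack a sequence of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def decode(blob):
    """Unpack float32 bytes back into a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(db, query, k=2):
    """Brute-force scan: score every stored vector against the query."""
    rows = db.execute("SELECT id, embedding FROM embeddings")
    scored = [(cosine(query, decode(blob)), doc_id) for doc_id, blob in rows]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Hypothetical schema; pass a file path instead of ":memory:" for an on-disk database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE embeddings (id TEXT PRIMARY KEY, embedding BLOB)")

# Toy vectors standing in for real embeddings.
docs = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}

# INSERT OR REPLACE makes a once-a-day refresh idempotent per document id.
db.executemany(
    "INSERT OR REPLACE INTO embeddings VALUES (?, ?)",
    [(doc_id, encode(vec)) for doc_id, vec in docs.items()],
)
db.commit()

print(most_similar(db, [1.0, 0.0, 0.0]))  # prints ['a', 'c']
```

Storing the vectors as BLOBs keeps the database close to the raw float32 size, and a daily batch of upserts is well within what SQLite handles; the open question in this thread is the search step, not the storage.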


You might check out some vector databases:

https://milvus.io/

and

pinecone.io

There are others too.



