I looked at the concepts behind FAISS and they seem fairly straightforward. In non-jargon terms, you have dimensionality reduction (DR) and neighborhoods.
DR means taking a long embedding and transforming it into a shorter one. An easy-to-follow method for this is MinHash.
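To make the MinHash idea concrete, here is a minimal sketch. Note that MinHash works on sets of tokens rather than dense float vectors, so this only illustrates "long input, short signature"; it is not what FAISS does internally. The function names and the use of salted BLAKE2b hashes as a stand-in for a family of hash functions are assumptions made for this example.

```python
import hashlib


def minhash_signature(tokens, num_hashes=16):
    """Compress a set of tokens into a fixed-length list of integers."""
    signature = []
    for seed in range(num_hashes):
        # Assumption: salting one hash with a seed stands in for
        # num_hashes independent hash functions.
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                "big",
            )
            for tok in tokens
        )
        signature.append(smallest)
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

However large the input set is, the signature always has `num_hashes` entries, and comparing two signatures position by position gives an estimate of how much the original sets overlap.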
Neighborhoods means representing a cluster of embeddings with a single representative to speed up comparisons. For example, find the two closest representatives, then do a deeper comparison against all the residents of those two clusters.
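The two-stage lookup described here can be sketched in plain Python. This is only an illustration under assumed names (`search`, `representatives`, `residents`, `n_probe`), not FAISS's actual API, and it assumes the clusters and their representatives have already been computed.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query, representatives, residents, n_probe=2, k=1):
    """Two-stage nearest-neighbor search.

    representatives: one vector per cluster.
    residents: for each cluster, a list of (item_id, vector) pairs.
    """
    # Stage 1: compare the query only against the cluster representatives
    # and keep the n_probe closest clusters.
    closest_clusters = sorted(
        range(len(representatives)),
        key=lambda i: squared_distance(query, representatives[i]),
    )[:n_probe]

    # Stage 2: compare the query against every resident of those clusters.
    candidates = [
        (squared_distance(query, vector), item_id)
        for cluster in closest_clusters
        for item_id, vector in residents[cluster]
    ]
    candidates.sort()
    return [item_id for _, item_id in candidates[:k]]
```

With `n_probe=2` this matches the "two closest representatives" example: residents of every other cluster are never compared against the query, which is where the speedup comes from, at the cost of possibly missing a true neighbor that sits in an unprobed cluster.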
Now for the feature I haven't seen, which will probably cause me to build instead of buy. Most of the existing options seem designed for a single organization and a single use, for example a Spotify song recommender.
I would like to store embeddings from multiple models and be able to search per model. I would also like fine-grained user access control, so users can search their own embeddings and grant access to others.
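As a rough sketch of what that requirement might look like, the toy in-memory store below partitions embeddings by (owner, model) and checks access before any distance is computed. Everything here (`EmbeddingStore`, `add`, `grant`, `search`) is hypothetical and uses brute-force search; it is not the API of any existing product.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class EmbeddingStore:
    """Toy store: one partition per (owner, model), with read grants."""

    def __init__(self):
        self._partitions = {}  # (owner, model) -> list of (item_id, vector)
        self._readers = {}     # owner -> set of users allowed to read

    def add(self, owner, model, item_id, vector):
        self._partitions.setdefault((owner, model), []).append(
            (item_id, list(vector))
        )

    def grant(self, owner, reader):
        """Let `reader` search everything that `owner` has stored."""
        self._readers.setdefault(owner, set()).add(reader)

    def search(self, user, model, query, k=3):
        hits = []
        for (owner, part_model), items in self._partitions.items():
            # Only partitions for the requested model are searched, so
            # models with different dimensionality never get mixed.
            if part_model != model:
                continue
            # Access check happens before any vector is compared.
            if owner != user and user not in self._readers.get(owner, ()):
                continue
            for item_id, vector in items:
                hits.append((squared_distance(query, vector), owner, item_id))
        hits.sort()
        return [(owner, item_id) for _, owner, item_id in hits[:k]]
```

A short usage example:

```python
store = EmbeddingStore()
store.add("alice", "model-a", "a1", [0.0, 0.0])
store.add("bob", "model-a", "b1", [1.0, 1.0])
store.search("bob", "model-a", [0.0, 0.0])   # only bob's own items
store.grant("alice", "bob")
store.search("bob", "model-a", [0.0, 0.0])   # now includes alice's items
```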
If the different models use the same dimensionality, you can keep their embeddings within different namespaces inside the same index. See: https://docs.pinecone.io/docs/namespaces
If you mean access control for your end users, you can use namespaces again to separate embeddings for different users inside one index. See: https://docs.pinecone.io/docs/multitenancy
There isn't yet a combination of the two, where you provide Pinecone API access to end-users.
Thank you, I'll definitely play with Pinecone before I build. The dimensionality might vary between models or between versions of a model. Additionally, the end goal would be to expose it to users directly and not have to post-filter results, so probably an index per user. I'm not sure how expensive that would be to recalculate regularly.