Y not Elasticsearch? I don't see it addressed in the article, but Elastic 8 has ...

sidi · on May 5, 2023

There currently isn't a way to filter docs alongside a KNN query, and dimension support is limited to 1024 (a Lucene limitation), while OpenAI embeddings are 1536 dimensions. Indexing performance is also not comparable. Hoping this changes, as they're a good stack for the reasons you state.
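For anyone unsure what "filter docs alongside a KNN query" means, here is a minimal brute-force sketch in plain Python. It is not Elasticsearch's or Pinecone's API. The `filtered_knn` helper, the `lang` field, and the toy 2-dimensional vectors are all made up for illustration, and a real engine would use an approximate index rather than scanning every doc.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_knn(query, docs, k, predicate):
    """Exact kNN over only the docs that pass the metadata filter."""
    # Hypothetical helper: apply the filter first, then rank the survivors.
    candidates = [d for d in docs if predicate(d)]
    candidates.sort(key=lambda d: cosine(query, d["vector"]), reverse=True)
    return candidates[:k]

# Toy data; the "lang" field is an assumed metadata attribute.
docs = [
    {"id": 1, "lang": "en", "vector": [1.0, 0.0]},
    {"id": 2, "lang": "de", "vector": [0.9, 0.1]},
    {"id": 3, "lang": "en", "vector": [0.0, 1.0]},
]
hits = filtered_knn([1.0, 0.0], docs, k=2, predicate=lambda d: d["lang"] == "en")
```

Doc 2 is close to the query but never appears in `hits`, because the filter removes it before ranking.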

softwaredoug · on May 5, 2023

True, though I do think 2k dims is coming in 8.8

peterstjohn · on May 5, 2023

Are they forking Lucene or somehow getting the Lucene devs to increase that limit? Because this PR has been open for over a year now: https://github.com/apache/lucene/issues/11507

softwaredoug · on May 5, 2023

No - they just did something in Elasticsearch to make their own FieldType https://github.com/elastic/elasticsearch/pull/95257

heipei · on May 5, 2023

Plus Elasticsearch is a breeze to operate and scale in a fault-tolerant manner.

bbarnett · on May 5, 2023

I suspect missing sarcasm tags here. At the very least, given the lack of a stable, security-update-only release branch.

heipei · on May 5, 2023

Not really, I've been operating a 10+ node Elasticsearch cluster for years, running on a workload scheduler (Nomad). I never have to perform any maintenance or housekeeping except deleting old indices, and updates are performed by bumping a container version number and then restarting nodes one-by-one with a delay in between.
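The update routine described above (restart nodes one at a time, with a pause in between) could be sketched roughly like this. It is only an illustration: `rolling_restart`, `restart_node`, and the node names are hypothetical, and nothing here calls Nomad or Elasticsearch.

```python
import time

def rolling_restart(nodes, restart_node, delay_seconds=60.0, sleep=time.sleep):
    """Restart nodes one by one, pausing between consecutive restarts.

    `restart_node` is a caller-supplied callable (hypothetical here); in a
    real setup it would ask the scheduler to restart that node's container.
    """
    restarted = []
    for i, node in enumerate(nodes):
        restart_node(node)
        restarted.append(node)
        # No need to wait after the last node.
        if i < len(nodes) - 1:
            sleep(delay_seconds)
    return restarted

# Dry run: record the order instead of restarting anything, and skip the wait.
log = []
order = rolling_restart(
    ["es-1", "es-2", "es-3"],  # assumed node names
    restart_node=log.append,
    delay_seconds=0,
    sleep=lambda seconds: None,
)
```

Passing `sleep` in as a parameter is just a convenience so the loop can be dry-run without actually waiting.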

trgn · on May 5, 2023

Vector will eventually just be another data-type in all db-systems. Already so many production systems have their data replicated across multiple dbs, just to accommodate different use-cases. I'm not keen on adding yet another one.

hobs · on May 5, 2023

In the ANN benchmarks Elastic sets the bottom bar afaict. http://ann-benchmarks.com/

softwaredoug · on May 5, 2023

That appears to be the old community-maintained plugin, Elastic KNN, not the official Lucene-based HNSW implementation.

hobs · on May 5, 2023

That's very interesting to me. Do you know if there are any numbers on the official implementation?

softwaredoug · on May 5, 2023

Not that I know of, I would love to see them if they exist...

gk1 · on May 5, 2023

Hey Doug :)

We always encourage folks to do their own testing. Everyone has different performance requirements, data shapes/sizes, budgets, and expectations of the user experience.

Elasticsearch is a great option. But clearly there's a large cohort of smart teams that decided the combination of performance + cost + scale + [etc] on Pinecone makes more sense for them.

softwaredoug · on May 5, 2023

Hey Greg! Yes I am trolling a bit, to see what the answers might be.

IMO - the real reason "Y Not Elasticsearch" is not because they're dumb or it's bad. It's actually because they're not building for the search / AI market like you all are :)

When someone runs out of RAM with their Numpy array, they google, and you guys come up, really speaking to that audience: building features, showing people how to build specific solutions, etc.
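To put a rough number on "runs out of RAM", here is a back-of-the-envelope sketch. The 1536 dimensions come from the OpenAI figure mentioned earlier in the thread. The one million vectors and float32 storage (4 bytes per value) are assumptions picked for illustration, and `embedding_bytes` is a made-up helper name.

```python
def embedding_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw size of a dense embedding matrix, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_value

# Assumed workload: 1M vectors of 1536 float32 values each.
total = embedding_bytes(1_000_000, 1536)
gib = total / 2**30  # convert bytes to GiB
print(f"{total:,} bytes, about {gib:.1f} GiB")
```

That is only the raw matrix. Any temporary arrays created while computing similarities come on top of it.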