
Y not Elasticsearch?

I don't see it addressed in the article, but Elastic 8 has ANN support, and every other feature you'd expect out of a ranking system. Vectors are only one piece of the puzzle for building such a system. (honest question, not trying to troll, as I truly do <3 these pinecone articles)

(Similarly, Y not Solr, Vespa, etc etc) :)




There currently isn't a way to filter docs alongside a KNN query, and dimension support is limited to 1024 (a Lucene limitation), while OpenAI embeddings have 1536 dimensions. Indexing performance is also not comparable. I wish this would change, as they're a good stack for the reasons you state.
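To make the two limitations concrete, here is a minimal brute-force sketch of what "filtering alongside a KNN query" means: restrict the candidates by metadata first, then rank only the survivors by cosine similarity. It is exact search in plain Python, not Elasticsearch's API and not an ANN index like HNSW (which approximates this ranking). The names `filtered_knn` and `fits_lucene_limit` are hypothetical. The 1024 and 1536 constants are the figures quoted in the comment above.

```python
import math
from typing import Callable, Dict, List, Tuple

# Figures quoted in the thread (assumed, not checked against any particular release).
LUCENE_MAX_DIMS = 1024
OPENAI_EMBEDDING_DIMS = 1536

Doc = Tuple[List[float], Dict[str, str]]  # (embedding, metadata)


def fits_lucene_limit(dims: int) -> bool:
    """True if a vector of `dims` dimensions is within the quoted Lucene cap."""
    return dims <= LUCENE_MAX_DIMS


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_knn(
    query: List[float],
    docs: Dict[str, Doc],
    predicate: Callable[[Dict[str, str]], bool],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact k-NN over only those docs whose metadata passes `predicate`."""
    # Filter first, so every one of the k results satisfies the predicate.
    candidates = [(doc_id, vec) for doc_id, (vec, meta) in docs.items() if predicate(meta)]
    # Then rank the surviving candidates by similarity to the query.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = {
    "a": ([1.0, 0.0], {"lang": "en"}),
    "b": ([0.9, 0.1], {"lang": "de"}),
    "c": ([0.0, 1.0], {"lang": "en"}),
}
print(filtered_knn([1.0, 0.0], docs, lambda m: m["lang"] == "en", k=2))
print(fits_lucene_limit(OPENAI_EMBEDDING_DIMS))
```

Without the filter, "b" would be the second-closest hit. With the filter it is excluded before ranking, so you still get k results that all match. Post-filtering an unfiltered top-k instead can return fewer than k matches.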


True, though I do think 2k dims is coming in 8.8.


Are they forking Lucene or somehow getting the Lucene devs to increase that limit? Because this PR has been open for over a year now: https://github.com/apache/lucene/issues/11507


No, they just did something in Elasticsearch to make their own FieldType: https://github.com/elastic/elasticsearch/pull/95257


Plus Elasticsearch is a breeze to operate and scale in a fault-tolerant manner.


I suspect missing sarcasm tags here. At the very least, given the lack of a stable, security-update-only release branch.


Not really. I've been operating a 10+ node Elasticsearch cluster for years, running on a workload scheduler (Nomad). I never have to perform any maintenance or housekeeping except deleting old indices, and updates are performed by bumping a container version number and then restarting nodes one by one with a delay in between.


Vectors will eventually just be another data type in all DB systems. So many production systems already have their data replicated across multiple DBs just to accommodate different use cases. I'm not keen on adding yet another one.


In the ANN benchmarks, Elastic sets the bottom bar, afaict: http://ann-benchmarks.com/


That appears to be the old community maintained plugin, Elastic KNN, not the official Lucene based HNSW implementation.


That's very interesting to me. Do you know if there are any numbers on the official implementation?


Not that I know of, I would love to see them if they exist...


Hey Doug :)

We always encourage folks to do their own testing. Everyone has different performance requirements, data shapes/sizes, budgets, and expectations of the user experience.

Elasticsearch is a great option. But clearly there's a large cohort of smart teams that decided the combination of performance + cost + scale + [etc] on Pinecone makes more sense for them.
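If you do run your own tests, one useful number besides latency is recall: how many of the true nearest neighbours (from an exact, brute-force search over your data) the ANN index actually returns. Benchmark sites like ann-benchmarks plot this kind of recall against queries per second. The sketch below is a plain-Python illustration. `recall_at_k` is a hypothetical helper name, and it assumes you already have both ID lists for the same query.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k neighbours that also appear in the ANN top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_top = set(exact_ids[:k])   # ground truth from brute-force search
    found = set(approx_ids[:k])     # what the ANN index returned
    return len(true_top & found) / k


# Hypothetical IDs for one query: the index missed "d" and returned "b" instead.
print(recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=3))
```

In practice you would average this over many queries drawn from your own data, since, as noted above, data shapes and sizes differ.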


Hey Greg! Yes I am trolling a bit, to see what the answers might be.

IMO, the real reason for "Y not Elasticsearch" isn't that they're dumb or that it's bad. It's actually because they're not building for the search / AI market the way you all are :)

When someone runs out of RAM with their NumPy array, they google it, and you guys come up, really speaking to that audience: building features, showing people how to build specific solutions, etc.
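For a sense of when a NumPy array of embeddings stops fitting in RAM, the raw matrix size is just vectors × dimensions × bytes per value. This is a back-of-the-envelope sketch. It assumes float32 storage (NumPy's default float dtype is float64, which doubles the figures) and ignores any index overhead. The corpus sizes are hypothetical and the 1536 dimensions are the OpenAI figure mentioned earlier in the thread.

```python
def embedding_matrix_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense (num_vectors x dims) embedding matrix.

    Defaults to 4 bytes per value (float32); pass 8 for float64.
    Ignores array headers and any ANN index structures.
    """
    return num_vectors * dims * bytes_per_value


# Hypothetical corpus sizes, 1536-dimensional embeddings.
for n in (1_000_000, 10_000_000):
    gb = embedding_matrix_bytes(n, 1536) / 1e9
    print(f"{n:,} vectors x 1536 dims ~ {gb:.2f} GB as float32")
```

That works out to roughly 6.1 GB for one million vectors and about 61 GB for ten million, before any index is built on top.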



