Thanks for the input! I asked about the scale of items and traffic because my use case actually requires a separate piece of infrastructure. It's around 100 million items and live production traffic from millions of users, with strict low-latency requirements. So, unlike your case as I understand it, it's not a batch job that can be performed in memory.
Currently I use Elasticsearch with the Open Distro approximate kNN plugin, by the way.