We are currently using perceptual hashes (e.g. phash.org) to do hundreds of thousands of image comparisons per day.

As mentioned in another comment, you really have to test different hashing algorithms to find the one that best suits your needs. In general though, I think in most cases it is not necessary to develop an algorithm from scratch :)

For us, the much more challenging part was (and still is) developing a system that can find similar images quickly. If you are interested in that kind of thing, have a look at VP-tree/MVP-tree data structures (e.g. http://pnylab.com/pny/papers/vptree/vptree/, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.7...).
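
To make the VP-tree idea a bit more concrete, here is a minimal sketch in Python. It is not our production code. It assumes the perceptual hashes are plain 64-bit integers compared by Hamming distance, and the names `hamming`, `VPTree` and `within` are made up for illustration.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class VPTree:
    """Toy vantage-point tree; a node is (vantage, mu, inner, outer)."""

    def __init__(self, points, dist=hamming, seed=0):
        self.dist = dist
        self.root = self._build(list(points), random.Random(seed))

    def _build(self, points, rng):
        if not points:
            return None
        # Pick a random vantage point and remove it from the working set.
        vp = points.pop(rng.randrange(len(points)))
        if not points:
            return (vp, 0, None, None)
        dists = [self.dist(vp, p) for p in points]
        mu = sorted(dists)[len(dists) // 2]  # median distance to the vantage point
        inner = [p for p, d in zip(points, dists) if d <= mu]
        outer = [p for p, d in zip(points, dists) if d > mu]
        return (vp, mu, self._build(inner, rng), self._build(outer, rng))

    def within(self, query, radius):
        """Return sorted (distance, point) pairs with distance <= radius."""
        found = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            vp, mu, inner, outer = node
            d = self.dist(query, vp)
            if d <= radius:
                found.append((d, vp))
            # Triangle inequality: a match in the inner set needs d - radius <= mu,
            # a match in the outer set needs d + radius > mu.
            if d - radius <= mu:
                stack.append(inner)
            if d + radius > mu:
                stack.append(outer)
        return sorted(found)
```

Usage would look something like `VPTree(hashes).within(query_hash, 5)`. The recursive build is fine for a demo, but with many equal distances the tree can degenerate, so a real implementation would need more care there.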




Indeed, the storage and retrieval of similar images is the hardest part. I do not know of a single networked open-source storage solution for this. I really wish there were a project with the mindset of Redis, but for MVP trees. By the way, would it be possible to implement an MVP tree data structure in Redis as the project stands now? I cannot think of any replication issues with this, apart from the fact that one would have to predefine a metric space for every tree.

It could be a great extension to the Redis DSL.


Yes, you're right. We're not using SQL queries at the moment, as that would be very inefficient. It was just an example for a small dataset.

I'm currently researching MVP trees and reading up on VP-trees, BK-trees [1], GNAT [2] and HEngine [3]. Do you have any advice?

[1] http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK...

[2] http://www.vldb.org/conf/1995/P574.PDF

[3] https://www.cse.msu.edu/~alexliu/publications/HammingQuery/H...
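
Of the structures listed above, the BK-tree is arguably the easiest to prototype when the metric is an integer-valued one such as Hamming distance. A rough sketch, assuming hashes are stored as Python integers (the names `hamming`, `BKTree`, `add` and `within` are hypothetical, not taken from any of the linked sources):

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Toy BK-tree; a node is [value, {distance_to_value: child_node}]."""

    def __init__(self, dist=hamming):
        self.dist = dist
        self.root = None

    def add(self, value):
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = self.dist(value, node[0])
            if d == 0:
                return  # identical hash is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child  # descend along the edge labelled d

    def within(self, query, radius):
        """Return sorted (distance, value) pairs with distance <= radius."""
        found = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.dist(query, value)
            if d <= radius:
                found.append((d, value))
            # Only edges labelled within [d - radius, d + radius] can lead to matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(found)
```

Insertion is incremental, which is convenient when new images keep arriving. How well the pruning works depends a lot on the search radius and on how the hashes are distributed, so it is worth measuring on real data.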


I think you are on the right track there.

The thing is though, you won't have difficulties finding papers on those topics. However, you will probably not have any luck finding many concrete and practical implementations that you could look at.

So it's a long way from reading the papers to having something that works.

If you find something, please let me know.


There's a list of CBIR engines on Wikipedia [1], and among those there are a few open-source ones. I haven't really had time to check them all, but while skimming through them imgSeek [2] caught my eye.

[1] http://en.wikipedia.org/wiki/List_of_CBIR_engines

[2] http://www.imgseek.net/isk-daemon



