Evaluating Perceptual Image Hashes at OkCupid (okcupid.com)
97 points by based2 on June 3, 2017 | 14 comments


This is all good, but how does one store, index, and efficiently search for near matches? I have a side project with terabytes of photo URLs, and hashing and indexing them has been an itch I could not scratch.


Have you looked at https://github.com/dermotte/lire? This plugin for ElasticSearch uses it: https://github.com/kzwang/elasticsearch-image


Looks like it does all kinds of tricks at once -- histograms, SIFT/SURF features and edge patterns. I was looking for a simpler solution, like an effective MVP-tree implementation for perceptual hashes. Wouldn't that be faster and overall better?


Check out metric trees such as the BK-Tree, VP-Tree or GNATs. They allow for fast nearest neighbor searches, for example querying all items in your database with Hamming Distance h < 2.
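
To make the BK-Tree idea concrete, here is a minimal sketch in Python. It assumes the perceptual hashes are plain integers and uses Hamming distance as the metric; the class and method names are made up for illustration and do not come from any particular library.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Illustrative BK-tree; each node is a (value, {distance: child}) pair."""

    def __init__(self, distance=hamming):
        self.distance = distance
        self.root = None

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.distance(value, node[0])
            if d == 0:
                return  # identical hash already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def query(self, value, max_dist):
        """Return sorted (distance, hash) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_value, children = stack.pop()
            d = self.distance(value, node_value)
            if d <= max_dist:
                results.append((d, node_value))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

A query for h < 2 would then be `tree.query(some_hash, 1)`. The pruning step is what lets the search skip most of the tree when the distance threshold is small.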

I stumbled upon metric trees for nearest neighbor queries in metric spaces some months ago and wrote down notes here:

http://daniel-j-h.github.io/post/nearest-neighbors-in-metric...

Hope that helps.


We have developed a neural-net-based solution for finding images that are nearly or even distantly similar. It can be trained to work on any type of image and works pretty accurately.

You can check the demo at: https://www.turingiq.com/demo/image/similar


I suspect you could just use the content loss approach from the style transfer examples.

Take, say, the second-to-last layer of a pre-trained convolutional neural network (e.g. VGG) and use its activations to compute cosine similarities between images.

You would have to pick a cutoff with some manual testing, but I suspect that would work, and it can be done in a matter of hours using pre-trained weights in Keras.
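
The comparison step could look roughly like the sketch below. It assumes the feature vectors have already been extracted from the network (that part is left out here), and both the function names and the 0.9 cutoff are placeholders that would need tuning by hand, as noted above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def looks_similar(a, b, cutoff=0.9):
    """Hypothetical decision rule; the cutoff is an assumed value."""
    return cosine_similarity(a, b) >= cutoff
```

Because cosine similarity ignores vector length, two images whose activations differ only by a scale factor would score as identical.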



The obvious (yet politically controversial) solution is to just use one of the many open face recognition embedding CNNs and then combine it with a low-level appearance descriptor.


The author pointed out on Reddit that a large proportion of the photos don't have faces in them.


It is obvious that a large percentage of photos are not of people. It is obvious that most photos uploaded to a dating website will have faces in them.


Did you even read the Reddit comment? [1]

    Largely the issue with this is that a lot of spammers
    don't have photos with faces [...]
[1] https://www.reddit.com/r/programming/comments/6efoqw/evaluat...


Wouldn't you then just flag as spam any profiles with a high proportion of photos with no faces in them?
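
As a rough sketch of that rule: assume some face detector has already produced one boolean per photo (True if a face was found). The function names, the 0.8 ratio, and the minimum photo count are all invented for illustration.

```python
def no_face_ratio(has_face_flags):
    """Fraction of a profile's photos in which no face was detected."""
    if not has_face_flags:
        return 0.0
    faceless = sum(1 for has_face in has_face_flags if not has_face)
    return faceless / len(has_face_flags)


def should_flag(has_face_flags, max_ratio=0.8, min_photos=3):
    """Hypothetical rule: flag profiles that are mostly faceless photos.

    Profiles with fewer than min_photos photos are never flagged, to
    avoid judging on too little evidence (an assumed safeguard).
    """
    if len(has_face_flags) < min_photos:
        return False
    return no_face_ratio(has_face_flags) >= max_ratio
```

For example, `should_flag([False, False, False, False, True])` evaluates to True under these assumed thresholds.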


The solution to your solution could then be to generate new faces programmatically in an automated fashion.

One tactic could simply be combining faces (in some offline version of something like http://www.morphthing.com/, no affiliation).

Another method could be targeted and non-detrimental manipulation of facial constructs. I could imagine using liquefaction filters designed to operate within a perceptually acceptable range for a human while working towards the goal of gaining enough change to be new in the eyes of the facial recognition algorithms.

Still a worthy effort by OkCupid; they'll scrape off the bottom this way.


I wonder if the author has looked into using SURF/OTB etc?



