This is all good, but how does one store, index and efficiently search for near matches?
I have a side project with terabytes of photo URLs, and hashing & indexing them has been an itch I could not scratch.
Looks like it does all kinds of tricks at once -- histograms, SIFT/SURF features and edge patterns.
I was looking for a simpler solution, like an effective MVP-tree implementation for perceptual hashes. Wouldn't that be faster and overall better?
Check out metric trees such as the BK-Tree, VP-Tree or GNATs. They allow for fast nearest neighbor searches, for example querying all items in your database with Hamming Distance h < 2.
I stumbled upon metric trees for nearest neighbor queries in metric spaces some months ago and wrote down notes here:
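For queries like that, here's a minimal BK-tree sketch in Python; it assumes the perceptual hashes are plain integers (e.g. 64-bit pHashes), and all the names are just placeholders, not any particular library's API:

    # Minimal BK-tree sketch (assumes hashes are plain ints, e.g. 64-bit pHashes).
    def hamming(a, b):
        return bin(a ^ b).count('1')

    class BKTree:
        def __init__(self, distance_fn=hamming):
            self.distance_fn = distance_fn
            self.root = None  # node = [item, {edge_distance: child_node}]

        def add(self, item):
            if self.root is None:
                self.root = [item, {}]
                return
            node = self.root
            while True:
                d = self.distance_fn(item, node[0])
                child = node[1].get(d)
                if child is None:
                    node[1][d] = [item, {}]
                    return
                node = child

        def query(self, item, max_dist):
            # Collect all stored items within max_dist of item.
            results, stack = [], ([self.root] if self.root else [])
            while stack:
                node = stack.pop()
                d = self.distance_fn(item, node[0])
                if d <= max_dist:
                    results.append(node[0])
                # Triangle inequality: only edges in [d - max_dist, d + max_dist]
                # can lead to matches, so the rest of the tree is pruned.
                for edge, child in node[1].items():
                    if d - max_dist <= edge <= d + max_dist:
                        stack.append(child)
            return results

    # tree = BKTree(); add every hash once; tree.query(q, 2) then returns
    # all stored hashes within Hamming distance 2 of q.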
We have developed a neural-net-based solution for finding images that are nearly or even distantly similar. It can be trained to work on any type of image and works pretty accurately.
I suspect you could just use the content loss approach from the style transfer examples.
Take, say, the second-to-last layer of a pre-trained convolutional neural network (e.g. VGG) and just use that to compute cosine similarities between images.
You'd have to pick a cutoff with some manual testing, but I suspect that'd work, plus it can be done in a matter of hours using pre-trained weights in Keras.
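Something like this rough sketch, assuming tensorflow.keras with the stock ImageNet-pretrained VGG16 (the 'fc2' layer choice, the file paths and the ~0.9 cutoff are just illustrative):

    # Rough sketch: VGG16 'fc2' features + cosine similarity.
    import numpy as np
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image
    from tensorflow.keras.models import Model

    base = VGG16(weights='imagenet')
    # Use the second-to-last fully connected layer as a 4096-d feature vector.
    extractor = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

    def embed(path):
        # Load and preprocess one image, return its feature vector.
        img = image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return extractor.predict(x)[0]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # sim = cosine(embed('a.jpg'), embed('b.jpg'))
    # Flag pairs above a manually tuned cutoff (maybe ~0.9) as near-duplicates.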
The obvious (yet politically controversial) solution is to just use one of the many open face-recognition embedding CNNs and then combine it with a low-level appearance descriptor.
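As a hedged sketch of that combination, assuming the open-source face_recognition library for the embedding and a coarse OpenCV colour histogram as the low-level descriptor (the thresholds are made-up placeholders):

    # Sketch: face embedding + colour histogram as the appearance descriptor.
    import cv2
    import numpy as np
    import face_recognition

    def describe(path):
        img = face_recognition.load_image_file(path)      # RGB uint8 array
        faces = face_recognition.face_encodings(img)      # 128-d vector per face
        # Coarse 8x8x8 colour histogram over the whole image.
        hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        return (faces[0] if faces else None), hist

    def looks_same(path_a, path_b, face_thresh=0.6, hist_thresh=0.8):
        face_a, hist_a = describe(path_a)
        face_b, hist_b = describe(path_b)
        hist_match = cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL) > hist_thresh
        if face_a is None or face_b is None:
            return hist_match
        # face_recognition's convention: embedding distance below ~0.6 ~= same person.
        return np.linalg.norm(face_a - face_b) < face_thresh and hist_match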
It is obvious that a large percentage of photos are not of people. It is obvious that most photos uploaded to a dating website will have faces in them.
The counter to your solution could then be to generate new faces programmatically.
One tactic could simply be combining faces (using some offline version of something like http://www.morphthing.com/, no affiliation).
Another method could be targeted, non-detrimental manipulation of facial features. I could imagine using liquify-style filters designed to stay within a perceptually acceptable range for a human while changing the face enough to read as new to the facial recognition algorithms.
Still a worthy effort by OkCupid; they'll scrape off the bottom this way.