The prime factors of a number are the ultimate high-dimensional space.
Damn. The more I see UMAP, the more I think it is going to become a central, general-purpose tool for high-dimensional analysis. I haven't taken the time to dig into it in depth yet, though :/
So far, my understanding of it is: t-SNE on steroids
* t-SNE is great at preserving local proximity, but it 'rips apart' high-dimensional global structure too early. UMAP handles both scales by building locally relevant low-dimensional neighborhoods and gluing their overlapping points into one global representation before optimizing the layout.
* It is faster than t-SNE and scales better with dataset size.
* t-SNE is about moving the points, whereas UMAP is about finding the transformation that moves the points, which means:
a) It yields a model that you can reuse to create embeddings for unseen data (see the sketch after this list). That also means you can share your work by contributing trained models to public model zoos.
b) You can also do supervised dimensionality reduction as you create your embedding, i.e. you can judge whether the shape looks good for unseen data (aka it generalizes well), then correct the embedding by choosing which unseen instances to add to the training set. That puts the cost of labeling data under your control: you can see where your errors are and feed them back to the collection process in a cost-effective manner, even for high-dimensional data.
* You can choose your metric! Specify a distance function and you're good to go: Haversine for a great orange-peeling of spherical data, Levenshtein for visualizing word spellings (and maybe providing an embedding for ML-based spell checking?).
* You can choose an output space with more than 2 or 3 dimensions, in order to stop the compression at a specified level.
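To make those last few points concrete, here is a minimal sketch using the umap-learn Python package (the digits dataset, the train/test split, and the parameter values are arbitrary choices for illustration):

```python
# pip install umap-learn scikit-learn
import umap
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42)

# Unsupervised: fit on training data, then embed unseen points
# with the same learned transformation (the thing t-SNE can't do).
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
train_emb = reducer.fit_transform(X_train)
test_emb = reducer.transform(X_test)  # reuses the fitted model

# Supervised: pass labels so the layout separates known classes;
# the embedding lives in sup_reducer.embedding_.
sup_reducer = umap.UMAP(n_components=2).fit(X_train, y=y_train)

# Custom metric: e.g. haversine, which expects (lat, lon) pairs
# in radians -- hence the 'orange peeling' of spherical data.
geo_reducer = umap.UMAP(metric="haversine")

# Higher-dimensional output: stop the compression at, say, 10
# dimensions and feed the result to a downstream model.
mid_reducer = umap.UMAP(n_components=10)
```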
I believe it will replace t-SNE in the long term.
Here is a great video of the author presenting his work:
https://www.youtube.com/watch?v=nq6iPZVUxZU
> * You can choose your metric! Specify a distance function and you're good to go: Haversine for a great orange-peeling of spherical data, Levenshtein for visualizing word spellings (and maybe providing an embedding for ML-based spell checking?).
t-SNE, at least the implementation I usually work with (from the R Rtsne package), happily accepts any distance matrix as input. I've successfully used all kinds of distance measures with it.
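For what it's worth, the same trick works in Python: scikit-learn's TSNE accepts a precomputed distance matrix via metric='precomputed'. A minimal sketch (the cityblock distance and the random data are arbitrary choices for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Build a square distance matrix with whatever measure you like...
D = squareform(pdist(X, metric="cityblock"))

# ...and hand it straight to t-SNE. Note: init must be 'random'
# here, since PCA initialization needs the raw feature vectors.
emb = TSNE(metric="precomputed", init="random", perplexity=30).fit_transform(D)
```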
https://news.ycombinator.com/item?id=17816981