Example of t-distributed stochastic neighbor embedding 

We can apply this powerful algorithm to the same Olivetti faces dataset, using the Scikit-Learn class TSNE with n_components=2 and perplexity=20:

from sklearn.datasets import fetch_olivetti_faces
from sklearn.manifold import TSNE

# Load the Olivetti faces dataset (400 samples, 4,096 pixels each)
faces = fetch_olivetti_faces()

# Project the 4,096-dimensional samples onto a 2D space
tsne = TSNE(n_components=2, perplexity=20)
X_tsne = tsne.fit_transform(faces['data'])

The result for all 400 samples is shown in the following graph:

t-SNE applied to the Olivetti faces dataset
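A graph like the one above can be reproduced with a short plotting sketch (a minimal example, assuming matplotlib is available and that the subject labels are taken from faces['target']; the figure size and colormap are arbitrary choices):

import matplotlib.pyplot as plt

# Scatter plot of the 2D embedding, colored by subject label (0-39)
fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(X_tsne[:, 0], X_tsne[:, 1],
                     c=faces['target'], cmap='tab20', s=20)
fig.colorbar(scatter, ax=ax, label='Subject label')
ax.set_xlabel('First t-SNE component')
ax.set_ylabel('Second t-SNE component')
plt.show()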

A visual inspection of the label distribution confirms that t-SNE recovers a near-optimal clustering starting from the original high-dimensional distribution. The algorithm can be employed in many non-linear dimensionality reduction tasks, such as images, word embeddings, or complex feature vectors. Its main strength lies in treating the similarities as probabilities, without imposing any constraint on the pairwise distances, either global or local. From a certain viewpoint, t-SNE can be seen as a reverse multiclass classification problem based on a cross-entropy cost function: the goal is to find the labels (the low-dimensional representation) given the original distribution and an assumption about the output distribution.
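For reference, the cost function minimized by t-SNE is the Kullback-Leibler divergence between the joint probabilities p_ij computed in the high-dimensional space and the probabilities q_ij computed in the low-dimensional one (a standard result from the original t-SNE paper, restated here in LaTeX notation):

C = D_{KL}(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

This quantity differs from the cross-entropy only by a constant term (the entropy of P), so minimizing C with respect to the low-dimensional coordinates is exactly the search for the "labels" mentioned above.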

At this point, we could try to answer a natural question: which algorithm should be employed? The obvious answer is that it depends on the specific problem. When it's useful to reduce the dimensionality while preserving the global similarity among vectors (as is the case when the samples are long feature vectors without local properties, such as word embeddings or data encodings), t-SNE or Isomap are good choices. When, instead, it's necessary to keep the local distances (for example, the structure of a visual patch that can be shared by different samples, even ones belonging to different classes) as close as possible to the original representation, locally linear embedding or spectral embedding algorithms are preferable.
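As a quick illustration of this flexibility, all of these alternatives expose the same fit_transform interface in scikit-learn, so swapping one algorithm for another only requires changing the estimator (a minimal sketch reusing the faces dataset loaded above; the n_neighbors values are arbitrary choices, not tuned for this dataset):

from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

# Global-structure-preserving alternative
isomap = Isomap(n_components=2, n_neighbors=10)
X_isomap = isomap.fit_transform(faces['data'])

# Local-structure-preserving alternatives
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_lle = lle.fit_transform(faces['data'])

se = SpectralEmbedding(n_components=2, n_neighbors=10)
X_se = se.fit_transform(faces['data'])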