Example of Sanger's network

For this Python example, we consider a bidimensional dataset X with 500 samples, zero-centered with the zero_center() function defined in the first chapter. After the initialization of X, we also compute the eigendecomposition of the covariance matrix, so as to be able to double-check the result:

import numpy as np

from sklearn.datasets import make_blobs

# zero_center() is the helper defined in the first chapter; a minimal
# equivalent (reported here only for completeness) subtracts the per-feature mean
def zero_center(X):
    return X - np.mean(X, axis=0)

X, _ = make_blobs(n_samples=500, centers=2, cluster_std=5.0, random_state=1000)
Xs = zero_center(X)

# Covariance matrix and its eigendecomposition (used only as a reference)
Q = np.cov(Xs.T)
eigu, eigv = np.linalg.eig(Q)

print(eigu)
[ 24.5106037  48.99234467]

print(eigv)
[[-0.75750566 -0.6528286 ]
 [ 0.6528286  -0.75750566]]

n_components = 2

# Random initialization of W, followed by the normalization of its rows
W_sanger = np.random.normal(scale=0.5, size=(n_components, Xs.shape[1]))
W_sanger /= np.linalg.norm(W_sanger, axis=1).reshape((n_components, 1))

The eigenvalues are in reverse (ascending) order; therefore, we expect the final W to have its rows swapped with respect to the columns of eigv. The initial condition (with the weights multiplied by 15, so that they are visible at the scale of the dataset) is shown in the following diagram:
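Such a diagram can be reproduced with a minimal matplotlib sketch like the following one (the scaling factor of 15 and the styling are arbitrary choices, and matplotlib is assumed to be available; it is not part of the algorithm):

import matplotlib.pyplot as plt

# Scatter plot of the zero-centered dataset, together with the initial weights
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(Xs[:, 0], Xs[:, 1], s=10, alpha=0.5)

# The rows of W_sanger are multiplied by 15 only to make them visible
for w in W_sanger:
    ax.arrow(0.0, 0.0, 15.0 * w[0], 15.0 * w[1], head_width=0.5, color='r')

ax.set_xlabel('x0')
ax.set_ylabel('x1')
plt.show()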

Dataset with the W initial condition

At this point, we can implement the algorithm. For simplicity, we preferred a fixed number of iterations (5,000) with a learning_rate η = 0.01. The reader can easily modify the snippet so that it stops when the weight matrix becomes stable:

learning_rate = 0.01
nb_iterations = 5000
t = 0.0

for i in range(nb_iterations):
    dw = np.zeros((n_components, Xs.shape[1]))
    t += 1.0

    for j in range(Xs.shape[0]):
        # Output of the network for the j-th sample
        Ysj = np.dot(W_sanger, Xs[j]).reshape((n_components, 1))

        # Sanger's rule: dw = y x^T - LT[y y^T] W (LT = lower-triangular part)
        QYd = np.tril(np.dot(Ysj, Ysj.T))
        dw += np.dot(Ysj, Xs[j].reshape((1, Xs.shape[1]))) - np.dot(QYd, W_sanger)

    # Update with a decaying learning rate and re-normalize the rows of W
    W_sanger += (learning_rate / t) * dw
    W_sanger /= np.linalg.norm(W_sanger, axis=1).reshape((n_components, 1))

The first thing to check is the final state of W (the matrix is transposed, so that its columns can be compared directly with the columns of eigv):

print(W_sanger.T)
[[-0.6528286  -0.75750566]
 [-0.75750566  0.6528286 ]]

As expected, W has converged to the eigenvectors of the input covariance matrix (the sign, which is associated with the sense of w, is not important, because we only care about the orientation). As the second eigenvalue is the largest one, the columns are swapped. Replotting the diagram, we get the following:

Final condition: W has converged to the two principal components

The two components are perfectly orthogonal (the final orientations can change according to the initial conditions or the random state), w0 points in the direction of the first principal component, and w1 points in the direction of the second one. Considering this nice property, it's not necessary to check the magnitude of the eigenvalues in order to sort the components; therefore, this algorithm can operate without eigendecomposing the input covariance matrix. Even if a formal proof is needed to explain this behavior, it's possible to understand it intuitively: every neuron converges to the first principal component of the subspace it works with. The first neuron works with the full input space, so it converges to the global first component; thanks to the orthogonalization term, the subspace available to each subsequent neuron is implicitly reduced by one dimension. Hence, the second neuron converges to the first principal component of the residual subspace, which corresponds to the global second component, and so on.
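These properties can also be double-checked numerically. The following sketch (which assumes the W_sanger and eigv computed above, and the ascending eigenvalue order observed in this example) verifies the orthogonality of the rows of W and their alignment with the eigenvectors:

# Orthogonality of the two rows of W (the dot product should be close to 0)
print(np.dot(W_sanger[0], W_sanger[1]))

# Alignment with the eigenvectors: as both the rows of W and the eigenvectors
# have unit norm, the absolute dot product is a cosine similarity and should
# be close to 1 (the sign is irrelevant, because only the orientation matters).
# The columns of eigv are taken in reverse order because, in this example,
# the eigenvalues were returned in ascending order.
for i in range(n_components):
    v = eigv[:, n_components - 1 - i]
    print(np.abs(np.dot(W_sanger[i], v)))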

One of the advantages of this algorithm (and also of the next one) is that a standard PCA is normally a bulk process (even if batch algorithms exist), while a Sanger's network is an online algorithm that is trained incrementally. In general, the time performance of a Sanger's network is worse than that of the direct approach, because of the iterations (some optimizations can be achieved using more vectorization or GPU support). On the other hand, a Sanger's network is memory-saving when the number of components is smaller than the input dimensionality (for example, the covariance matrix for n = 1000 has 10^6 elements, while, with m = 100 components, the weight matrix has only 10^5 elements).
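As an example of such a vectorization (a sketch based on the same Xs and W_sanger used above), the inner loop over the samples can be replaced with a few matrix products that accumulate exactly the same update:

# Vectorized version of the per-sample loop: Y collects the outputs for all
# the samples, and the lower-triangular term implements the orthogonalization
Y = np.dot(W_sanger, Xs.T)
dw = np.dot(Y, Xs) - np.dot(np.tril(np.dot(Y, Y.T)), W_sanger)

The result is the same dw accumulated by the loop, because W_sanger is kept constant during a whole pass over the dataset, so each iteration can be reduced to a single pair of matrix products.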