
Calculate PCA by hand (via eigen-decomposition)

In this blog post, I will calculate PCA step-by-step (via eigen-decomposition).

But before we dive deep into PCA, there are two prerequisite concepts we need to understand:

  • Variance/Covariance
  • Finding eigenvectors and eigenvalues

If you are already familiar with these two concepts, feel free to skip those sections.

 
 
 
 

Prerequisite 1: Variance/Covariance

Variance

Variance measures how far a set of numbers is spread out from their average value. The sample variance is defined as:

$$s^2 = \frac{\sum_{i}(x_i - \bar{x})^2}{n-1}$$

Let's say you have a vector $v = [1, 2, 3]$. To calculate the variance, there are two steps:

  • Calculate the mean: $\bar{v} = \frac{1+2+3}{3} = 2$
  • Calculate the variance: $s^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3-1} = 1$ (a quick NumPy check follows below)
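If you want to double-check this by computer, here is a minimal NumPy sketch (assuming NumPy is installed); `ddof=1` selects the sample variance with the $n-1$ denominator used above:

```python
import numpy as np

v = np.array([1, 2, 3])

# ddof=1 divides by n - 1, matching the sample-variance formula above
print(v.mean())       # 2.0
print(v.var(ddof=1))  # 1.0
```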

 
 

Covariance

The covariance of two vectors $x, y$ is defined as

$$\mathrm{Cov}(x, y) = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

If $\mathrm{Cov}(x, y) > 0$, then $x, y$ have the same trend (they co-increase or co-decrease).
If $\mathrm{Cov}(x, y) < 0$, then $x, y$ have the opposite trend (one increases while the other decreases).

Let's work through an example: $x = [1, 2, 3]$, $y = [3, 2, 7]$.

To calculate the covariance, we have two steps as well:

  • Calculate the means of $x$ and $y$: we have $\bar{x} = 2$, $\bar{y} = 4$
  • Calculate the covariance: $\mathrm{Cov}(x, y) = \frac{(1-2)(3-4) + (2-2)(2-4) + (3-2)(7-4)}{3-1} = 2$

Since $\mathrm{Cov}(x, y) = 2 > 0$, we know that $x$ and $y$ co-vary.
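The same check can be done with NumPy's built-in covariance function (again just a quick sketch, not part of the hand calculation):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 7])

# np.cov returns the full 2x2 covariance matrix;
# the off-diagonal entry is Cov(x, y)
print(np.cov(x, y))        # [[1. 2.]
                           #  [2. 7.]]
print(np.cov(x, y)[0, 1])  # 2.0
```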

If we plot those three points in an x-y coordinate plane, we get:

[Figure: the three (x, y) points plotted in the x-y plane]

It is obvious that x,y have the same trend (they go up/down together).

 
 
 
 

Prerequisite 2: Finding eigenvectors and eigenvalues

The word “eigen” sounds scary, at least to me when I first learned linear algebra. But it is not that complicated. I am going to walk you through the calculation of eigenvalues and eigenvectors.

Assume you have a matrix $A$. All eigen-decomposition does is find vectors $v$ and scalars $\lambda$ such that $$Av = \lambda v$$

This means that applying the linear transformation $A$ to the vector $v$ is the same as scaling $v$ by a factor $\lambda$. In other words, the linear transformation (a.k.a. the matrix) $A$ doesn't rotate $v$.

 

Here I provide an example with a 2×2 matrix. Let’s say our matrix A is:

$$A = \begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix}$$

By definition, we want to find $v, \lambda$ so that $$\begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix} v = \lambda v$$

We find $\lambda$ first. Multiply the right side of the equation by the identity matrix $I$:

$$\begin{aligned}
\begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix} v &= \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} v \\
\begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix} v &= \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} v \\
\begin{bmatrix} -\lambda & 1 \\ -2 & 3-\lambda \end{bmatrix} v &= 0 \\
\det\left(\begin{bmatrix} -\lambda & 1 \\ -2 & 3-\lambda \end{bmatrix}\right) &= 0 \\
-\lambda(3-\lambda) + 2 &= 0
\end{aligned}$$

Solving this quadratic equation, we get $\lambda_1 = 2$, $\lambda_2 = 1$.
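If you would rather not solve the quadratic by hand, note that $-\lambda(3-\lambda) + 2$ expands to $\lambda^2 - 3\lambda + 2$, which can be handed to NumPy (a quick sketch, assuming NumPy):

```python
import numpy as np

# coefficients of lambda^2 - 3*lambda + 2, highest degree first
print(np.roots([1, -3, 2]))  # [2. 1.]
```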

 

So far we have found our eigenvalues ($\lambda$). The next step is to find our eigenvectors. We should have $v_1, v_2$, each paired with an eigenvalue. Also, notice that for each eigenvalue there are infinitely many eigenvectors, all pointing along the same direction; we want the one with length equal to 1.

 
To calculate the eigenvectors, we simply plug $\lambda_1, \lambda_2$ into the following equation, which we obtained while solving for the eigenvalues:

$$\begin{bmatrix} -\lambda & 1 \\ -2 & 3-\lambda \end{bmatrix} v = 0$$

We plug in $\lambda_1 = 2$ and get

$$\begin{bmatrix} -2 & 1 \\ -2 & 1 \end{bmatrix} v_1 = 0$$

This is essentially finding the null space of the matrix on the left side. I will skip the details here (you can find plenty of resources online), but the result is $v_1 = [0.447, 0.894]$.
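If you take the null-space step on faith, one way to reproduce the reported unit vector is to pick any vector satisfying $-2x + y = 0$ and normalize it (a minimal sketch):

```python
import numpy as np

# For lambda_1 = 2 the system reduces to -2x + y = 0, i.e. y = 2x.
# Pick the direction [1, 2] and scale it to unit length.
direction = np.array([1.0, 2.0])
v1 = direction / np.linalg.norm(direction)
print(v1)  # [0.4472136  0.89442719]
```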

 

We can apply the same technique to find the second eigenvalue/eigenvector pair:

$\lambda_2 = 1$, $v_2 = [0.707, 0.707]$
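As a sanity check, `np.linalg.eig` reproduces both pairs in one call (the ordering and signs it returns may differ from the hand calculation, but the eigenvalues and directions are the same):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])

# eig returns the eigenvalues and the unit eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [1. 2.] (order may vary)
print(eigenvectors)  # columns ~ [0.707, 0.707] and [0.447, 0.894], up to sign
```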

 
 
 
 
 
 

PCA: eigen-decompose the covariance matrix

If you are still with me, you are 80% done. The rest of this post simply combines the two techniques discussed above.

Let's say we have a matrix $$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}$$

Generally, each row represents an observation/sample, and each column represents a feature we measured. After applying PCA, we expect to reduce the dimensionality of the matrix while still retaining most of the information.

Keep in mind, the high-level intuition of PCA is that you rotate the coordinate system so that your first coordinate (PC1) captures the most variation, your second coordinate (PC2) captures the second most, and so on.

Okay, let's first take a look at our data:

[Figure: the original data points plotted in the x-y plane]

Imagine you fix the origin and spin the coordinate system counterclockwise until the x-axis captures the most variation. Here is a visual demonstration:

[Figure: the coordinate system rotated 45 degrees so that the new x-axis aligns with the direction of greatest variation]

Now we can visually tell that we should rotate 45 degrees so that PC1 captures most of the variation. But how can we calculate this mathematically?

 
 
 
 

Step 1

We have two features/columns $C_1 = [1, 0, -1]$, $C_2 = [0, 1, -1]$. Now we need to construct a matrix whose diagonal entries are the variances and whose off-diagonal entries are the covariances. In our case, the matrix looks like this:

$$\begin{bmatrix} \mathrm{Var}(C_1) & \mathrm{Cov}(C_1, C_2) \\ \mathrm{Cov}(C_2, C_1) & \mathrm{Var}(C_2) \end{bmatrix}$$

Using the technique from Prerequisite 1, we have $\mathrm{Var}(C_1) = 1$, $\mathrm{Var}(C_2) = 1$, and $\mathrm{Cov}(C_1, C_2) = \mathrm{Cov}(C_2, C_1) = 0.5$.

Plugging in these results, we have our covariance matrix: $$\begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$
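For reference, the same covariance matrix can be computed directly with NumPy (a sketch, using the data matrix above):

```python
import numpy as np

# rows are observations, columns are the features C1 and C2
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])

# rowvar=False tells np.cov that the variables are in columns
print(np.cov(X, rowvar=False))  # [[1.  0.5]
                                #  [0.5 1. ]]
```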

 

Step 2

Find the eigenvalues and eigenvectors of our covariance matrix. We have

$\lambda_1 = 1.5$, $v_1 = [0.707, 0.707]$

$\lambda_2 = 0.5$, $v_2 = [-0.707, 0.707]$

Notice that $v_1$ points 45 degrees upward and $v_2$ points 135 degrees upward, which is exactly what we expected.

Therefore, we have found our PC1 and PC2: they are $v_1$ and $v_2$!
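You can verify these pairs numerically; for a symmetric matrix, `np.linalg.eigh` is the usual choice (note it returns the eigenvalues in ascending order, and the signs of the eigenvectors may be flipped):

```python
import numpy as np

cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# eigh returns eigenvalues in ascending order, eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(eigenvalues)   # [0.5 1.5]
print(eigenvectors)  # columns ~ [-0.707, 0.707] and [0.707, 0.707], up to sign
```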

 

Step 3

Here we combine the eigenvectors into a matrix, with the first column being $v_1$ and the second column being $v_2$. This is often called the loading matrix (although the word “loading” is somewhat ambiguous): $$\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}$$

If we multiply our dataset by the loading matrix, we get the scores matrix, which is the projection of the data onto the PCs.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix} \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix} = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \\ -1.414 & 0 \end{bmatrix}$$
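In NumPy this projection is a single matrix product (a sketch using the rounded 0.707 values, so the output matches the hand calculation above):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])

# loading matrix: first column is v1 (PC1), second column is v2 (PC2)
W = np.array([[0.707, -0.707],
              [0.707, 0.707]])

# the scores matrix is the data projected onto the PCs
scores = X @ W
print(scores)
# [[ 0.707 -0.707]
#  [ 0.707  0.707]
#  [-1.414  0.   ]]
```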

 

Now we have our transformed data: $$\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \\ -1.414 & 0 \end{bmatrix}$$

Let's plot our transformed data, with the PCs as the coordinate system:

[Figure: the transformed data plotted with PC1 and PC2 as the axes]

One last point:

In the scores matrix, if you calculate the variance of the first column, you will get $\mathrm{Var}(PC_1) = 1.5$. For the second column, you will get $\mathrm{Var}(PC_2) = 0.5$.

Do these numbers look familiar? In fact, $\mathrm{Var}(PC_1) = 1.5 = \lambda_1$ and $\mathrm{Var}(PC_2) = 0.5 = \lambda_2$.

For this dataset, $\lambda_1$ explains 75% of the variance since $\frac{1.5}{1.5+0.5} = 0.75$; likewise, $\lambda_2$ explains 25% of the variance since $\frac{0.5}{1.5+0.5} = 0.25$.

In summary, each eigenvalue tells you how much variance is captured by its associated PC.
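In code, the explained-variance ratios are just the eigenvalues divided by their sum (a one-line sketch):

```python
import numpy as np

eigenvalues = np.array([1.5, 0.5])

# fraction of the total variance captured by each PC
print(eigenvalues / eigenvalues.sum())  # [0.75 0.25]
```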

 
 
 

That’s the end of this post. Thanks for reading!