PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and it is a common method for finding patterns in high-dimensional data. PCA identifies patterns in data and expresses the data in a way that highlights their similarities and differences. Since patterns can be hard to spot in high dimensions, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing such data.
The other main advantage of PCA is that, once you have found these patterns, you can compress the data, i.e. reduce the number of dimensions, without much loss of information. This technique is used in image compression, as we will see in a later section.
- Get some data
- Subtract the mean
- Calculate the covariance matrix
- Calculate the eigenvectors and eigenvalues of the covariance matrix
- Choose components and form a feature vector
- Derive the new data set (a compact sketch of all six steps follows this list)
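Condensed, the six steps amount to only a few lines of Matlab. The sketch below is a minimal illustration, not part of the demo that follows; it assumes a data matrix X whose rows are observations, and its variable names are placeholders:

% Minimal PCA sketch: any data matrix whose rows are observations
X = randn(10, 2);                            % step 1: get some data
XAdj = X - repmat(mean(X), size(X,1), 1);    % step 2: subtract the mean
C = cov(X);                                  % step 3: covariance matrix
[V, D] = eig(C);                             % step 4: eigenvectors and eigenvalues
[vals, idx] = sort(diag(D), 'descend');      % order components by variance
F = V(:, idx)';                              % step 5: feature vector (rows = components)
Y = (F * XAdj')';                            % step 6: derive the new data set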
Matlab Code
clear all, clc, close all

%% Step 1: Get some data
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7; 2 1.6; 1 1.1; 1.5 1.6; 1.1 0.9];
axis equal
axis([-1 4 -1 4])
hold on
plot(X(:,1), X(:,2), 'k+');
pause
plot([0 0], [-1 4], 'k:');
plot([-1 4], [0 0], 'k:');
title('Original PCA Data');

%% Step 2: Subtract the mean
% XAdjust = X - repmat(mean(X), size(X,1), 1)   % one-line alternative
m = mean(X);
p1 = X(:,1) - m(1,1);
p2 = X(:,2) - m(1,2);
XAdjust = [p1 p2];

%% Step 3: Calculate the covariance matrix
CM = cov(X);   % cov subtracts the mean itself, so cov(X) equals cov(XAdjust)

%% Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
[V D] = eig(CM);

%% Step 5: Choose components and form a feature vector
pause
plot(XAdjust(:,1), XAdjust(:,2), 'r+');
pause
A = 10*V';   % transpose so each row of A is an eigenvector direction
plot([-A(1,1) A(1,1)], [-A(1,2) A(1,2)], 'b-.')
plot([-A(2,1) A(2,1)], [-A(2,2) A(2,2)], 'b-.')
pause
% close all
figure
axis equal
axis([-2 2 -2 2])
hold on
plot(XAdjust(:,1), XAdjust(:,2), 'r+');
plot([-A(1,1) A(1,1)], [-A(1,2) A(1,2)], 'b-.');
plot([-A(2,1) A(2,1)], [-A(2,2) A(2,2)], 'b-.');
plot([0 0], [-2 2], 'k:');
plot([-2 2], [0 0], 'k:');
title('Mean adjusted data with eigenvectors overlayed');

%% Step 6: Derive the new data set
pause
% f1 = V(:,2)'              % eig returns eigenvalues in ascending order here
% V(:,find(D==max(D)));     % can be used to find the direction of maximum variance
f1 = [-0.677873399 -0.735178656];
PC1 = f1*XAdjust'
PC1 = PC1'
f2 = [-0.735178656 0.677873399];   % f2 = V(:,1)'
PC2 = f2*XAdjust';
PC2 = PC2'
F = [f1; f2];
Y = [PC1 PC2]
figure
plot(Y(:,1), Y(:,2), 'r+')
axis equal
axis([-2 2 -2 2])
Cy = cov(Y);
[Vy Dy] = eig(Cy);
hold on
A = 10*Vy';
plot([-A(1,1) A(1,1)], [-A(1,2) A(1,2)], 'g:');
plot([-A(2,1) A(2,1)], [-A(2,2) A(2,2)], 'b:');
title('Data transformed with 2 eigenvectors');
pause;

% Variance of each principal component
figure; hold off
bar(diag(D));
xlabel('Projection dimension');
ylabel('Variance');
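The script keeps both eigenvectors, so Y is only a rotation of the mean-adjusted data. To see the compression promised earlier, project onto PC1 alone and reconstruct. The snippet below is not part of the original script; it is a sketch that reuses the variables f1, PC1, m and X, assuming the script above has just been run in the same workspace:

% Rank-1 reconstruction: approximate the data from PC1 alone
XApprox = PC1 * f1 + repmat(m, size(X,1), 1);   % project back and re-add the mean
figure
plot(X(:,1), X(:,2), 'k+'); hold on
plot(XApprox(:,1), XApprox(:,2), 'ro');
axis equal
title('Original data (+) and reconstruction from PC1 only (o)');

The reconstructed points all lie on the PC1 line yet stay close to the originals, which shows the "fewer dimensions without much loss of information" claim in numbers. As a cross-check, the Statistics Toolbox function princomp (pca in newer releases) returns the same components up to sign.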