Sparsity Preserving Projection, a feature extraction algorithm for pattern recognition
+++
Author : Denglong Pan
+++
What is SPP
Refer to https://github.com/lamplampan/SPP/wiki#what-is-spp
SPP, Sparsity Preserving Projection, is an unsupervised dimensionality reduction algorithm. It uses L1-norm minimization to preserve the sparse reconstruction relationships of the data.
SPP projections are not affected by rotation, scaling, or offset of the data. SPP also captures the natural discriminating information in the data even when no class labels are given.
Given the training sample matrix, use the weight vector from the sparse reconstruction as the coefficients of the training samples to solve the minimum L1-norm problem. Define equation set [1] below:
Define the sparse reconstruction weight matrix S below, built from the optimal solutions of equation set [1]:
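Equation set [1] and the weight matrix referred to above can be sketched as follows, assuming the standard SPP formulation. Apart from S, the symbols are assumptions: X = [x_1, ..., x_n] is the training sample matrix with one sample per column, s_i is the weight vector that reconstructs x_i from the other samples (its i-th entry is fixed to 0), and 1 is the all-ones vector.

```latex
% Equation set [1]: one L1 problem per training sample x_i
\min_{s_i} \|s_i\|_1
\quad \text{s.t.} \quad x_i = X s_i, \qquad \mathbf{1}^{T} s_i = 1

% Weight matrix built from the optimal solutions \tilde{s}_i of [1]
S = [\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_n]^{T}
```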
The weight vector is sparse because the face recognition samples span many classes, and a sample is reconstructed mainly from the samples of its own class.
A test sample can therefore be written as follows:
Taking the reconstruction residual into consideration, we can relax equation set [1] into the following equation set [2]:
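A sketch of equation set [2], assuming the standard SPP formulation. The symbols are assumptions: X is the training sample matrix with one sample per column, s_i is the weight vector for sample x_i, 1 is the all-ones vector, and ε is the tolerance on the residual.

```latex
% Equation set [2]: the equality constraint of [1] relaxed by a residual tolerance
\min_{s_i} \|s_i\|_1
\quad \text{s.t.} \quad \|x_i - X s_i\|_2 < \varepsilon, \qquad \mathbf{1}^{T} s_i = 1
```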
We can define the following objective function [3] to find the projection that best preserves the optimal weight vectors:
Through algebraic transformation, the function above becomes the following one:
The projection vectors are the eigenvectors corresponding to the d largest eigenvalues of the following eigenvalue problem:
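The three formulas referred to above can be sketched as follows, assuming the standard SPP derivation. All names other than S are assumptions: w is a projection vector, X is the training sample matrix with one sample per column, \tilde{s}_i is the optimal weight vector of sample x_i (the i-th row of S), and S_β is a shorthand introduced by the transformation.

```latex
% Objective function [3]: keep the sparse reconstruction after projection
\min_{w} \sum_{i=1}^{n} \left\| w^{T} x_i - w^{T} X \tilde{s}_i \right\|^{2}
\quad \text{s.t.} \quad w^{T} X X^{T} w = 1

% Equivalent form after algebraic transformation
\max_{w} \frac{w^{T} X S_{\beta} X^{T} w}{w^{T} X X^{T} w},
\qquad S_{\beta} = S + S^{T} - S^{T} S

% Generalized eigenvalue problem; keep the eigenvectors of the d largest \lambda
X S_{\beta} X^{T} w = \lambda \, X X^{T} w
```

The middle step follows because the sum in [3] equals w^T X (I - S - S^T + S^T S) X^T w, so minimizing it under the constraint is the same as maximizing the S_β term.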
Step 1 Use equation set [1] or equation set [2] to calculate the weight matrix S. It can be computed with standard linear programming tools such as L1-magic.
Step 2 Calculate the projection vectors from objective function [3]: take the d largest eigenvalues and their corresponding eigenvectors, which span the subspace.
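The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the repository's MATLAB code: it uses NumPy and SciPy instead of L1-magic, solves the equality form [1] as a linear program with `scipy.optimize.linprog`, and assumes the standard SPP eigenproblem X S_β X^T w = λ X X^T w with S_β = S + S^T - S^T S. The function names `sparse_weight_matrix` and `spp_projection` and the small ridge `reg` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import linprog


def sparse_weight_matrix(X):
    """Step 1 with equation set [1], one linear program per sample.

    X is (m, n) with one sample per column. Row i of the result holds the
    weights that rebuild x_i from the other columns; entry (i, i) stays 0.
    """
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        others = np.array([j for j in range(n) if j != i])
        A = X[:, others]                      # dictionary without x_i
        k = others.size
        # Write s = u - v with u, v >= 0, so ||s||_1 becomes sum(u) + sum(v).
        A_eq = np.vstack([np.hstack([A, -A]),
                          np.hstack([np.ones(k), -np.ones(k)])])
        b_eq = np.append(X[:, i], 1.0)        # x_i = A s and sum(s) = 1
        res = linprog(np.ones(2 * k), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if not res.success:
            raise ValueError(f"no exact reconstruction for sample {i}")
        S[i, others] = res.x[:k] - res.x[k:]
    return S


def spp_projection(X, S, d, reg=1e-8):
    """Step 2: eigenvectors of the d largest generalized eigenvalues."""
    S_beta = S + S.T - S.T @ S
    A = X @ S_beta @ X.T
    A = (A + A.T) / 2                         # remove round-off asymmetry
    B = X @ X.T + reg * np.eye(X.shape[0])    # assumed ridge, keeps B positive definite
    vals, vecs = eigh(A, B)                   # eigenvalues come back ascending
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top], vals[top]
```

`spp_projection` needs X X^T to be well conditioned, which in practice means the number of rows of X is smaller than the number of samples. When every constraint of [1] holds exactly, `X @ S.T` reproduces `X`, both sides of the eigenproblem coincide, and all eigenvalues come out close to 1, so weights computed with the residual form [2] give a more informative spectrum.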
Use PCA + SPP + SRC for the testing.
Why use PCA here
We use PCA here to reduce the dimensions. Each face sample has 92 x 112 = 10304 dimensions. The ORL library contains 40 kinds of faces, and each kind of face has 10 samples. If we use 5 samples of each kind of face for training, the constructed matrix is 10304 x (40 x 5), that is 10304 x 200. So many dimensions cause two problems:
1 MATLAB reports "OUT OF MEMORY" for a matrix with so many dimensions.
2 The number of rows is larger than the number of columns, so the system of equations is overdetermined and cannot be solved by the L1_MAGIC algorithm.
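The sizes behind the two points above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the memory estimate assumes 8-byte double-precision values, and the 10304 x 10304 figure only shows how large a square matrix over the raw dimensions (such as X X^T) would be, not where orl_src.m actually runs out of memory.

```python
# Figures quoted in the text for the ORL face library.
pixels_per_face = 92 * 112         # dimensions of one face sample
kinds_of_face = 40
train_per_kind = 5

rows, cols = pixels_per_face, kinds_of_face * train_per_kind


def is_overdetermined(n_rows, n_cols):
    """True when there are more equations (rows) than unknowns (columns)."""
    return n_rows > n_cols


# Assumption: 8-byte double-precision values.
bytes_per_value = 8
train_matrix_mib = rows * cols * bytes_per_value / 2**20
square_matrix_mib = rows * rows * bytes_per_value / 2**20

pca_dims = 80                      # PCA setting used in the test results

print(f"raw training matrix: {rows} x {cols}, "
      f"overdetermined: {is_overdetermined(rows, cols)}")
print(f"after PCA:           {pca_dims} x {cols}, "
      f"overdetermined: {is_overdetermined(pca_dims, cols)}")
print(f"training matrix ~{train_matrix_mib:.1f} MiB, "
      f"{rows} x {rows} matrix ~{square_matrix_mib:.1f} MiB")
```

After PCA the system has fewer rows than columns, so it is no longer overdetermined.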
Test results
Use 5 samples from each of the 40 kinds of faces for training and the remaining samples for testing. Set the residual to 0.0001 and the number of extracted projection vectors to 80.
The recognition rate is 93% when PCA=80 .
+++
How to run the algorithm?
Refer to https://github.com/lamplampan/SPP/wiki#how-to-run
Step 1 : Configure the path to your ORL face library in the file orl_src.m . The default path is E:\ORL_face\orlnumtotal\ .
Step 2 : Run orl_src.m