Machine Learning Day
Lab 3: Dimensionality reduction and feature selection
In this lab we will look into the problems of dimensionality reduction through Principal Component Analysis (PCA) and feature selection through Orthogonal Matching Pursuit (OMP).
- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Work your way through the examples below by following the instructions.
1. Data generation
Generate a 2-class dataset of D-dimensional points with N points for each class. Start with N = 100 and D = 30 and create a train and a test set.
- The first two variables (dimensions) of each point will be generated by MixGauss, i.e., drawn from two Gaussian distributions with centroids (1, 1) and (-1, -1) and sigma 0.7 for both classes. Adjust the output labels of the two classes to be {1, -1} respectively, and plot these two variables in the 2D plane using scatter(X(:,1), X(:,2), markerSize, Y);
- The remaining (D-2) variables will be generated as Gaussian noise with sigma_noise = 0.01, i.e., X_noise = sigma_noise*randn(2*N, D-2);
- The final D-dimensional train and test data matrices will then be given by X = [X X_noise]; of size 2N x D, storing the 2N instances (N per class) in a D-dimensional space. A minimal sketch putting these steps together is given after this list.
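The following is a minimal sketch of this generation step. It assumes that the provided MixGauss helper takes the class centroids as columns of a matrix, one sigma per class, and the number of points per class, and that it returns the points together with labels in {1, 2}; check the header of the function in your code file if its interface differs. Here the train/test matrices are named Xtr/Xte and the labels Ytr/Yte.
N = 100;             % points per class
D = 30;              % total number of dimensions
sigma_noise = 0.01;  % standard deviation of the noise dimensions
% First two dimensions: two Gaussian clouds with centroids (1,1) and (-1,-1)
% (assumed MixGauss interface: centroids as columns, one sigma per class,
% number of points per class; labels returned in {1, 2})
[Xtr, Ytr] = MixGauss([1 -1; 1 -1], [0.7; 0.7], N);
[Xte, Yte] = MixGauss([1 -1; 1 -1], [0.7; 0.7], N);
% Map the class labels from {1, 2} to {1, -1}
Ytr(Ytr == 2) = -1;
Yte(Yte == 2) = -1;
% Visualize the two informative dimensions of the training set
figure; scatter(Xtr(:,1), Xtr(:,2), 25, Ytr);
% Append D-2 pure-noise dimensions to train and test sets
Xtr = [Xtr, sigma_noise * randn(2*N, D-2)];
Xte = [Xte, sigma_noise * randn(2*N, D-2)];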
2. Principal Component Analysis
- Compute the principal components of the training set using the provided function PCA. Select the number of principal components by setting k < D.
- For the data projected on the subspace of the first k components, X_p, plot the first dimension (as points on a line, e.g., plot(X_p(:,1), ones(size(X_p,1),1), 'o')), the first two (scatter(X_p(:,1), X_p(:,2), markerSize, Ytr);) and the first three (scatter3(X_p(:,1), X_p(:,2), X_p(:,3), markerSize, Ytr);).
- How would you interpret the meaning of the resulting plots? What is the effective dimensionality of this dataset?
- Visualize the square root of the first k=10 eigenvalues. Can you infer the dimensionality from this distribution?
- Visualize the eigenvectors associated with the largest eigenvalue, e.g., scatter(1:D, abs(V(:,1))), and with the second and third largest eigenvalues.
- Repeat the above with datasets generated with different values of sigma_noise, e.g., in [0, 2]. To what extent is data visualization by PCA affected by noise? A sketch covering the steps of this part is given after this list.
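A possible way to put the steps of this part together is sketched below. It assumes the provided PCA function is called as [V, d, X_p] = PCA(X, k), returning the eigenvectors V (one per column), the eigenvalues d and the projected data X_p; adapt the call if your version uses a different output order. Xtr and Ytr are the training data and labels from Part 1.
k = 10;                     % number of principal components, k < D
[V, d, X_p] = PCA(Xtr, k);  % assumed interface of the provided PCA function
% Data projected on 1, 2 and 3 principal components
figure; plot(X_p(:,1), ones(size(X_p,1),1), 'o');        % points on a line
figure; scatter(X_p(:,1), X_p(:,2), 25, Ytr);            % first two components
figure; scatter3(X_p(:,1), X_p(:,2), X_p(:,3), 25, Ytr); % first three components
% Square root of the first k eigenvalues: look for an "elbow"
figure; plot(sqrt(d(1:k)), 'o-'); xlabel('component'); ylabel('sqrt(eigenvalue)');
% Coefficients of the three leading eigenvectors over the D original variables
figure; hold on;
scatter(1:D, abs(V(:,1)));
scatter(1:D, abs(V(:,2)));
scatter(1:D, abs(V(:,3)));
legend('1st eigenvector', '2nd eigenvector', '3rd eigenvector');
xlabel('variable index'); ylabel('|coefficient|');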
3. Variable selection with OMP
- Standardize the data matrix so that each column has mean 0 and standard deviation 1. Use the statistics of the train set X (mean and standard deviation) to standardize the corresponding test set Xte. An example of a vectorized implementation:
m = mean(X);
s = std(X);
X = bsxfun(@rdivide, bsxfun(@minus, X, m), s);
- Use the orthogonal matching pursuit algorithm (function OMatchingPursuit) with T iterations to obtain T-1 coefficients for a sparse approximation of the training set Xt. Plot the resulting coefficients w using stem(1:D, w). What is the output when setting T = 3, and how do you interpret the indices of these first active dimensions (coefficients)?
- Check the predicted labels on the training (and test) set when approximating the output using w:
Ypred = sign(Xt * w);
err = calcErr(Yt, Ypred);
Plot the train and test errors on the same plot for increasing T. How do they change with the number of iterations of the method?
- By applying cross-validation on the training set through the provided holdoutCVOMP, find the optimal number of iterations in the range intIter = 2:D (indicative values: pho = 0.2, rip = 30). Plot the training and validation error on the same plot. How do the errors change with the number of iterations? A sketch covering this whole part is given after the list.
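Below is a sketch covering the three steps of this part, using the variable names Xtr/Ytr/Xte/Yte from Part 1. It assumes that OMatchingPursuit is called as w = OMatchingPursuit(X, Y, T) and returns a D x 1 coefficient vector, that calcErr(Y, Ypred) returns the misclassification rate, and that holdoutCVOMP is called as [it, Vm, Vs, Tm, Ts] = holdoutCVOMP(X, Y, pho, rip, intIter) with Vm and Tm the mean validation and training errors; check the headers of the provided functions and adapt the calls if needed.
% Standardize train and test data with the *training* statistics
m = mean(Xtr);
s = std(Xtr);
Xtr_s = bsxfun(@rdivide, bsxfun(@minus, Xtr, m), s);
Xte_s = bsxfun(@rdivide, bsxfun(@minus, Xte, m), s);
% Sparse coefficients obtained with T = 3 iterations
w3 = OMatchingPursuit(Xtr_s, Ytr, 3);   % assumed interface
figure; stem(1:D, w3); xlabel('variable index'); ylabel('w');
% Train and test error as a function of the number of OMP iterations
Tmax = D;
errTr = zeros(Tmax-1, 1);
errTe = zeros(Tmax-1, 1);
for T = 2:Tmax
    w = OMatchingPursuit(Xtr_s, Ytr, T);
    errTr(T-1) = calcErr(Ytr, sign(Xtr_s * w));
    errTe(T-1) = calcErr(Yte, sign(Xte_s * w));
end
figure; plot(2:Tmax, errTr, 'b', 2:Tmax, errTe, 'r');
legend('train error', 'test error'); xlabel('T'); ylabel('error');
% Hold-out cross-validation for the number of iterations
pho = 0.2; rip = 30; intIter = 2:D;
[it, Vm, Vs, Tm, Ts] = holdoutCVOMP(Xtr_s, Ytr, pho, rip, intIter);  % assumed interface
figure; plot(intIter, Tm, 'b', intIter, Vm, 'r');
legend('training error', 'validation error'); xlabel('iterations'); ylabel('error');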
4. (Optional)
Compare the results of Parts 2 and 3 and evaluate the benefits of the two different methods for dimensionality reduction and feature selection when choosing N >> D, N ~= D and N << D. A sketch of such a comparison is given below.
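Purely as a starting point, the sketch below reruns the data generation, the PCA spectrum of Part 2 and the OMP selection of Part 3 for three indicative (N, D) regimes; it reuses the assumed interfaces of MixGauss, PCA, OMatchingPursuit and calcErr from the sketches above.
regimes = [1000 30;     % N >> D
           100  100;    % N ~= D
           30   1000];  % N << D
sigma_noise = 0.01;
for r = 1:size(regimes, 1)
    N = regimes(r, 1);  D = regimes(r, 2);
    % Regenerate train and test data as in Part 1
    [Xtr, Ytr] = MixGauss([1 -1; 1 -1], [0.7; 0.7], N);  Ytr(Ytr == 2) = -1;
    [Xte, Yte] = MixGauss([1 -1; 1 -1], [0.7; 0.7], N);  Yte(Yte == 2) = -1;
    Xtr = [Xtr, sigma_noise * randn(2*N, D-2)];
    Xte = [Xte, sigma_noise * randn(2*N, D-2)];
    % Part 2: eigenvalue spectrum of the leading components on the raw data
    [V, d] = PCA(Xtr, min(10, D));
    figure; plot(sqrt(d), 'o-'); title(sprintf('N = %d, D = %d', N, D));
    % Part 3: OMP on standardized data, test error with T = 3
    m = mean(Xtr);  s = std(Xtr);
    Xtr_s = bsxfun(@rdivide, bsxfun(@minus, Xtr, m), s);
    Xte_s = bsxfun(@rdivide, bsxfun(@minus, Xte, m), s);
    w = OMatchingPursuit(Xtr_s, Ytr, 3);
    fprintf('N = %4d, D = %4d, OMP test error = %.3f\n', ...
            N, D, calcErr(Yte, sign(Xte_s * w)));
end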