CBMM Summer School, Day 2: Machine Learning

Lab 3: Dimensionality reduction and feature selection


In this lab we address data analysis in the context of a classification problem. Get the zip file and follow the instructions below. Think hard before you call the instructors!


1. Warm up - Data Generation

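Generate a 2-class dataset of D-dimensional points with N points for each class (2N points in total). Start with N = 100 and D = 30 and create a train and a test set. Only the first two variables carry information about the class: generate them as the 2N x 2 matrices X2tr and X2ts, with labels Ytr and Yts, and visualize them: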

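figure; scatter(X2tr(:,1), X2tr(:,2), 25, Ytr);   % training set, colored by class
figure; scatter(X2ts(:,1), X2ts(:,2), 25, Yts);   % test set, colored by class

The remaining D - 2 variables are irrelevant: Gaussian noise with standard deviation sigma_noise, drawn independently for the train and test sets.
Xtr_noise = sigma_noise*randn(2*N, D-2);   % 2N x (D-2) training noise
Xts_noise = sigma_noise*randn(2*N, D-2);   % 2N x (D-2) test noise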

The final 2N x D train and test data matrices are obtained by concatenating the relevant and the noise variables:
Xtr = [X2tr Xtr_noise];
Xts = [X2ts Xts_noise];
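For reference, here is a minimal, self-contained sketch of the whole data generation step. It assumes labels +1 and -1 and that each class is an isotropic Gaussian in the two relevant variables, centered at (1, 1) for class +1 and at (-1, -1) for class -1. The centers, sigma_class and the value of sigma_noise are hypothetical choices for illustration, not the generator shipped in the lab zip file.

```matlab
% Minimal sketch of the data generation step (hypothetical parameters).
N = 100;            % points per class
D = 30;             % total number of variables
sigma_class = 0.7;  % hypothetical spread of each class in the 2 relevant variables
sigma_noise = 0.5;  % hypothetical std of the D-2 irrelevant variables

% Two relevant variables: class +1 around (1,1), class -1 around (-1,-1)
X2tr = [sigma_class*randn(N,2) + 1; sigma_class*randn(N,2) - 1];
X2ts = [sigma_class*randn(N,2) + 1; sigma_class*randn(N,2) - 1];
Ytr = [ones(N,1); -ones(N,1)];
Yts = [ones(N,1); -ones(N,1)];

% D-2 irrelevant noise variables, as in the handout
Xtr_noise = sigma_noise*randn(2*N, D-2);
Xts_noise = sigma_noise*randn(2*N, D-2);

% Final 2N x D data matrices
Xtr = [X2tr Xtr_noise];
Xts = [X2ts Xts_noise];
```

Changing sigma_noise and D lets you control how strongly the irrelevant variables mask the two informative ones.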

2. Principal Component Analysis
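PCA can be computed from the singular value decomposition of the centered training matrix: the right singular vectors are the principal directions, and the squared singular values are proportional to the variance along each of them. A minimal sketch, assuming data like those of Section 1 (regenerated below with hypothetical parameters so the block runs on its own), projects both sets onto the first k directions. The value k = 2 and the variable names are illustrative; the lab's own routines may differ.

```matlab
% Minimal PCA sketch via the SVD of the centered training matrix.
% Hypothetical data: class centers +/-(1,1), class spread 0.7, sigma_noise = 0.5.
N = 100; D = 30; sigma_noise = 0.5;
Ytr = [ones(N,1); -ones(N,1)];
Yts = Ytr;
Xtr = [0.7*randn(2*N,2) + Ytr*[1 1], sigma_noise*randn(2*N, D-2)];
Xts = [0.7*randn(2*N,2) + Yts*[1 1], sigma_noise*randn(2*N, D-2)];

mu = mean(Xtr, 1);                          % training column means
Xc = Xtr - repmat(mu, size(Xtr, 1), 1);     % centered training data
[~, S, V] = svd(Xc, 'econ');                % columns of V: principal directions
s = diag(S);                                % singular values, in decreasing order
explained = s.^2 / sum(s.^2);               % fraction of variance per component

k = 2;                                      % number of components kept (illustrative)
Ztr = Xc * V(:, 1:k);                                   % projected training set
Zts = (Xts - repmat(mu, size(Xts, 1), 1)) * V(:, 1:k);  % test set, centered with training means
```

For example, scatter(Ztr(:,1), Ztr(:,2), 25, Ytr) shows the training set in the first two principal components, and plot(explained) shows how the variance is spread across components. Note that the test set is centered with the training means, so no test information enters the projection.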


3. Variable selection

How do the training and test errors change with the number of iterations T of the method? Plot both errors on the same figure for increasing T.
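Judging by the name holdoutCVOMP, the variable selection method is Orthogonal Matching Pursuit (OMP), which adds one variable per iteration, so T iterations select T variables. The sketch below is an assumption about the method, not the lab's code: it runs OMP on standardized data and records the training and test classification error (sign of the least-squares prediction) for T = 1, ..., Tmax. The data parameters and Tmax = 10 are hypothetical.

```matlab
% Hypothetical sketch: Orthogonal Matching Pursuit (OMP) for variable selection.
N = 100; D = 30; sigma_noise = 0.5;   % hypothetical data parameters
Ytr = [ones(N,1); -ones(N,1)];
Yts = Ytr;
Xtr = [0.7*randn(2*N,2) + Ytr*[1 1], sigma_noise*randn(2*N, D-2)];
Xts = [0.7*randn(2*N,2) + Yts*[1 1], sigma_noise*randn(2*N, D-2)];

% Standardize each variable with training statistics, so that the
% correlation-based selection step treats all variables alike
mu = mean(Xtr, 1); sd = std(Xtr, 0, 1);
Xtr_s = (Xtr - repmat(mu, 2*N, 1)) ./ repmat(sd, 2*N, 1);
Xts_s = (Xts - repmat(mu, 2*N, 1)) ./ repmat(sd, 2*N, 1);

Tmax = 10;                    % hypothetical maximum number of iterations
selected = zeros(1, 0);       % indices of the selected variables
r = Ytr;                      % current residual
err_tr = zeros(1, Tmax); err_ts = zeros(1, Tmax); res_norm = zeros(1, Tmax);
for t = 1:Tmax
  c = abs(Xtr_s' * r);        % correlation of each variable with the residual
  c(selected) = -Inf;         % never pick the same variable twice
  [~, j] = max(c);
  selected = [selected j];
  w = Xtr_s(:, selected) \ Ytr;          % least squares on the selected variables
  r = Ytr - Xtr_s(:, selected) * w;      % update the residual
  res_norm(t) = norm(r);
  err_tr(t) = mean(sign(Xtr_s(:, selected) * w) ~= Ytr);  % training error
  err_ts(t) = mean(sign(Xts_s(:, selected) * w) ~= Yts);  % test error
end
```

The two curves can then be drawn with plot(1:Tmax, err_tr, 'r', 1:Tmax, err_ts, 'b'). Because each iteration refits least squares on a superset of the previous variables, the training residual norm res_norm cannot increase with T.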
Then choose the number of iterations by hold-out cross-validation on the training set:
[it, Vm, Vs, Tm, Ts] = holdoutCVOMP(Xtrn, Ytr, perc, nrip, intIter);   % hold-out validation over the values in intIter
Plot the training and validation errors on the same figure. How do they behave as the number of iterations increases?
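figure;
plot(intIter, Tm, 'r'); hold on;   % training error (red)
plot(intIter, Vm, 'b'); hold off;  % validation error (blue)
xlabel('number of iterations'); ylabel('error'); legend('training', 'validation');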
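To make the roles of the holdoutCVOMP arguments concrete, here is a hypothetical re-implementation under assumed semantics: perc is the fraction of training points held out for validation, nrip the number of random splits, intIter the candidate numbers of iterations, Tm/Ts and Vm/Vs the mean and standard deviation of the training and validation errors, and it the value of T with the lowest mean validation error. These meanings are inferred from the names and from the plotting code above, not taken from the lab's documentation, and OMP is again assumed to be the underlying method.

```matlab
% Hypothetical re-implementation of hold-out validation for the number of
% OMP iterations; argument meanings are assumed, not taken from holdoutCVOMP.
N = 100; D = 30; sigma_noise = 0.5;   % hypothetical data parameters
Ytr = [ones(N,1); -ones(N,1)];
Xtr = [0.7*randn(2*N,2) + Ytr*[1 1], sigma_noise*randn(2*N, D-2)];
% Standardized once with all training points, for simplicity
Xtr = (Xtr - repmat(mean(Xtr,1), 2*N, 1)) ./ repmat(std(Xtr,0,1), 2*N, 1);

perc = 0.3;        % assumed: fraction of training points held out
nrip = 20;         % assumed: number of random splits
intIter = 1:10;    % assumed: candidate numbers of iterations, increasing
n = size(Xtr, 1);
nval = round(perc * n);
Terr = zeros(nrip, numel(intIter));   % training error, one row per split
Verr = zeros(nrip, numel(intIter));   % validation error, one row per split

for rip = 1:nrip
  idx = randperm(n);                  % random split of the training set
  iv = idx(1:nval);  itr = idx(nval+1:end);
  Xa = Xtr(itr, :);  Ya = Ytr(itr);   % part used to fit OMP
  Xv = Xtr(iv, :);   Yv = Ytr(iv);    % held-out validation part
  selected = zeros(1, 0);  r = Ya;
  for t = 1:max(intIter)              % one OMP path serves every T in intIter
    c = abs(Xa' * r);  c(selected) = -Inf;
    [~, j] = max(c);
    selected = [selected j];
    w = Xa(:, selected) \ Ya;         % least squares on the selected variables
    r = Ya - Xa(:, selected) * w;     % update the residual
    k = find(intIter == t);
    if ~isempty(k)
      Terr(rip, k) = mean(sign(Xa(:, selected) * w) ~= Ya);
      Verr(rip, k) = mean(sign(Xv(:, selected) * w) ~= Yv);
    end
  end
end

Tm = mean(Terr, 1);  Ts = std(Terr, 0, 1);   % training error: mean and std
Vm = mean(Verr, 1);  Vs = std(Verr, 0, 1);   % validation error: mean and std
[~, best] = min(Vm);
it = intIter(best);   % T with the lowest mean validation error
```

With these variables in the workspace, the plotting lines above draw Tm and Vm against intIter directly; Ts and Vs can be added as error bars to show the variability across splits.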

4. (Optional) - Additional experiments