In this lab we start using the basic functionality of the GURLS library (Grand Unified Regularized Least Squares). To test what you have learned, we pose a Machine Learning challenge at the end of the lab (deadline 6 p.m.). Neither the challenge nor this laboratory is mandatory for MLCC.
Download the material for this lab:
• .zip file (unzip it in a local folder)
1.A Download and unzip the GURLS zip, or pull it from GitHub, into a local folder GURLS. Then run the following code to install it:
run('GURLS/gurls/utils/gurls_install.m');
1.B Generate a 2-class training set and test set with the usual code:
[Xtr, Ytr] = MixGauss([[0;0],[1;1]],[0.6,0.4],100);
[Xts, Yts] = MixGauss([[0;0],[1;1]],[0.6,0.4],100);
Ytr(Ytr==2) = -1;
Yts(Yts==2) = -1;
Now visualize the data with:
figure;
scatter(Xtr(:,1), Xtr(:,2), 50, Ytr, 'filled')
figure;
scatter(Xts(:,1), Xts(:,2), 50, Yts, 'filled')
2.A Type the following code and analyze the output:
model = gurls_train(Xtr, Ytr);
ypred = gurls_test(model, Xts);
GURLS automatically detects that this is a classification problem with two classes, labeled 1 and -1, and chooses to train a kernel regularized least squares algorithm, whose weights are stored in model. ypred contains the predicted labels. To compute both the predicted labels and the accuracy on the test set, type:
[ypred, accuracy] = gurls_test(model, Xts, Yts);
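As a sanity check, the accuracy returned by gurls_test can also be computed directly from the predicted labels (a minimal sketch, assuming ypred and Yts are column vectors with entries in {-1, 1}):
% fraction of test points whose predicted label matches the true one
acc_manual = mean(sign(ypred) == Yts);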
2.B Visualize the learned function using:
figure;
scatter(Xts(:,1), Xts(:,2), 50, Yts, 'filled')
hold on
separatingGurls(Xtr, model);
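separatingGurls is a helper provided with the lab material. For intuition, a rough sketch of how such a plot can be produced, assuming the predictions returned by gurls_test take values in {-1, 1}:
% evaluate the learned function on a grid covering the data...
[xg, yg] = meshgrid(linspace(-2, 3, 100), linspace(-2, 3, 100));
zg = gurls_test(model, [xg(:), yg(:)]);
% ...and draw the boundary between the two predicted classes
contour(xg, yg, reshape(zg, size(xg)), [0 0], 'k');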
2.C Perform the same experiment with flipped labels (Ytrn = flipLabels(Ytr, p)), e.g. with p = 0.05.
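A minimal sketch of the full pipeline with flipped labels (flipLabels is the helper provided with the lab material; we assume it flips a fraction p of the labels):
p = 0.05;
Ytrn = flipLabels(Ytr, p);                        % training labels with a fraction p flipped
model = gurls_train(Xtr, Ytrn);                   % train on the noisy labels
[ypred, accuracy] = gurls_test(model, Xts, Yts);  % evaluate on the clean test set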
In this exercise you familiarize yourself with the GURLS options. Check whether you can obtain a better solution to Exercise 2, and whether your alternative strategy outperforms the automatic GURLS solution on other instances of Exercise 2 (e.g. obtained by changing the centers or the variances of the Gaussians).
3.A Perform Exercise 2 by specifying (for reference see 'GURLS/gurls/gurls-train-test.pdf'):
•a range for the regularization parameter, e.g.
rr = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.5, 1];
model = gurls_train(Xtr, Ytr, 'regrange', rr);
•a range for the kernel parameter, e.g.
rk = [0.1, 0.2, 0.5, 1, 2, 5];
model = gurls_train(Xtr, Ytr, 'kerrange', rk);
•both ranges at once, e.g.
rr = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.5, 1];
rk = [0.1, 0.2, 0.5, 1, 2, 5];
model = gurls_train(Xtr, Ytr, 'regrange', rr, 'kerrange', rk);
•parameters fixed by hand, e.g.
model = gurls_train(Xtr, Ytr, 'kerpar', 0.5, 'regpar', 0.1);
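Putting the options together, a minimal sketch comparing the automatic solution with one selected over the custom ranges defined above:
% automatic model selection
model_auto = gurls_train(Xtr, Ytr);
[~, acc_auto] = gurls_test(model_auto, Xts, Yts);
% model selection restricted to the ranges rr and rk above
model_range = gurls_train(Xtr, Ytr, 'regrange', rr, 'kerrange', rk);
[~, acc_range] = gurls_test(model_range, Xts, Yts);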
3.B Perform Exercise 2 with different kernels (modify the option 'kernel' and try 'linear' and 'chisquared'), e.g.
model = gurls_train(Xtr, Ytr, 'kernel', 'linear');
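A sketch for comparing the two kernels on the same split (kernel names as listed above):
kernels = {'linear', 'chisquared'};
for i = 1:numel(kernels)
    model = gurls_train(Xtr, Ytr, 'kernel', kernels{i});
    [~, acc] = gurls_test(model, Xts, Yts);
    fprintf('%s kernel: accuracy %.3f\n', kernels{i}, acc);
end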
3.C The computational methods used to minimize the regularized empirical risk are called filters. GURLS implements several filters besides the Tikhonov one. Here we perform Exercise 2 with some iterative filters:
•Landweber, with the number of iterations between 0 and 100:
T = 100;
model = gurls_train(Xtr, Ytr, 'filter', 'land', 'regrange', T);
•Nu-method:
T = 100;
model = gurls_train(Xtr, Ytr, 'filter', 'nu', 'regrange', T);
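As with the kernels, the two filters can be compared on the same data (a sketch; filter names as above, with T the upper bound on the number of iterations):
T = 100;
filters = {'land', 'nu'};
for i = 1:numel(filters)
    model = gurls_train(Xtr, Ytr, 'filter', filters{i}, 'regrange', T);
    [~, acc] = gurls_test(model, Xts, Yts);
    fprintf('%s filter: accuracy %.3f\n', filters{i}, acc);
end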
Let's get serious! We pose you a challenge: discriminate images depicting deer from images depicting horses. The goal is to find the best classifier.
The file Challenge_Train.mat contains the training set, which consists of two matrices, Xtr and Ytr. Each row of Xtr is a vectorized 32×32-pixel RGB image from the CIFAR-10 dataset, i.e. a vector of 3072 elements whose first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. Ytr contains the labels.
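visualizeExample (used below) is provided with the lab material. For intuition, a sketch of how one such row can be reshaped back into an image by hand, assuming each channel is stored row-major with pixel values in 0-255 as in the original CIFAR-10 format:
x = Xtr(1,:);                                  % one vectorized image
img = zeros(32, 32, 3);
img(:,:,1) = reshape(x(1:1024), 32, 32)';      % red channel (transpose: row-major storage)
img(:,:,2) = reshape(x(1025:2048), 32, 32)';   % green channel
img(:,:,3) = reshape(x(2049:3072), 32, 32)';   % blue channel
figure; imshow(uint8(img));                    % assumes pixel values in 0..255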
Have a look at the following code for an example of how to prepare the data for the challenge:
load('Challenge_Train.mat');
figure; visualizeExample(Xtr(11,:));
figure; visualizeExample(Xtr(3901,:));
n = size(Xtr,1);
I = randperm(n);
perc = 0.7;
% randomly split the dataset into training and validation sets
X_train = Xtr( I(1:floor(perc*n)),:);
X_val = Xtr( I(floor(perc*n)+1:end),:);
Y_train = Ytr( I(1:floor(perc*n)));
Y_val = Ytr( I(floor(perc*n)+1:end));
%train the model on the training set
model = gurls_train(X_train, Y_train);
%test the model on the validation set
[ypred, accuracy] = gurls_test(model, X_val, Y_val);
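From here, model selection is up to you. One possible direction is to reuse the options from Exercise 3 on this split (a sketch; the range is illustrative):
rr = [0.0001, 0.001, 0.01, 0.1, 1];
model = gurls_train(X_train, Y_train, 'regrange', rr);
[~, acc_val] = gurls_test(model, X_val, Y_val);   % validation accuracy guides your choice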
When you find a model you are confident in, save it with save('FirstName_LastName.mat', 'model') and submit the saved file at http://www.dropitto.me/alex86r with the password sendit.
We will test the model on our own test set and compute the classification accuracy; this will be your score for the challenge (the higher the better).