Machine Learning Day
Lab 2.A: Regularized Least Squares (RLS)
This lab is about linear Regularized Least Squares for classification or regression.
Getting Started
- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Use the editor to write, save, run, and debug longer scripts and functions.
- Use the command window to try out commands, inspect variables, and look up how functions are used.
- Use plot (for 1D data), imshow and imagesc (for 2D matrices), and scatter and scatter3 to visualize variables of different types.
- Work your way through the examples below, following the instructions.
1. Data Generation
Start by generating data from a mixture of Gaussians using the provided function MixGauss.

- Generate a 2-dimensional, 2-class training set (Xtr, Ytr), with classes centered at (-0.5,-0.5) and (0.5,0.5), variance 0.5 for both, and 5 points per class. Adjust the output labels Ytr to be {1,-1}, e.g. using Ytr(Ytr==2)=-1.
- Generate a corresponding test set (Xte, Yte) of 200 points per class from the same distribution.
- Add noise to the generated data by randomly flipping a percentage of the point labels (e.g. 10%), using the provided function flipLabels. You will obtain new training (Ytrn) and test (Yten) label/output vectors.
- Plot the various datasets using scatter, e.g.:
figure; hold on;
scatter(Xtr(Ytr==1,1), Xtr(Ytr==1,2), '.r');
scatter(Xtr(Ytr==-1,1), Xtr(Ytr==-1,2), '.b');
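The data-generation steps above can be sketched as follows. This is a minimal sketch under assumptions: the exact signatures of the provided MixGauss and flipLabels may differ; here we assume MixGauss(means, sigmas, n) takes the class means column-wise and returns points with 1-based class labels, and flipLabels(Y, p) flips a fraction p of the labels.

```matlab
% Training set: 5 points per class, centered at (-0.5,-0.5) and (0.5,0.5)
% (assumed signature: [X, Y] = MixGauss(means, sigmas, n), means column-wise)
[Xtr, Ytr] = MixGauss([-0.5 0.5; -0.5 0.5], [0.5 0.5], 5);
Ytr(Ytr == 2) = -1;                 % relabel classes {1,2} as {1,-1}

% Test set: 200 points per class from the same distribution
[Xte, Yte] = MixGauss([-0.5 0.5; -0.5 0.5], [0.5 0.5], 200);
Yte(Yte == 2) = -1;

% Flip 10% of the labels to simulate noise (assumed signature)
Ytrn = flipLabels(Ytr, 0.1);
Yten = flipLabels(Yte, 0.1);
```

Check the help text of the provided files before calling them; in particular, whether MixGauss expects variances or standard deviations.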
2. Linear RLS
- Complete the code in functions regularizedLSTrain and regularizedLSTest for training and testing a regularized least squares classifier.
- Try the functions on the previously generated 2-dimensional, 2-class data from 1.1-1.3. Pick a "reasonable" lambda and check the effect of regularization and the effect of noise. Plot the data in a way that visualizes the obtained results (e.g. a scatter plot with the misclassified points labeled differently, as in Lab 1) and evaluate the classification performance by comparing the estimated outputs to the true ones.
- Note: To visualize the separating function (and thus which areas of the 2D plane are associated with each class) you can use the function separatingFRLS. Superimpose the training and test set data (Xtr, Ytr) and (Xte, Yte), on separate plots, to analyze the generalization properties of the solution.
- Perform parameter selection using hold-out cross-validation to select lambda in the range {exp(-10), ..., exp(0)}, using the provided holdoutCVRLS. Plot the training and validation errors for the different values of lambda; apply the best model to the test set (Xte) and check the classification error; show the separating function and the generalization of the solution.
- Repeat the procedure data generation -- parameter selection -- test multiple times and compare the test error of RLS with that of ordinary least squares (OLS), i.e. with lambda=0. Does regularization improve classification performance?
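A minimal sketch of what the two functions might compute, assuming the standard linear RLS solution w = (X'X + lambda*n*I)^(-1) X'Y; the signatures expected by the provided skeleton code may differ:

```matlab
function w = regularizedLSTrain(Xtr, Ytr, lambda)
% Solve the linear system (Xtr'*Xtr + lambda*n*I) w = Xtr'*Ytr
    [n, d] = size(Xtr);
    w = (Xtr' * Xtr + lambda * n * eye(d)) \ (Xtr' * Ytr);
end

function Ypred = regularizedLSTest(w, Xte)
% Predicted labels are the sign of the linear function <w, x>
    Ypred = sign(Xte * w);
end
```

The classification error is then the fraction of mismatches, e.g. mean(Ypred ~= Yten).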
- (Optional) Repeat the classification of 2.2 for a high-dimensional dataset. Generate the same classes as in Section 1, with the Gaussians now residing in a d-dimensional space. How would you choose the class mean vectors? Try as indicative values d = 10, p ~ 0.1, N ~ 10. Check what happens when varying lambda, varying the input space dimension d (i.e., the effect of "distance" between points), and varying the amount of noise. Perform parameter selection using hold-out cross-validation for lambda in a reasonable range (using holdoutCVRLS) and find the generalization error of the best model.
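The cross-validation step could look like the sketch below. The call to holdoutCVRLS is hypothetical: check the provided file for its actual signature; here we assume it takes a hold-out fraction (30%), a number of repetitions (5), and the candidate lambdas, and returns the best lambda with mean training/validation errors.

```matlab
% Candidate regularization parameters in the suggested range {exp(-10), ..., exp(0)}
lambdas = exp(-10:0.5:0);

% Hypothetical call; the provided holdoutCVRLS may use a different signature
[bestLambda, Vm, Vs, Tm, Ts] = holdoutCVRLS(Xtr, Ytrn, 0.3, 5, lambdas);

% Plot training and validation errors against lambda (log scale on x)
figure;
semilogx(lambdas, Tm, 'b', lambdas, Vm, 'r');
legend('training error', 'validation error');
xlabel('\lambda'); ylabel('error');

% Evaluate the selected model on the test set
w = regularizedLSTrain(Xtr, Ytrn, bestLambda);
errTest = mean(sign(Xte * w) ~= Yten);
```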
3. (Optional)
- Modify the regularizedLSTrain and regularizedLSTest functions to incorporate an offset in the linear model (i.e., y = <w,x> + b). Compare the solutions with and without offset on a 2-class dataset where the classes are centered at (0,0) and (1,1) with variance 0.35 each.
- Repeat 2.2 and 2.4 for different configurations (change the training set size, the mean vectors' position/variance, the percentage of noise).
- Modify the regularizedLSTrain and regularizedLSTest functions to handle multiclass problems.
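For the offset variant, one common approach (a sketch, not the only option; centering the data first is an alternative) is to append a constant feature to the inputs so that the bias b is learned as an extra weight:

```matlab
function [w, b] = regularizedLSTrainOffset(Xtr, Ytr, lambda)
% Learn y = <w,x> + b by augmenting X with a column of ones.
% Note: this simple version also regularizes b, which is a simplification.
    [n, d] = size(Xtr);
    Xa = [Xtr, ones(n, 1)];
    wa = (Xa' * Xa + lambda * n * eye(d + 1)) \ (Xa' * Ytr);
    w = wa(1:d);
    b = wa(d + 1);
end
```

Prediction then becomes sign(Xte * w + b). For the multiclass extension, a standard route is one-vs-all: encode the labels as a one-hot (or +1/-1) matrix, solve one RLS problem per class, and predict by taking the class with the largest output.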