Machine Learning Day
Lab 2.A: Regularized Least Squares (RLS)
This lab is about applying linear Regularized Least Squares (RLS) to classification, exploring the role of the regularization parameter and how the generalization error depends on the size and dimensionality of the training set, the noise in the data, etc. It also introduces Leave-One-Out Cross-Validation (LOOCV), an extreme case of hold-out CV that is useful for small training sets.
Getting started
- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Use the editor to write/save and run/debug longer scripts and functions.
- Use the command window to try/test commands, view variables and see the use of functions.
- Use `plot` (for 1D data), `imshow`, `imagesc` (for 2D matrices), `scatter`, and `scatter3` to visualize variables of different types.
- Work your way through the examples below, following the instructions.
1. Classification data generation
- Use `MixGauss` to generate a 2-dimensional, 2-class training set `[X, Y]`, with classes centered at (-0.5,-0.5) and (0.5,0.5), variance 0.5 for both, and 5 points per class. Adjust the output labels `Y` to be {1,-1}, e.g. using `Y(Y==2)=-1`.
- Generate a corresponding test set `[Xte, Yte]` with 200 points per class from the same distribution.
- Add noise to the data by randomly flipping a percentage of the point labels (e.g. `p = 0.2`), using the provided function `flipLabels`. You will obtain new training `Y` and test `Yte` label vectors.
- Plot the various datasets using `scatter`, e.g., `scatter(X(:,1), X(:,2), markerSize, Y);` (a complete sketch of this step is given below).
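As a reference, here is a minimal sketch of the whole data-generation step. The exact signatures of `MixGauss` and `flipLabels` are assumptions based on the description above (class means passed as columns, labels returned in {1,2}, and a flip fraction `p`); check the provided files.

```matlab
% Sketch of Section 1; MixGauss/flipLabels signatures are assumed, check the lab code.
means  = [-0.5  0.5;
          -0.5  0.5];                        % one column of class means per class
sigmas = [0.5 0.5];                          % variance of each class
[X,  Y  ] = MixGauss(means, sigmas, 5);      % 5 training points per class
[Xte,Yte] = MixGauss(means, sigmas, 200);    % 200 test points per class
Y(Y==2)     = -1;                            % relabel the second class as -1
Yte(Yte==2) = -1;
p   = 0.2;                                   % fraction of labels to flip
Y   = flipLabels(Y,   p);                    % noisy training labels
Yte = flipLabels(Yte, p);                    % noisy test labels
figure; scatter(X(:,1), X(:,2), 50, Y, 'filled'); title('Noisy training set');
```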
2. RLS classification
Complete the code in the functions `regularizedLSTrain` and `regularizedLSTest` for training and testing a regularized least squares classifier. Try the functions on the 2-class problem from Section 1. A possible sketch of the two functions is given below.
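This is a minimal sketch of what the completed functions might look like, assuming the standard linear RLS solution w = (X'X + lambda*n*I)^(-1) X'Y and the signatures shown here; the actual templates may differ.

```matlab
function w = regularizedLSTrain(Xtr, Ytr, lambda)
% Linear RLS: solve (Xtr'*Xtr + lambda*n*I) w = Xtr'*Ytr
    [n, d] = size(Xtr);
    w = (Xtr' * Xtr + lambda * n * eye(d)) \ (Xtr' * Ytr);
end

function Ypred = regularizedLSTest(w, Xte)
% Predict {1,-1} labels as the sign of the linear score
    Ypred = sign(Xte * w);
end
```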
- Pick a value for `lambda`, evaluate the classification performance by comparing the estimated outputs to the true ones, and plot the data in a way that visualizes the obtained results (e.g. a scatter plot with the misclassified points labeled differently).
  Note: To visualize the separating function, i.e. the areas of the 2D plane associated with each class, you can use the function `separatingFRLS`. Superimpose the training and test set data to analyze the generalization properties of the solution.
- Check the effect of regularization by changing `lambda`, and the effect of noise.
- Perform parameter selection using leave-one-out cross-validation, through the provided `looCVRLS`, to select `lambda` from a logarithmic range of values, e.g. between 1e-4 and the maximum eigenvalue of the linear kernel matrix `C = X*X'` (see the sketch after this list).
- Plot the training and validation errors for the different values of `lambda`.
- Apply the best model to the test set and check the classification error.
- Show the separating function and the generalization of the solution.
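A usage sketch of this step. The outputs assumed for `looCVRLS` (the selected lambda plus validation and training error curves) are a guess; check its actual signature in the provided code.

```matlab
% Hypothetical usage of looCVRLS; check the provided function's real signature.
lamMax  = max(eig(X * X'));                    % largest eigenvalue of the linear kernel
lambdas = logspace(-4, log10(lamMax), 25);     % logarithmic grid of candidate lambdas
[lamBest, Vm, Tm] = looCVRLS(X, Y, lambdas);   % assumed outputs: best lambda,
                                               % validation/training error curves
figure; semilogx(lambdas, Tm, lambdas, Vm);
xlabel('\lambda'); legend('training error', 'validation error');
w     = regularizedLSTrain(X, Y, lamBest);     % retrain with the selected lambda
errTe = mean(regularizedLSTest(w, Xte) ~= Yte) % test classification error
```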
- Repeat the procedure (data generation, parameter selection, test) multiple times and compare the test error of RLS with that of ordinary least squares (OLS), i.e. with `lambda = 0`. Does regularization improve classification performance? A sketch of such a comparison loop is given below.
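One way to organize the comparison, as a sketch; it reuses the assumed signatures and the variables (`means`, `sigmas`, `p`, `lambdas`) from the previous snippets.

```matlab
% Sketch: repeat generation -> selection -> test, comparing RLS and OLS (lambda = 0)
nRep   = 20;                                   % number of repetitions (arbitrary)
errRLS = zeros(nRep, 1);  errOLS = zeros(nRep, 1);
for r = 1:nRep
    [X,  Y  ] = MixGauss(means, sigmas, 5);    Y(Y==2)     = -1;  Y   = flipLabels(Y,   p);
    [Xte,Yte] = MixGauss(means, sigmas, 200);  Yte(Yte==2) = -1;  Yte = flipLabels(Yte, p);
    lamBest   = looCVRLS(X, Y, lambdas);       % assumed to return the selected lambda
    errRLS(r) = mean(regularizedLSTest(regularizedLSTrain(X, Y, lamBest), Xte) ~= Yte);
    errOLS(r) = mean(regularizedLSTest(regularizedLSTrain(X, Y, 0),       Xte) ~= Yte);
end
fprintf('RLS: %.3f +/- %.3f    OLS: %.3f +/- %.3f\n', ...
        mean(errRLS), std(errRLS), mean(errOLS), std(errOLS));
```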
3. (Optional)
- Classification for high-dimensional data: generate the same classes as in Section 1 with the Gaussians now residing in a D-dimensional space, e.g., try `D = 10, N = 5*D`. How should you choose the class mean vectors? (One possible choice is sketched below.)
- Check what happens when varying `lambda`, the input space dimension D (i.e., the distance between points), the size of the training set, and the noise.
- Perform parameter selection using leave-one-out or hold-out cross-validation for `lambda` and find the error of the best model. Does regularization help classification performance?
- Modify `regularizedLSTrain` and `regularizedLSTest` to incorporate an offset b in the linear model (i.e., y = <w,x> + b). Compare the solutions with and without offset on a 2-class dataset with classes centered at (0,0) and (1,1), each with variance 0.35. (A sketch of one way to add the offset follows.)
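One common way to add the offset is to augment the data with a constant feature, so that the last weight plays the role of b. A sketch follows; the function names are hypothetical, and note that this variant also regularizes b, which centering the data beforehand would avoid.

```matlab
function [w, b] = regularizedLSTrainOffset(Xtr, Ytr, lambda)  % hypothetical name
% RLS with offset via augmentation: append a constant 1 feature to each point.
    [n, d] = size(Xtr);
    Xa = [Xtr, ones(n, 1)];                          % augmented design matrix
    wa = (Xa' * Xa + lambda * n * eye(d + 1)) \ (Xa' * Ytr);
    w  = wa(1:d);
    b  = wa(end);
end

function Ypred = regularizedLSTestOffset(w, b, Xte)  % hypothetical name
% Predict {1,-1} labels from the affine score
    Ypred = sign(Xte * w + b);
end
```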
- Modify `regularizedLSTrain` and `regularizedLSTest` to handle multiclass problems. (A one-vs-all sketch is given below.)
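One standard approach is one-vs-all: train one binary RLS model per class and predict the class with the largest score. A sketch with hypothetical function names:

```matlab
function [W, classes] = regularizedLSTrainOVA(Xtr, Ytr, lambda)  % hypothetical name
% One-vs-all RLS: one linear model per class, trained on {1,-1} labels.
    [n, d] = size(Xtr);
    classes = unique(Ytr);
    T = numel(classes);
    W = zeros(d, T);
    for t = 1:T
        Yt = 2 * (Ytr == classes(t)) - 1;            % {1,-1} encoding of class t
        W(:, t) = (Xtr' * Xtr + lambda * n * eye(d)) \ (Xtr' * Yt);
    end
end

function Ypred = regularizedLSTestOVA(W, classes, Xte)  % hypothetical name
% Assign each test point to the class with the highest linear score.
    [~, idx] = max(Xte * W, [], 2);
    Ypred = classes(idx);
end
```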