Machine Learning Day
Lab 2.B: Kernel Regularized Least Squares (KRLS)
This lab is about Regularized Least Squares under the kernel formulation, the use of nonlinear kernels and the classification of nonlinearly separable datasets. This is the second part of the RLS lab.
Getting started
- Get the code files and add their directory to the MATLAB path (or set it as the current working directory).
- Use the editor to write/save and run/debug longer scripts and functions.
- Use the command window to try/test commands, view variables and see the use of functions.
- Use plot (for 1D), imshow, imagesc (for 2D matrices), scatter and scatter3 to visualize variables of different types.
- Work your way through the examples below by following the instructions.
1. Kernel Regularized Least Squares
Complete the code of the functions regularizedKernLSTrain and regularizedKernLSTest, which perform training and testing with kernel RLS.
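As a reference point for the two functions, here is a minimal sketch of the KRLS computations. It assumes the common convention c = (K + lambda*n*I) \ Y for training and f(x) = sum_i c_i k(x, x_i) for testing; the lab's functions may scale lambda differently and have different signatures. Synthetic circle-shaped data stands in for the moons dataset, and the anonymous functions sqdist and gaussK are hypothetical helpers.

```matlab
% Minimal KRLS sketch; hypothetical helpers, not the lab's
% regularizedKernLSTrain/regularizedKernLSTest.
% Assumed convention: c = (K + lambda*n*I) \ Y and f(x) = K(x, Xtr) * c.
rng(0);                                  % reproducible synthetic data
n = 100;
Xtr = 2*rand(n, 2) - 1;                  % synthetic stand-in for the moons data
Ytr = sign(sum(Xtr.^2, 2) - 0.5);        % +1 outside the circle, -1 inside
Ytr(Ytr == 0) = 1;
Xte = 2*rand(200, 2) - 1;
Yte = sign(sum(Xte.^2, 2) - 0.5);
Yte(Yte == 0) = 1;

sigma  = 0.5;
lambda = 1e-3;

% Pairwise squared Euclidean distances between the rows of A and B
sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B'), 0);
gaussK = @(A, B) exp(-sqdist(A, B) / (2*sigma^2));

% Training: solve (K + lambda*n*I) c = Ytr for the coefficients c
K = gaussK(Xtr, Xtr);
c = (K + lambda*n*eye(n)) \ Ytr;

% Testing: evaluate f on the test points and classify by its sign
Ypred = sign(gaussK(Xte, Xtr) * c);
testErr = mean(Ypred ~= Yte);
fprintf('Test error: %.3f\n', testErr);
```

Using the backslash operator rather than inv() solves the linear system directly, which is both cheaper and numerically more stable.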
1.1 Data: load the "two moons" dataset by typing load('data/moons_dataset.mat') and visualize the training and test sets using scatter.
1.2 Study and try the KernelMatrix function for computing linear, Gaussian and polynomial kernels from the data.
1.3 Use a linear kernel (kernel='linear') and check the resulting separating function on the training set (use separatingFKernRLS).
1.4 Use a Gaussian kernel (kernel='gaussian') and try a range of kernel and regularization parameters (e.g., sigma in [0.01, 5], lambda in [0, 1]). Check how the separating function changes.
1.5 (Optional) Repeat 1.4 after adding noise with flipLabels, with p in [0.05, 0.1]. How do the separating function and the test error change with lambda, and why?
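The three kernel types that KernelMatrix computes can be sketched with anonymous functions. The names linK, gausK and polyK are hypothetical, and the conventions used here (2*sigma^2 in the Gaussian denominator, offset 1 in the polynomial kernel) are common choices that may differ from the lab's code.

```matlab
% Hypothetical stand-ins for the kernels computed by KernelMatrix.
% Assumed conventions: Gaussian exp(-||x - z||^2 / (2*sigma^2)),
% polynomial (x'*z + 1)^deg.
linK  = @(A, B) A*B';
gausK = @(A, B, sigma) exp(-max(sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B'), 0) / (2*sigma^2));
polyK = @(A, B, deg) (A*B' + 1).^deg;

rng(1);
X = randn(50, 2);                        % synthetic 2D inputs
Kl = linK(X, X);
Kg = gausK(X, X, 0.5);
Kp = polyK(X, X, 3);

% Each Gram matrix is symmetric positive semidefinite (up to round-off);
% the linear one has rank at most 2 because the inputs are 2D.
fprintf('rank: linear %d, gaussian %d, polynomial %d\n', rank(Kl), rank(Kg), rank(Kp));
```

The rank bound on the linear kernel is one way to see why it can only produce a linear separating function on the two moons data.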
2. Parameter selection
Apply hold-out cross-validation (using the provided HoldoutCVKernRLS) to select the regularization and Gaussian kernel parameters (lambda, sigma). Indicative values for the hold-out fraction and the number of repetitions are pho = 0.2 and rep = 51, respectively.
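The hold-out procedure for selecting lambda at fixed sigma can be sketched as follows: each repetition draws a random split, trains KRLS on the remaining 1 - pho fraction for every lambda in the grid, and records training and validation errors, which are then averaged over repetitions. This is a hypothetical stand-in for HoldoutCVKernRLS, whose interface may differ; it assumes c = (K + lambda*n*I) \ Y, a Gaussian kernel and synthetic data.

```matlab
% Hypothetical stand-in for HoldoutCVKernRLS: hold-out selection of lambda
% at fixed sigma. Assumes c = (K + lambda*n*I) \ Y and synthetic data.
rng(2);
n = 120;
X = 2*rand(n, 2) - 1;                    % synthetic inputs in [-1, 1]^2
Y = sign(sum(X.^2, 2) - 0.5);            % nonlinearly separable labels
Y(Y == 0) = 1;

sigma = 0.5;  pho = 0.2;  rep = 11;      % rep kept small so the sketch runs fast
gaussK = @(A, B) exp(-max(sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B'), 0) / (2*sigma^2));

% Logarithmic lambda grid from 1e-5 up to the largest eigenvalue of K
K = gaussK(X, X);
lmax = max(eig((K + K')/2));
intLambda = logspace(-5, log10(lmax), 20);

nVal = round(pho * n);
trErr  = zeros(numel(intLambda), rep);
valErr = zeros(numel(intLambda), rep);
for r = 1:rep
    idx = randperm(n);                   % fresh random split per repetition
    iv = idx(1:nVal);                    % validation indices
    it = idx(nVal+1:end);                % training indices
    nt = numel(it);
    Kt = gaussK(X(it, :), X(it, :));
    Kv = gaussK(X(iv, :), X(it, :));
    for il = 1:numel(intLambda)
        c = (Kt + intLambda(il)*nt*eye(nt)) \ Y(it);
        trErr(il, r)  = mean(sign(Kt*c) ~= Y(it));
        valErr(il, r) = mean(sign(Kv*c) ~= Y(iv));
    end
end

meanTr  = mean(trErr, 2);
meanVal = mean(valErr, 2);
[~, best] = min(meanVal);
fprintf('selected lambda = %.2e, mean validation error = %.3f\n', ...
        intLambda(best), meanVal(best));
% semilogx(intLambda, meanTr, intLambda, meanVal); legend('train', 'validation');
```

Computing the kernel blocks once per split and reusing them across the lambda grid avoids recomputing distances inside the inner loop.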
2.1 Fix sigma and select lambda from a logarithmic range of values, e.g. between 1e-5 and the maximum eigenvalue of the kernel matrix of the training set. For example, set intSigma = 0.5; intLambda = logspace(-5, ...).
2.2 Plot the validation and training (and optionally test) errors against lambda on a logarithmic x-axis (use semilogx, or set(gca, 'xscale', 'log') with a regular plot).
2.3 A rule of thumb for choosing a single 'reasonable' sigma is to compute the average distance between neighboring points in the training set. Apply this kNN-based rule using the provided function autosigma, and compare the test performance with that of 2.1.
2.4 Repeat cross-validation on a noisy set, e.g. with p in [0.05, 0.1].
2.5 Fix lambda to a small value, e.g. intLambda = 0.001, and repeat 2.1 and 2.2 to select a suitable sigma from a search range, e.g. [0.01, 3]. Check the effects of overfitting and oversmoothing by instead choosing a small or a large sigma (e.g., sigma=0.01 or sigma=5). How would you choose a reasonable set of values intSigma to search for "good" sigmas?
2.6 Select a good lambda and sigma simultaneously and plot the separating function of the KRLS solution obtained with those values (use separatingFKernRLS). For efficiency, use the lambda grid from 2.1, the sigma grid from 2.5 and a smaller number of repetitions, e.g. rep = 11. Compare the performance on a noisy set as in 2.4. Note: the training and validation errors are now surfaces over the lambda and sigma grids, which you can view using surf or mesh.
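The kNN-based rule of thumb for sigma amounts to averaging, over all training points, the distances to their k nearest neighbours. A minimal sketch follows; k = 5 and this exact averaging are assumptions, and the provided autosigma may compute the quantity differently.

```matlab
% Hypothetical stand-in for autosigma: mean distance from each training
% point to its k nearest neighbours (k = 5 is an assumed value).
rng(3);
X = 2*rand(100, 2) - 1;                  % synthetic training inputs
k = 5;

D = sqrt(max(sum(X.^2, 2) + sum(X.^2, 2)' - 2*(X*X'), 0));  % pairwise distances
D(1:size(D, 1)+1:end) = Inf;             % exclude each point's distance to itself
Ds = sort(D, 2);                         % row-wise ascending distances
sigmaRT = mean(mean(Ds(:, 1:k)));        % average over points of the k-NN distances
fprintf('rule-of-thumb sigma = %.3f\n', sigmaRT);
```

Setting the diagonal to Inf pushes each point's zero self-distance to the end of its sorted row, so the first k columns hold the genuine neighbours.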
3. (Optional)
3.1 Repeat Section 2.6 after subsampling the training set at random (e.g. 70, 50, 30, 20) with p=0.05 noise. Study how the selected parameters vary with the size of the training set.
3.2 Repeat Section 1 with the polynomial kernel (kernel = 'polynomial') and parameters lambda in [0, 10] and deg, the exponent of the polynomial, in [2, 10].
3.3 Apply parameter selection (as in Section 2.6) with a polynomial kernel and a suitable range of exponents and regularization parameters. Compare the resulting errors and think about the role of the optimal exponent for the kernel.
3.4 Analyze the eigenvalues of the polynomial kernel matrix (use eig) for different values of deg by plotting them with semilogy. What happens as deg increases? Why?
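For the eigenvalue question, the sketch below computes the sorted spectrum of a polynomial kernel matrix for several degrees and counts the eigenvalues above a relative threshold. The kernel convention (x'*z + 1)^deg, the degrees and the threshold 1e-10 are assumptions; check the convention used by KernelMatrix.

```matlab
% Eigenvalue spectrum of a polynomial kernel matrix for several degrees.
% Assumed kernel convention (x'*z + 1)^deg; synthetic 2D inputs.
rng(4);
X = 2*rand(100, 2) - 1;
degs = [2 4 6 10];
ev = zeros(size(X, 1), numel(degs));
for j = 1:numel(degs)
    Kp = (X*X' + 1).^degs(j);
    ev(:, j) = sort(eig((Kp + Kp')/2), 'descend');   % symmetrize against round-off
end
% Numerical rank: number of eigenvalues above a relative threshold
numRank = sum(ev > 1e-10 * ev(1, :), 1);
disp(numRank);
% semilogy(abs(ev)); legend(arrayfun(@(d) sprintf('deg = %d', d), degs, 'UniformOutput', false));
```

For 2D inputs, (x'*z + 1)^deg is an inner product between feature vectors made of all monomials of degree at most deg, of which there are nchoosek(deg + 2, 2) (6 for deg = 2, 66 for deg = 10). The kernel matrix therefore has at most that many nonzero eigenvalues, whatever the number of samples; abs() in the plotting line handles tiny negative round-off values on the log scale.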