Machine Learning Day
Lab 2.B: Kernel Regularized Least Squares (KRLS)
This lab is about Regularized Least Squares through the kernel formulation and the use of nonlinear kernels.
Getting Started
- Get the code, unzip it, and add the directory to the MATLAB path (or set it as the current working directory).
- Work your way through the examples below by following the instructions.
1. Kernel Regularized Least Squares
- Load the "two moons" dataset (found in data directory) by typing
load('moons_dataset.mat')
and visualize the training and the test set with thescatter
function like you did for the datasets fromMixGauss
. - Complete the code of the functions
regularizedKernLSTrain
andregularizedKernLSTest
to perform kernel-based RLS. Study and use theKernelMatrix
function for computing linear, gaussian and polynomial kernels - Use a linear kernel (
kernel='linear'
) and check the resulting separating function on the training set (useseparatingFKernRLS
). - Use a Gaussian kernel (
kernel='gaussian'
) and try a range of kernel and regularization parameters (e.g., sigma in [0.001, 10], lambda in [0, 10]). Check how the separating function changes with respect to the parameters. - (Optional)Repeat 1.4 by adding noise (flipping a percentage of labels using
flipLabels
with p equal to {0.05, 0.1}. How does the separating function and errors on the test set change with respect to lambda and why?
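For reference, here is a minimal sketch of what the two completed functions might look like. The argument order, and the `KernelMatrix` signature used here, are assumptions; adapt them to the templates provided with the lab code.

```matlab
function c = regularizedKernLSTrain(Xtr, Ytr, kernel, param, lam)
% Sketch of KRLS training (argument order assumed, not the official template).
% Solves (K + lam*n*I) c = Ytr, where K is the n-by-n Gram matrix on the
% training set; the factor n in front of lam is one common convention.
n = size(Xtr, 1);
K = KernelMatrix(Xtr, Xtr, kernel, param);   % param = sigma or degree (assumed signature)
c = (K + lam * n * eye(n)) \ Ytr;
end

function Ypred = regularizedKernLSTest(c, Xtr, kernel, param, Xte)
% Sketch of KRLS prediction: f(x) = sum_i c(i) k(x, Xtr(i,:)) on each test point.
Ktest = KernelMatrix(Xte, Xtr, kernel, param);  % m-by-n cross-kernel matrix
Ypred = Ktest * c;
end
```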
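And a possible end-to-end usage for steps 1.1 and 1.3-1.5. The dataset variable names `Xtr`, `Ytr`, `Xts`, `Yts` and the `separatingFKernRLS` / `flipLabels` signatures are assumptions; check the actual variable names with `whos` and the function headers.

```matlab
load('moons_dataset.mat');   % variable names below are assumptions; check with whos

figure; scatter(Xtr(:,1), Xtr(:,2), 25, Ytr, 'filled'); title('Training set');
figure; scatter(Xts(:,1), Xts(:,2), 25, Yts, 'filled'); title('Test set');

% 1.3: linear kernel (the kernel parameter is unused, so pass a dummy value)
c = regularizedKernLSTrain(Xtr, Ytr, 'linear', [], 0.01);
separatingFKernRLS(c, Xtr, 'linear', [], Xts);   % assumed signature

% 1.4: Gaussian kernel, sweeping sigma and lambda
for sigma = [0.001 0.01 0.1 1 10]
    for lam = [0 0.01 0.1 1 10]
        c = regularizedKernLSTrain(Xtr, Ytr, 'gaussian', sigma, lam);
        separatingFKernRLS(c, Xtr, 'gaussian', sigma, Xts);
        pause;   % press a key to inspect each separating function
    end
end

% 1.5: flip a fraction p of the training labels and retrain
Ytr_noisy = flipLabels(Ytr, 0.05);   % assumed signature: (labels, p)
```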
2. Parameter Selection
2.1 Select a suitable lambda using `HoldoutCVKernRLS` and the indicative values

```matlab
intSigma = 0.5;
intLambda = exp(-20:5);
nrip = 51;
```
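One way the call might look is sketched below; the signature and output order are guesses, so check the header of `HoldoutCVKernRLS` for the real interface (in particular, the hold-out fraction `perc` is an assumed parameter).

```matlab
perc = 0.5;   % fraction of the training set held out for validation (assumed)

% Assumed signature and outputs (selected values plus mean/std of the
% validation and training errors over the grid).
[lam_best, sigma_best, vm, vs, tm, ts] = ...
    HoldoutCVKernRLS(Xtr, Ytr, 'gaussian', perc, nrip, intLambda, intSigma);
```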
2.2 Plot the validation, training and (optionally) test errors against lambda, with the x-axis on a logarithmic scale (use `semilogx(intLambda, validation_mean_error)`, or `set(gca, 'xscale', 'log')` together with a regular `plot(...)`). Repeat the experiment after adding noise with p in {0.05, 0.1}. How do the error curves behave with respect to lambda, and why? (Optional) A rule of thumb for choosing a single "reasonable" sigma is to compute the average distance between close points in the training set. See if you can apply this rule using concepts from kNN (see the provided function `autosigma`). Given a fixed sigma (e.g., the one from `autosigma`), how can you choose a reasonable interval `intLambda` in which to search for "good" lambdas? (A plotting sketch follows.)
2.3 Repeat 2.1 and 2.2, this time selecting a suitable sigma, using the indicative values

```matlab
intSigma = exp(-10:4);
intLambda = 0.00001;
nrip = 51;
```
Check the effects of overfitting and oversmoothing by selecting, instead, a smaller or a larger sigma (e.g., sigma = 0.05 or sigma = 10). How can you choose a reasonable interval `intSigma` in which to search for "good" sigmas?

2.4 Select the best lambda and sigma simultaneously and plot the separating function of the KRLS solution obtained with those values (use `separatingFKernRLS`). Use the above grids for the sigma and lambda ranges and a smaller number of repetitions (e.g., `nrip = 11`). Note: the training and validation errors are now surfaces over the lambda-sigma grid; for plotting them you can use the MATLAB functions `surf` or `mesh` (see the sketch below). Compare the differences in values (and curves) for noise with p in {0.05, 0.1}.
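For 2.4, one possible pattern computes the validation-error surface explicitly and plots it with `surf`. The loop below is a sketch that re-implements a repeated 50/50 hold-out on top of the functions sketched earlier, as a stand-in for whatever `HoldoutCVKernRLS` returns over both grids:

```matlab
intLambda = exp(-20:5);
intSigma  = exp(-10:4);
nrip      = 11;

% Validation-error surface over (lambda, sigma), averaged over nrip random
% 50/50 hold-out splits of the training set.
n = size(Xtr, 1);
errSurf = zeros(numel(intLambda), numel(intSigma));
for r = 1:nrip
    idx = randperm(n); ntr = floor(n/2);
    itr = idx(1:ntr); ival = idx(ntr+1:end);
    for i = 1:numel(intLambda)
        for j = 1:numel(intSigma)
            c  = regularizedKernLSTrain(Xtr(itr,:), Ytr(itr), 'gaussian', intSigma(j), intLambda(i));
            Yp = sign(regularizedKernLSTest(c, Xtr(itr,:), 'gaussian', intSigma(j), Xtr(ival,:)));
            errSurf(i,j) = errSurf(i,j) + mean(Yp ~= Ytr(ival)) / nrip;
        end
    end
end

% Plot the surface and read off the minimizer
figure; surf(log(intSigma), log(intLambda), errSurf);
xlabel('log \sigma'); ylabel('log \lambda'); zlabel('validation error');
[~, k] = min(errSurf(:)); [i, j] = ind2sub(size(errSurf), k);
lam_best = intLambda(i); sigma_best = intSigma(j);
```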
3. (Optional)
3.1 Repeat Part 1 with the polynomial kernel (`kernel = 'polynomial'`), with lambda in the interval [0, 10] and deg, the exponent of the polynomial kernel, in {10, 9, ..., 1}.

3.2 Repeat 2.4 using a polynomial kernel and a suitable range of parameters (exponent of the polynomial and regularization). Compare the classification error with that of 2.3 and think about the role of the optimal exponent for the polynomial kernel.
3.3 Analyze the eigenvalues of the Gram matrix for the polynomial kernel (e.g., use the `eig` function) for different values of deg, plotting them with `semilogy`. What happens as deg increases (e.g., deg = 1, 2, ..., 10)? Why? (A sketch follows this list.)

3.4 Repeat Part 2 with fewer training points (e.g., 70, 50, 30, 20), chosen randomly, and 5% of flipped labels. How do the selected parameters vary with respect to the number of points?
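A sketch for 3.3, assuming `KernelMatrix` accepts `'polynomial'` with the degree as its parameter (same assumed signature as in the earlier sketches):

```matlab
% Spectrum of the polynomial Gram matrix for increasing degree
degs = 1:10;
EV = zeros(size(Xtr, 1), numel(degs));
for d = degs
    K = KernelMatrix(Xtr, Xtr, 'polynomial', d);  % assumed signature
    EV(:, d) = sort(eig(K), 'descend');
end
figure;
semilogy(max(EV, eps));   % one curve per degree; clip tiny/negative values at eps
xlabel('eigenvalue index'); ylabel('eigenvalue (log scale)');
legend(arrayfun(@(d) sprintf('deg = %d', d), degs, 'UniformOutput', false));
```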
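For 3.4, random subsets of the training set can be drawn as follows (`flipLabels` signature assumed, as before):

```matlab
for ntr = [70 50 30 20]
    idx = randperm(size(Xtr, 1), ntr);   % random subset of training points
    Xs = Xtr(idx, :);
    Ys = flipLabels(Ytr(idx), 0.05);     % flip 5% of the labels (assumed signature)
    % ... rerun the parameter-selection experiments of Part 2 on (Xs, Ys)
end
```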