Machine Learning Day

Lab 2.B: Kernel Regularized Least Squares (KRLS)

This lab covers Regularized Least Squares in its kernel formulation and the use of nonlinear kernels.

Getting Started

  • Get the code, unzip it, and add the directory to the MATLAB path (or set it as the current/working directory).
  • Work your way through the examples below by following the instructions.

1. Kernel Regularized Least Squares

  1. Load the "two moons" dataset (found in the data directory) by typing load('moons_dataset.mat') and visualize the training and test sets with the scatter function, as you did for the datasets generated with MixGauss.
  2. Complete the code of the functions regularizedKernLSTrain and regularizedKernLSTest to perform kernel-based RLS (a sketch of a possible completion follows this list). Study and use the KernelMatrix function for computing linear, Gaussian and polynomial kernels.
  3. Use a linear kernel (kernel='linear') and check the resulting separating function on the training set (use separatingFKernRLS).
  4. Use a Gaussian kernel (kernel='gaussian') and try a range of kernel and regularization parameters (e.g., sigma in [0.001, 10], lambda in [0, 10]). Check how the separating function changes with respect to the parameters (see the driver sketch after this list).
  5. (Optional) Repeat 1.4 adding noise, i.e., flipping a percentage of the labels using flipLabels with p in {0.05, 0.1}. How do the separating function and the test error change with respect to lambda, and why?
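
A minimal sketch of a possible completion for step 2, assuming KernelMatrix(X1, X2, kernel, param) returns the kernel matrix between the rows of X1 and X2, with param the kernel width (or degree); the actual signatures in the provided code may differ, and some formulations omit the factor n in the regularizer.

    % regularizedKernLSTrain.m -- solve the kernel RLS system (K + lambda*n*I)c = y
    function c = regularizedKernLSTrain(Xtr, Ytr, kernel, param, lam)
        n = size(Xtr, 1);
        K = KernelMatrix(Xtr, Xtr, kernel, param);   % n-by-n Gram matrix
        c = (K + lam * n * eye(n)) \ Ytr;            % expansion coefficients
    end

    % regularizedKernLSTest.m -- evaluate f(x) = sum_i c_i k(x, x_i)
    function Ypred = regularizedKernLSTest(c, Xtr, kernel, param, Xte)
        Ktest = KernelMatrix(Xte, Xtr, kernel, param);   % test-vs-train kernel matrix
        Ypred = Ktest * c;
    end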
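
A hypothetical driver for steps 3-5, assuming moons_dataset.mat provides Xtr, Ytr, Xts, Yts (the variable names in the file may differ) and the signatures sketched above:

    load('moons_dataset.mat');   % assumed to provide Xtr, Ytr, Xts, Yts

    for sigma = [0.001 0.01 0.1 1 10]      % kernel width grid
        for lam = [0 0.001 0.1 1 10]       % regularization grid
            c = regularizedKernLSTrain(Xtr, Ytr, 'gaussian', sigma, lam);
            Ypred = sign(regularizedKernLSTest(c, Xtr, 'gaussian', sigma, Xts));
            fprintf('sigma = %g, lambda = %g, test error = %.3f\n', ...
                    sigma, lam, mean(Ypred ~= Yts));
        end
    end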

2. Parameter Selection

  1. Select a suitable lambda using HoldoutCVKernRLS and the indicative values below (a hypothetical call is sketched after this list):
    intSigma = 0.5;
    intLambda = exp(-20:5);
    nrip = 51;
  2. Plot the validation, training and (optionally) test errors against lambda on a logarithmic x-axis (use semilogx(intLambda, validation_mean_error), or set(gca, 'xscale', 'log') with a regular plot(...)). Repeat the experiment adding noise with p = {0.05, 0.1}. How do the errors behave as functions of lambda, and why? (Optional): A rule of thumb for choosing a single 'reasonable' sigma is to compute the average distance between nearby points in the training set. See if you can apply this rule using concepts from kNN (see the provided function autosigma). Given a fixed sigma (e.g., the one from autosigma), how can you choose a reasonable interval intLambda in which to search for "good" lambdas?
  3. Repeat 2.1 and 2.2, this time selecting a suitable sigma:
    intSigma = exp(-10:4);
    intLambda = 0.00001;
    nrip = 51;
    Check the effects of overfitting and oversmoothing by selecting, instead, a smaller or a larger sigma (e.g., sigma = 0.05 or sigma = 10). How can you choose a reasonable interval intSigma in which to search for "good" sigmas?
  4. Select the best lambda and sigma simultaneously and plot the separating function for the KRLS solution obtained with those values (use separatingFKernRLS; a sketch follows this list). Use the above grids for the sigma and lambda ranges and a smaller number of repetitions (e.g., nrip = 11). Note: the training and validation errors are now surfaces over the (lambda, sigma) grid; to plot them you can use the MATLAB functions surf or mesh. Compare the differences in values (and surfaces) for noise with p equal to {0.05, 0.1}.
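
A hypothetical call for steps 1-2, assuming HoldoutCVKernRLS takes the training set, the kernel name, the holdout fraction, the number of repetitions and the two parameter grids, and returns the selected parameters plus the mean/std validation and training errors over the grid; check the provided function for the exact signature and output order.

    intSigma  = 0.5;          % fixed kernel width
    intLambda = exp(-20:5);   % candidate regularization parameters
    nrip      = 51;           % number of holdout repetitions
    perc      = 0.5;          % hypothetical fraction held out for validation

    [lam, sig, Vm, Vs, Tm, Ts] = HoldoutCVKernRLS(Xtr, Ytr, 'gaussian', ...
                                     perc, nrip, intLambda, intSigma);

    semilogx(intLambda, Vm, 'r', intLambda, Tm, 'b');   % errors vs lambda, log x-axis
    xlabel('\lambda'); ylabel('error');
    legend('mean validation error', 'mean training error');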
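
For step 4, under the same assumed interface, the mean errors become matrices over the (lambda, sigma) grid; the sketch below assumes Vm is length(intLambda)-by-length(intSigma).

    intSigma  = exp(-10:4);
    intLambda = exp(-20:5);
    nrip      = 11;           % fewer repetitions: the grid is now two-dimensional

    [lam, sig, Vm, Vs, Tm, Ts] = HoldoutCVKernRLS(Xtr, Ytr, 'gaussian', ...
                                     perc, nrip, intLambda, intSigma);

    surf(log(intSigma), log(intLambda), Vm);   % validation-error surface
    xlabel('log(\sigma)'); ylabel('log(\lambda)'); zlabel('validation error');

    c = regularizedKernLSTrain(Xtr, Ytr, 'gaussian', sig, lam);
    separatingFKernRLS(c, Xtr, 'gaussian', sig, Xts);   % hypothetical signature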

3. (Optional)

  1. Repeat Part 1 with the polynomial kernel (kernel = 'polynomial'), with lambda in the interval [0, 10] and deg, the exponent of the polynomial kernel, in {1, 2, ..., 10}.
  2. Repeat Part 2.4 using a polynomial kernel and a suitable range of parameters (exponent of the polynomial and regularization). Compare the classification error with Part 2.3 and think about the role of the optimal exponent for the polynomial kernel.
  3. Analyze the eigenvalues of the Gram matrix of the polynomial kernel (e.g., using the eig function) for different values of deg, plotting them with semilogy (a sketch follows this list). What happens as deg increases (e.g., deg = 1, 2, ..., 10)? Why?
  4. Repeat Part 2 with fewer training points (e.g., 70, 50, 30, 20) chosen at random and 5% of flipped labels (a sketch follows this list). How do the selected parameters vary with respect to the number of points?
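
A sketch for step 3, assuming the same KernelMatrix interface as in Part 1, with the degree passed as the kernel parameter:

    n = size(Xtr, 1);
    E = zeros(n, 10);
    for deg = 1:10
        K = KernelMatrix(Xtr, Xtr, 'polynomial', deg);
        E(:, deg) = sort(abs(eig(K)), 'descend');   % abs() guards against round-off negatives
    end
    semilogy(E);   % one curve per degree; note how fast the spectrum decays
    xlabel('eigenvalue index'); ylabel('eigenvalue (log scale)');
    legend(arrayfun(@(d) sprintf('deg = %d', d), 1:10, 'UniformOutput', false));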
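
For step 4, a hypothetical subsampling loop reusing the settings from the Part 2 sketches, and assuming flipLabels(Y, p) returns a copy of Y with a fraction p of the labels flipped:

    for ntr = [70 50 30 20]
        idx  = randperm(size(Xtr, 1), ntr);   % random subset of training points
        Xsub = Xtr(idx, :);
        Ysub = flipLabels(Ytr(idx), 0.05);    % 5% label noise
        [lam, sig] = HoldoutCVKernRLS(Xsub, Ysub, 'gaussian', perc, nrip, ...
                                      intLambda, intSigma);
        fprintf('n = %d: lambda = %g, sigma = %g\n', ntr, lam, sig);
    end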