Machine Learning Day

Lab 2.B: Kernel Regularized Least Squares (KRLS)

This lab is about Regularized Least Squares under the kernel formulation, the use of nonlinear kernels and the classification of nonlinearly separable datasets. This is the second part of the RLS lab.

Code/data

Getting started

Get the code file, add the directory to the MATLAB path (or set it as the current/working directory).
Use the editor to write/save and run/debug longer scripts and functions.
Use the command window to try/test commands, view variables and see the use of functions.
Use plot (for 1D), imshow, imagesc (for 2D matrices), scatter, scatter3 to visualize variables of different types.
Work your way through the examples below, by following the instructions.

1. Kernel Regularized Least Squares

Complete the code of functions regularizedKernLSTrain and regularizedKernLSTest that perform training and testing using kernel RLS.
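For orientation, one common way to write kernel RLS is to solve (K + lambda*n*I) c = y on the training set and then predict with f(x) = sum_i c_i k(x, x_i). The following is a minimal sketch of that idea on made-up 1D data, not the lab's solution. The helpers sqdist and gaussK are hypothetical, the scaling of lambda by n is an assumed convention, and the actual signatures of regularizedKernLSTrain and regularizedKernLSTest may differ.

```matlab
% Toy 1D regression data (made up for illustration, not the lab data).
n = 40;
Xtr = linspace(-3, 3, n)';
Ytr = sin(Xtr) + 0.05 * cos(7 * Xtr);   % small deterministic perturbation
Xte = linspace(-3, 3, 25)';

% Gaussian kernel between the rows of A and the rows of B (hypothetical helpers).
sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B'), 0);
gaussK = @(A, B, sigma) exp(-sqdist(A, B) / (2 * sigma^2));

sigma = 0.5;
lambda = 1e-3;

% Training: solve (K + lambda*n*I) c = Ytr for the coefficient vector c.
K = gaussK(Xtr, Xtr, sigma);
c = (K + lambda * n * eye(n)) \ Ytr;

% Testing: evaluate f(x) = sum_i c_i k(x, x_i) at the test points.
Ypred = gaussK(Xte, Xtr, sigma) * c;

fprintf('max abs deviation from sin on the test grid: %.3f\n', ...
        max(abs(Ypred - sin(Xte))));
```

Using the backslash operator instead of an explicit inverse is the usual choice in MATLAB for solving the linear system.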

1.1 Data: load the "two moons" dataset by typing load('data/moons_dataset.mat') and visualize the training and the test set using scatter.
1.2 Study and try the KernelMatrix function for computing linear, Gaussian and polynomial kernels from the data.
1.3 Use a linear kernel (kernel='linear') and check the resulting separating function on the training set (use separatingFKernRLS).
1.4 Use a Gaussian kernel (kernel='gaussian') and try a range of kernel and regularization parameters (e.g., sigma in [0.01, 5], lambda in [0, 1]). Check how the separating function changes.
1.5 (Optional) Repeat 1.4 by adding noise, using flipLabels, with p in [0.05, 0.1]. How do the separating function and the errors on the test set change with lambda, and why?
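To make the noise step concrete, label noise with parameter p can be read as flipping the sign of a random fraction p of the labels. The sketch below illustrates that reading on made-up labels in {-1, +1}. It is an assumption about what flipLabels does, not its actual implementation, and the variable names are hypothetical.

```matlab
% Hypothetical labels in {-1, +1}; the lab's training labels would be used instead.
n = 100;
Y = [ones(n/2, 1); -ones(n/2, 1)];

p = 0.1;                       % fraction of labels to flip
nFlip = round(p * n);          % number of labels that get flipped
idx = randperm(n, nFlip);      % random indices, drawn without repetition

Ynoisy = Y;
Ynoisy(idx) = -Ynoisy(idx);    % flip the sign of the selected labels

fprintf('flipped %d of %d labels\n', sum(Ynoisy ~= Y), n);
```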

2. Parameter selection

Apply hold-out cross-validation (using the provided HoldoutCVKernRLS) to select the regularization and Gaussian kernel parameters (lambda, sigma). Indicative values for the hold-out percentage and the number of repetitions are pho = 0.2 and rep = 51, respectively.
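The general hold-out procedure can be sketched as follows: in each repetition, a random fraction pho of the points is set aside for validation, the model is trained on the rest for every candidate lambda, and the validation errors are aggregated over repetitions. This is only an illustration on made-up data. The interface of HoldoutCVKernRLS is not shown here, so the helper names, the lambda range, the n-scaling of lambda and the use of the median as the aggregate are all assumptions.

```matlab
% Toy two-class data in 2D (made up for illustration, not the moons dataset).
n = 60;
X = [randn(n/2, 2) + 1.5; randn(n/2, 2) - 1.5];
Y = [ones(n/2, 1); -ones(n/2, 1)];

% Gaussian kernel helpers (hypothetical, standing in for KernelMatrix).
sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B'), 0);
gaussK = @(A, B, sigma) exp(-sqdist(A, B) / (2 * sigma^2));

sigma = 0.5;
intLambda = logspace(-5, 0, 6);   % candidate values (assumed range)
pho = 0.2;                        % fraction of points held out for validation
rep = 5;                          % kept small here; the lab suggests 51

nVal = round(pho * n);
valErr = zeros(rep, numel(intLambda));

for r = 1:rep
    perm = randperm(n);
    iVal = perm(1:nVal);          % validation indices
    iTr = perm(nVal+1:end);       % training indices
    nTr = numel(iTr);
    Ktr = gaussK(X(iTr, :), X(iTr, :), sigma);
    Kval = gaussK(X(iVal, :), X(iTr, :), sigma);
    for j = 1:numel(intLambda)
        c = (Ktr + intLambda(j) * nTr * eye(nTr)) \ Y(iTr);
        Ypred = sign(Kval * c);
        valErr(r, j) = mean(Ypred ~= Y(iVal));   % misclassification rate
    end
end

% Aggregate over repetitions and pick the lambda with the lowest error.
medErr = median(valErr, 1);
[~, jBest] = min(medErr);
fprintf('selected lambda = %g\n', intLambda(jBest));
```

The kernel matrices are computed once per split and reused for every lambda, since only the regularization term changes inside the inner loop.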

2.1 Fix sigma and select lambda from a logarithmic range of values, e.g. between 1e-5 and the maximum eigenvalue of the kernel matrix of the training set. For example, set intSigma = 0.5; intLambda = logspace(-5, ...).
2.2 Plot the validation and training (and optionally test) error against lambda on a logarithmic x-axis scale (use semilogx, or set(gca, 'xscale', 'log') with regular plot).
2.3 A rule of thumb for choosing a single 'reasonable' sigma is to compute the average distance between neighboring points in the training set. Apply this rule, which borrows concepts from kNN, using the provided function autosigma, and compare the performance on the test set with that of 2.1.
2.4 Repeat cross-validation for a noisy set, e.g. with p in [0.05, 0.1].
2.5 Fix lambda to a small value, e.g. intLambda = 0.001; and repeat 2.1 and 2.2 to select a suitable sigma using a search range, e.g. [0.01, 3]. Check the effects of overfitting and oversmoothing by selecting, instead, a small or a large sigma (e.g., sigma=0.01 or sigma=5). How would you choose a reasonable set of values intSigma to search for "good" sigmas?
2.6 Select a good lambda and sigma simultaneously and plot the separating function for the KRLS solution obtained using those values (use separatingFKernRLS). For efficiency, use the grids for lambda and sigma from 2.1 and 2.5 and a smaller number of repetitions, e.g., rep = 11. Compare the performance for a noisy set as in 2.4. Note: Training and validation errors are now surfaces over the lambda and sigma grids, which you can view using surf or mesh.
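The rule of thumb for sigma mentioned above can be illustrated by averaging, over all training points, the distance to the k nearest neighbors. The sketch below does this on five made-up points. The choice k = 2 and all variable names are hypothetical, and the provided autosigma may compute the quantity differently.

```matlab
% Toy 2D points (made up for illustration, not the lab data).
Xtr = [0 0; 1 0; 0 1; 1 1; 3 3];

k = 2;   % number of neighbors (hypothetical choice)

% Pairwise Euclidean distances between all training points.
sq = sum(Xtr.^2, 2);
D = sqrt(max(sq + sq' - 2 * (Xtr * Xtr'), 0));

% Sort each row; column 1 is the zero distance of a point to itself.
Dsorted = sort(D, 2);
knnDist = Dsorted(:, 2:k+1);

% Average distance to the k nearest neighbors over the whole set.
sigmaGuess = mean(knnDist(:));
fprintf('sigma guess = %.4f\n', sigmaGuess);
```

A value obtained this way can also serve as the center of a search grid for sigma, with smaller and larger candidates placed around it.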

3. (Optional)

3.1 Repeat Section 2.6 by subsampling the training set at random (e.g. 70, 50, 30, 20) with p = 0.05 noise. Study how the parameters vary with the size of the training set.
3.2 Repeat Section 1 with the polynomial kernel (kernel = 'polynomial') and parameters lambda in [0, 10] and deg, the exponent of the polynomial, in [2, 10].
3.3 Apply parameter selection (as in Section 2.6) with a polynomial kernel and a suitable range of exponents and regularization parameters. Compare with the error and think about the role of the optimal exponent for the kernel.
3.4 Analyze the eigenvalues of the matrix for the polynomial kernel (use eig) for different values of deg by plotting them using semilogy. What happens as deg increases? Why?
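The eigenvalue analysis can be set up along the following lines. This is a sketch on made-up 2D data that assumes the polynomial kernel has the form (x'*z + 1)^deg; the KernelMatrix function in the lab may use a different definition, and the variable names are hypothetical.

```matlab
% Toy 2D data (made up for illustration, not the lab data).
n = 30;
t = (0:n-1)' * 2 * pi / n;
X = [cos(t), sin(2 * t)];

degs = [2 5 10];
E = zeros(n, numel(degs));    % one column of eigenvalues per degree

for j = 1:numel(degs)
    K = (X * X' + 1) .^ degs(j);          % assumed polynomial kernel form
    K = (K + K') / 2;                     % enforce exact symmetry
    E(:, j) = sort(eig(K), 'descend');    % eigenvalues, largest first
end

% Clip nonpositive round-off values so they can be drawn on a log axis.
semilogy(max(E, eps), 'o-');
xlabel('index');
ylabel('eigenvalue');
legend(arrayfun(@(d) sprintf('deg = %d', d), degs, 'UniformOutput', false));
```

Collecting the eigenvalues in one matrix and calling semilogy once avoids the axis-scale issues that can arise when mixing hold on with logarithmic plots.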