Machine Learning Day
Lab 0: Data Generation
This lab is about getting started with MATLAB and working with data. The goal is to provide familiarity with MATLAB syntax, along with some preliminary data generation, processing and visualization.
Getting Started
- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Use the editor to write/save and run/debug longer scripts and functions.
- Use the command window to try/test commands, view variables and see the use of functions.
- Use plot (for 1D), imshow, imagesc (for 2D matrices), scatter, scatter3 to visualize variables of different types.
- Work your way through the examples below, by following the instructions.
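As a quick orientation, the plotting functions listed above can be tried on made-up data. This is only a sketch: the variable names and sizes are arbitrary choices for illustration and are not part of the lab code.

```matlab
% One call per plotting function, on synthetic data.
x = linspace(0, 2*pi, 100);      % 1D grid
M = rand(20, 30);                % 2D matrix with entries in (0,1)
P = randn(200, 3);               % 200 points in 3D

figure; plot(x, sin(x));                      % 1D signal drawn as a curve
figure; imagesc(M); colorbar;                 % 2D matrix as a scaled color image
figure; scatter(P(:,1), P(:,2), 25, P(:,3));  % 2D scatter, colored by the third column
figure; scatter3(P(:,1), P(:,2), P(:,3));     % 3D scatter
```

Typing help imagesc or help scatter3 in the command window shows the remaining options for each call.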
MATLAB/Octave resources
- MATLAB getting started tutorial for an introduction to the environment, syntax and conventions.
- MATLAB has very thorough documentation, both online and built in. In the command window, type help functionName (check use) or doc functionName (pull up documentation).
- Built-in tutorials: in the command window enter:
playbackdemo('GettingStartedwithMATLAB','toolbox/matlab/demos/html')
- Comprehensive MATLAB reference and introduction: (pdf)
- MATLAB Tutorials and Learning Resources. Note the section with university-authored tutorials.
- Writing Fast MATLAB Code (pdf): Profiling, JIT, vectorization, etc.
- Stack Overflow: MATLAB tutorial for programmers
- MIT Open CourseWare: Introduction to MATLAB
- Octave Programming Tutorial (Wikibook)
- Stanford/Coursera Octave Tutorial (video)
1. MATLAB warm-up
- Create a column vector v = [1; 2; 3] and a row vector u = [1,2,3]
  - What happens with the command v'? What is the corresponding algebraic/matrix operation for this?
  - Create z = [5;4;3] and try basic numerical operations of addition and subtraction.
  - What happens with u + z?
- Create the matrices A = [1 2 3; 4 5 6; 7 8 9] and B = A'
  - What kind of matrix is C = A + B?
  - Explore what happens with A(:,1), A(1,:), A(2:3,:) and A(:).
- Use the product operator *
  - What happens with 2*u, u*2, 2*v?
  - What happens with u*v and v*u, and why? What about A*v, u*A and A*u?
  - Use the size and/or length functions to find the dimensions of vectors and matrices.
- Use the element-wise operators .* and ./, e.g., u.*z and z./u
  - What happens with v.*z and v./z?
  - Why aren't A*A and A.*A the same?
- Use the functions zeros, ones, rand, randn
  - Create a 3 x 5 matrix of all zeros, one of all ones, one of random numbers uniformly distributed between 2 and 3, and one of random numbers distributed according to a Gaussian of variance 2.
- Use the functions eye and diag
  - Create a 3 x 3 identity matrix and a matrix whose diagonal is the vector v.
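The last two warm-up items rely on shifting and scaling the standard generators. The sketch below shows one possible way to do it, with variable names chosen only for illustration. It uses two facts: rand draws from the uniform distribution on (0,1), and randn draws from a Gaussian with zero mean and unit variance.

```matlab
% Shifting and scaling the standard generators (one possible approach).
Z = zeros(3, 5);                 % all zeros
O = ones(3, 5);                  % all ones
U = 2 + rand(3, 5);              % uniform on (0,1) shifted by 2: uniform on (2,3)
G = sqrt(2) * randn(3, 5);       % unit-variance Gaussian scaled by sqrt(2): variance 2

I3 = eye(3);                     % 3 x 3 identity matrix
v = [1; 2; 3];
Dv = diag(v);                    % 3 x 3 matrix with v on the diagonal, zeros elsewhere

disp(size(G));                   % dimensions of G
disp(Dv);
```

With only 15 samples the empirical variance of G(:) will fluctuate around 2. A larger matrix, e.g. var(sqrt(2)*randn(1e5,1)), makes the check more convincing.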
2. Core - Data generation
- The function MixGauss(means, sigmas, n) generates datasets where the distribution of each class is an isotropic Gaussian with a given mean and variance, according to the values in the matrices/vectors means and sigmas. Study the function code or type help MixGauss in the MATLAB command window.
- Generate a simple dataset through the following commands:
[X1, Y1] = MixGauss([[0;0], [1;1]], [0.5, 0.25], 1000);
figure(1); scatter(X1(:,1), X1(:,2), 25, Y1);
- Generate a more complex dataset:
  - Call MixGauss with appropriate parameters and produce a dataset with four classes: the classes must live in the 2D space and be centered on the corners of the unit square (0,0), (0,1), (1,1), (1,0), all with variance 0.2. Use scatter to plot the points.
  - Manipulate the data to obtain a 2-class problem where data on opposite corners share the same class. Hint: if you generated the data following the suggested center order, you can use the function mod to quickly obtain two labels, e.g. Y = mod(C, 2).
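MixGauss ships with the lab code and is not reproduced here. To illustrate the idea behind it, the following self-contained sketch draws isotropic Gaussian classes around the four corners without calling MixGauss. It is a hypothetical stand-in and rests on two assumptions: a variance of 0.2 is implemented by scaling randn with sqrt(0.2), and classes are labeled 1 to 4 in the order of the centers. Check help MixGauss for the conventions the real function uses, in particular whether sigmas holds standard deviations or variances.

```matlab
% Hypothetical stand-in for the idea behind MixGauss; not the lab's code.
means = [0 0 1 1;                % column k is the center of class k:
         0 1 1 0];               % (0,0), (0,1), (1,1), (1,0)
stdev = sqrt(0.2);               % assumption: variance 0.2, so std is sqrt(0.2)
n = 100;                         % points per class (arbitrary)

[d, p] = size(means);
X = zeros(n*p, d);               % inputs, one point per row
C = zeros(n*p, 1);               % four-class labels
for k = 1:p
    rows = (k-1)*n + (1:n);
    % isotropic Gaussian: same spread in every direction, shifted to the center
    X(rows, :) = stdev * randn(n, d) + repmat(means(:, k)', n, 1);
    C(rows) = k;                 % assumption: labels run from 1 to p
end

Y = mod(C, 2);                   % classes 1 and 3 map to 1, classes 2 and 4 map to 0
figure; scatter(X(:,1), X(:,2), 25, Y);
```

Because the centers are listed in order around the square, odd and even labels fall on opposite corners, which is why mod(C, 2) yields the two-class problem.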
3. Optional - Extra practice
- Generate more complex datasets using MixGauss, for instance by choosing larger variances, higher dimensionality of the input space, etc.
- Add noise to the data by randomly flipping the labels on the training set (in this case the error rate is the percentage of flipped labels).
- Given a data set compute the distances among all input points (use vectorization in your code, avoid using a "for" loop): how does the mean distance change with the number of dimensions?
- Generate regression data considering a regression model defined by a linear function with Gaussian noise. Create a MATLAB function with input the number of points n, the number of dimensions D, a D-dimensional coefficient vector w and a noise level delta. The output should be an (n x D) matrix X and an (n x 1) vector Y. You might want to test/visualize 1-D and 2-D cases, but make the function generic to account for higher dimensional data. Plot the underlying (linear) function and the noisy output on the same figure.
- Generate regression data for a 1-D regression model defined by a nonlinear function.
- Generate a dataset (either for regression or for classification) where most of the input variables are "noise", in the sense that they are unrelated to the output.
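Two of the exercises above come down to short vectorized expressions, sketched here under stated assumptions. The pairwise distances use the identity ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 xi'xj, so no loop is needed. For the regression data, the sketch assumes that delta is the standard deviation of the noise and that inputs are drawn uniformly on (0,1); neither choice is fixed by the lab text, and all sizes and values are arbitrary.

```matlab
% --- Vectorized pairwise Euclidean distances (no for loop) ---
n = 200; D = 5;                          % arbitrary sizes for illustration
X = randn(n, D);
sq = sum(X.^2, 2);                       % n x 1 vector of squared norms
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');  % squared distances, n x n
D2 = max(D2, 0);                         % clip tiny negatives caused by round-off
D2(1:n+1:end) = 0;                       % distance of a point to itself is exactly 0
Dist = sqrt(D2);                         % n x n distance matrix
meanDist = sum(Dist(:)) / (n * (n - 1)); % mean over all pairs with i ~= j

% --- Linear regression data with Gaussian noise ---
w = ones(D, 1);                          % hypothetical coefficient vector
delta = 0.1;                             % assumption: noise standard deviation
Xr = rand(n, D);                         % assumption: inputs uniform on (0,1)
Yclean = Xr * w;                         % underlying linear function
Y = Yclean + delta * randn(n, 1);        % noisy outputs, n x 1
```

Rerunning the first part for several values of D gives the data needed to answer the question about mean distance. The second part can be wrapped in a function taking n, D, w and delta, as the exercise asks.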