Ex8 - Informatik

CS331: Machine Learning
Prof. Dr. Volker Roth
volker.roth@unibas.ch
FS 2015
Aleksander Wieczorek
aleksander.wieczorek@unibas.ch
Dept. of Mathematics and Computer Science
Spiegelgasse 1
4051 Basel
Date: Monday, May 4th 2015
Exercise 11: Kernel ridge regression
Recap & Definitions
• Ridge regression:
– The problem is formulated as:
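\[ \hat\beta^{ridge} = \operatorname{argmin}_\beta \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j g_j(x_i) \Big)^2 + \nu \sum_{j=1}^{p} \beta_j^2 \right\}, \]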
where ν ∈ [0, ∞) is a penalty parameter.
– In matrix form:
\[ \hat\beta^{ridge} = \operatorname{argmin}_\beta \left\{ (y - X\beta)^t (y - X\beta) + \nu \beta^t \beta \right\}. \]
– The solution is given by:
\[ (X^t X + \nu I)\, \hat\beta^{ridge} = X^t y, \]
and using the SVD $X = U S V^t$ we have:
\[ \hat\beta^{ridge} = V (S^t S + \nu I)^{-1} S^t U^t y. \]
• Kernel ridge regression:
– Expand β in terms of input vectors: $\beta = X^t \alpha$, $\alpha \in \mathbb{R}^n$ (one coefficient per training point).
– The problem is then reformulated as:
\[ \hat\alpha^{ridge} = \operatorname{argmin}_\alpha \left\{ (y - X X^t \alpha)^t (y - X X^t \alpha) + \nu \alpha^t X X^t \alpha \right\}. \]
– Substitute the dot product matrix XX t by an arbitrary Mercer-kernel matrix K:
\[ \hat\alpha^{ridge} = \operatorname{argmin}_\alpha \left\{ (y - K\alpha)^t (y - K\alpha) + \nu \alpha^t K \alpha \right\}. \]
– Setting the derivative ∂/∂α to zero gives:
\[ 2 K^t (K\alpha - y + \nu\alpha) = 0, \]
and the solution is given by:
\[ (K + \nu I)\, \hat\alpha^{ridge} = y. \]
• Predictions based on a linear combination of a set of radial basis functions:
\[ g_i(x) = K(x, x_i) = \exp\left( -\frac{1}{\lambda} \| x - x_i \|^2 \right), \quad x_i, x \in \mathbb{R}^d, \]
where $\lambda \in (0, \infty)$ is the smoothing parameter.
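The ridge formulas in the recap can be checked numerically. The following is a minimal sketch, not part of the original sheet: the toy data, the quadratic design matrix, and all variable names are assumptions chosen for illustration. It follows the matrix form, so every component of β (including the intercept column) is penalised, and it compares the normal-equation solution, the SVD form, and the dual solution with K = XX^t.

```matlab
% Toy data (illustrative assumption): n = 8 points, p + 1 = 3 basis columns.
n  = 8;
x  = linspace(-1, 1, n)';
X  = [ones(n, 1), x, x.^2];          % n-by-(p+1) design matrix
y  = sin(3 * x);
nu = 0.5;                            % ridge penalty

% Primal solution from the normal equations: (X'X + nu*I) beta = X'y
betaNE = (X' * X + nu * eye(size(X, 2))) \ (X' * y);

% SVD form: beta = V (S'S + nu*I)^(-1) S' U' y, with X = U S V'
[U, S, V] = svd(X);
betaSVD = V * ((S' * S + nu * eye(size(X, 2))) \ (S' * (U' * y)));

% Dual solution: (K + nu*I) alpha = y with K = XX', then beta = X' alpha
K = X * X';
alpha = (K + nu * eye(n)) \ y;
betaDual = X' * alpha;

% Largest absolute differences between the three solutions
gapSVD  = max(abs(betaNE - betaSVD));
gapDual = max(abs(betaNE - betaDual));
```

Both gaps should be at the level of floating-point round-off, which illustrates why the dot product matrix XX^t can be swapped for another Mercer-kernel matrix without changing the form of the solution.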
Exercise
Write a Matlab function for kernel ridge regression with an RBF kernel and $x \in \mathbb{R}$. Allow an intercept in
the model, i.e. add 1 to each element of the kernel matrix. With the optimal $\hat\alpha^{ridge}$ calculated
on the training data, make predictions for new test data $x_\star$: $\hat y_\star = K(x_\star, x)\, \hat\alpha^{ridge}$, and compute
the test set error. (Generate data as in exercise sheet 5 and compare.)
function [err,model,errT] = KridgeRFB(x,y,lambda,nu,xT,yT)
x = vector of input scalars for training
y = vector of output scalars for training
lambda = RBF width parameter (>0)
nu = ridge penalty (>=0)
xT = vector of input scalars for testing
yT = vector of output scalars for testing
err = average squared loss on training
model = alphaHat, the vector of estimated parameters
errT = average squared loss on testing
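
One possible shape for such a function is sketched below, under stated assumptions rather than as the reference solution: the helper `rbfKernel` is a hypothetical name introduced here, the output `model` is taken to be the coefficient vector, and the linear system is solved with the backslash operator. Saved as `KridgeRFB.m`, it adds 1 to every kernel entry for the intercept, as the exercise asks.

```matlab
function [err, model, errT] = KridgeRFB(x, y, lambda, nu, xT, yT)
% Sketch of kernel ridge regression with an RBF kernel for scalar inputs.
    x  = x(:);  y  = y(:);           % force column vectors
    xT = xT(:); yT = yT(:);

    % Training kernel matrix, plus 1 in every entry for the intercept
    K = rbfKernel(x, x, lambda) + 1;

    % Solve (K + nu*I) * alpha = y for the coefficient vector
    model = (K + nu * eye(numel(x))) \ y;

    % Average squared loss on the training data
    err = mean((y - K * model).^2);

    % Kernel between test and training points, same intercept shift
    KT = rbfKernel(xT, x, lambda) + 1;

    % Average squared loss on the test data
    errT = mean((yT - KT * model).^2);
end

function K = rbfKernel(a, b, lambda)
% Hypothetical helper: K(i,j) = exp(-(a(i) - b(j))^2 / lambda)
    D = bsxfun(@minus, a, b');       % pairwise differences
    K = exp(-(D.^2) / lambda);
end
```

A usage example on noise-free toy data (an assumption; the data of exercise sheet 5 is not reproduced here):

```matlab
x  = linspace(0, 1, 20)';
y  = sin(2 * pi * x);
xT = linspace(0.025, 0.975, 15)';
yT = sin(2 * pi * xT);
[err, model, errT] = KridgeRFB(x, y, 0.1, 1e-3, xT, yT);
```

With nu = 0 the system matrix can become numerically singular when training points lie close together, so a small positive nu is the safer choice in practice.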