# Introduction to GPy

In this section, we take a look at the basic GPy package and use the library to perform regression. We first import the necessary packages. Note that we change the default plotting engine to plotly so that we get an interactive experience.

In [10]:
import GPy
GPy.plotting.change_plotting_library('plotly')
import numpy as np
from IPython.display import display


## RBF Kernel

We now experiment with one-dimensional models, starting with the RBF kernel. We will declare some convenience functions to help us define our experiments. RBF kernels are also known as squared-exponential kernels. They are the de facto kernel for GPs and SVMs due to their universality and robustness, which stem from the fact that they implicitly map inputs into an infinite-dimensional feature space.

The RBF kernel is given by:

$$K(x,x') = \sigma^2 \exp\Big(-\frac{(x-x')^2}{2l^2}\Big)$$

where the following are the parameters:

• $l$, the lengthscale, determines the length of the 'wiggles' in the function. In effect, we cannot extrapolate much more than $l$ away from the data. The kernel value can also be understood as a measure of similarity between inputs. Some formulations of the RBF kernel use $\gamma=\frac{1}{2l^2}$ instead.
• $\sigma^2$, the output variance, determines the average squared distance of the function from its mean; in short, it sets the vertical scale.

Both $l$ and $\sigma^2$ are hyperparameters of the RBF kernel, and are typically learned from the data, as we will see below.
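To make the formula concrete, here is a minimal pure-NumPy sketch of the RBF kernel for one-dimensional inputs (the helper name `rbf_kernel` is our own; GPy provides the same kernel via `GPy.kern.RBF`):

```python
import numpy as np

def rbf_kernel(X, X2, variance=1.0, lengthscale=1.0):
    """K(x, x') = sigma^2 * exp(-(x - x')^2 / (2 * l^2)) for 1-D inputs."""
    sqdist = (X[:, None] - X2[None, :]) ** 2  # pairwise squared distances
    return variance * np.exp(-sqdist / (2.0 * lengthscale ** 2))

X = np.linspace(-1.0, 1.0, 5)
K = rbf_kernel(X, X, variance=2.0, lengthscale=0.5)
```

The diagonal of `K` equals $\sigma^2$ (each point is maximally similar to itself), and off-diagonal entries decay toward zero once points are more than a few lengthscales apart.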

### Infinite-dimensional mapping

The RBF kernel corresponds to an infinite-dimensional feature map, which we can see by expanding the kernel. Without loss of generality, we simplify the kernel to $K(x,x') = \exp(-(x-x')^2)$, then expand it using the Taylor series of $e^x$ about $x=0$.

$$\begin{aligned} \exp(-(x-x')^2) &= \exp(-x^2)\times \exp(-{x'}^2)\times \exp(2xx')\\ &= \exp(-x^2) \exp(-{x'}^2) \Bigg[1+2xx' + \frac{2^2{x'}^2x^2}{2!} + \cdots \Bigg] \\ &= \exp(-x^2) \exp(-{x'}^2) \sum^{\infty}_{k=0}\frac{2^k{x'}^kx^k}{k!}\\ &= \sum^{\infty}_{k=0}\Bigg[\sqrt{\frac{2^{k}}{k!}}\exp(-x^2)x^k \times \sqrt{\frac{2^{k}}{k!}}\exp(-{x'}^2){x'}^k\Bigg] \\ \end{aligned}$$

We observe that this is a dot product over infinitely many dimensions; in other words, the RBF kernel is an infinite weighted sum of polynomial kernels. This explains the power behind RBF kernels: they map each input into an infinite-dimensional space and compute inner products there. Note that even though the space is infinite-dimensional, each feature is heavily damped by the $\exp(-x^2)$ and $\sqrt{2^k/k!}$ factors, so the solution space is highly constrained rather than totally unbounded.
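We can check this expansion numerically: truncating the feature map at a finite number of terms already reproduces the kernel value almost exactly (the helper `phi` below is ours, for illustration only):

```python
import math
import numpy as np

def phi(x, n_terms=30):
    """Truncated RBF feature map: phi_k(x) = sqrt(2^k / k!) * exp(-x^2) * x^k."""
    ks = np.arange(n_terms)
    coeffs = np.array([math.sqrt(2.0 ** k / math.factorial(k)) for k in ks])
    return coeffs * math.exp(-x ** 2) * x ** ks

x, x2 = 0.3, -0.8
approx = phi(x) @ phi(x2)          # inner product of truncated feature maps
exact = math.exp(-(x - x2) ** 2)   # closed-form kernel value
```

The factorial in the denominator makes the series converge very quickly, so 30 terms are far more than enough here.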

### Experiment

We aim to examine the behavior of RBF kernels and investigate the effect of tweaking some of the parameters. We start by declaring some helper functions. We will use $f(x) = \sin(x)$ with Gaussian noise in our experiments.

In [11]:
f = lambda x: np.sin(x)
g_noise = lambda sigma, n: np.random.normal(0, sigma, (n, 1))
r_space = lambda l, h, n: np.random.uniform(l, h, (n, 1))

def experiment(n, lower, upper, noise_sigma, f=f):
    # sample n inputs uniformly from [lower, upper] and add Gaussian noise
    X = r_space(lower, upper, n)
    y = f(X) + g_noise(noise_sigma, n)
    return X, y


We first experiment by allowing the GP to learn the function $f$ from values generated in the range $[-2\pi, 2\pi]$. The $\sigma$ for the Gaussian noise is set to $0.01$, which is quite small relative to the $f(x)$ values.

In [12]:
X,y = experiment(100, (-2)*np.pi, (2)*np.pi, 0.01)
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_001')


Model: GP regression
Objective: 105.28454731872911
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value  constraints  priors
rbf.variance             1.0    +ve
rbf.lengthscale          1.0    +ve
Gaussian_noise.variance  1.0    +ve


Out[12]:
In [13]:
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_001_Trained')

Optimization restart 1/10, f = -266.96256777761266
Optimization restart 2/10, f = -266.9625677803092
Optimization restart 3/10, f = -266.9625677800516
Optimization restart 4/10, f = -266.9625677805106
Optimization restart 5/10, f = -266.9625677796965
Optimization restart 6/10, f = -266.96256777840233
Optimization restart 7/10, f = -266.9625677803099
Optimization restart 8/10, f = -266.96256777225017
Optimization restart 9/10, f = -266.96256778031307
Optimization restart 10/10, f = -266.9625677797752


Model: GP regression
Objective: -266.9625677805106
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value                   constraints  priors
rbf.variance             3.705882406391364       +ve
rbf.lengthscale          2.6188773771796034      +ve
Gaussian_noise.variance  0.00010338689284352924  +ve


Out[13]:

We observe that the RBF kernel managed to fit the curve well despite the noise, and that the model recovered the noise variance we injected: $0.01^2 = 0.0001$. However, the kernel did not learn the true function: around $x=8$ the predictions are erroneous, rising to about $1.5$, whereas $f(x) = \sin(x)$ can never exceed $1$. This is consistent with the lengthscale interpretation: with a learned lengthscale of about $2.6$, we cannot extrapolate much further than $2.6$ beyond the data. We now repeat the experiment with more noise, at $\sigma = 0.1$.

In [14]:
X,y = experiment(100, (-2)*np.pi, (2)*np.pi, 0.1)
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_01')


Model: GP regression
Objective: 105.6562267899634
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value  constraints  priors
rbf.variance             1.0    +ve
rbf.lengthscale          1.0    +ve
Gaussian_noise.variance  1.0    +ve


Out[14]:
In [15]:
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_01_Trained')

Optimization restart 1/10, f = -55.988787711443614
Optimization restart 2/10, f = -55.98878771144403
Optimization restart 3/10, f = -55.9887877106447
Optimization restart 4/10, f = -55.98878771047351
Optimization restart 5/10, f = -55.98878771112895
Optimization restart 6/10, f = -55.98878771072629
Optimization restart 7/10, f = -55.98878771091473
Optimization restart 8/10, f = -55.988787711443884
Optimization restart 9/10, f = -55.988787710986465
Optimization restart 10/10, f = -55.98878771139341


Model: GP regression
Objective: -55.98878771144403
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value                 constraints  priors
rbf.variance             1.4965927427257437    +ve
rbf.lengthscale          2.0611808511885124    +ve
Gaussian_noise.variance  0.011151693842591424  +ve


Out[15]:

We observe that the GP still does a good job fitting the data within the range of the data, but the learned lengthscale is smaller. Moreover, the confidence bounds are wider now, covering the spread of the noisy values. Having tried a smooth function, we now try a non-smooth one: $f(x)=|x|$.
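The widening of the bounds follows directly from the GP predictive variance, $k(x_*,x_*) - k_*^\top (K + \sigma_n^2 I)^{-1} k_* + \sigma_n^2$. The pure-NumPy sketch below (with fixed, illustrative hyperparameters rather than the optimized ones, and helper names of our own) shows that a larger noise variance yields a larger predictive variance at a test point:

```python
import numpy as np

def rbf(A, B, var=1.0, ls=1.0):
    # squared-exponential kernel matrix for 1-D inputs
    return var * np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * ls ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2 * np.pi, 2 * np.pi, 100)
xs = np.array([0.0])                      # test point inside the data range

def predictive_variance(noise_var):
    K = rbf(X, X) + noise_var * np.eye(len(X))
    k_star = rbf(xs, X)
    f_var = rbf(xs, xs) - k_star @ np.linalg.solve(K, k_star.T)
    return f_var[0, 0] + noise_var        # add observation noise back in

v_small = predictive_variance(0.01 ** 2)  # sigma = 0.01
v_large = predictive_variance(0.1 ** 2)   # sigma = 0.1
```

Larger noise both inflates the additive $\sigma_n^2$ term and weakens how strongly the observations pin down the latent function.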

In [16]:
X,y = experiment(100, -5, 5, 0.1, lambda x:np.abs(x))
m = GPy.models.GPRegression(X,y, GPy.kern.RBF(input_dim=1))
display(m)
GPy.plotting.show(m.plot(), filename='RBF_ABS_01')


Model: GP regression
Objective: 127.07879339301238
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value  constraints  priors
rbf.variance             1.0    +ve
rbf.lengthscale          1.0    +ve
Gaussian_noise.variance  1.0    +ve


Out[16]:
In [17]:
m.optimize_restarts(num_restarts = 10)
display(m)
GPy.plotting.show(m.plot(), filename='RBF_ABS_01_Trained')

Optimization restart 1/10, f = -49.6628368166535
Optimization restart 2/10, f = -49.66283681662521
Optimization restart 3/10, f = -49.662836816652764
Optimization restart 4/10, f = -49.662836816655926
Optimization restart 5/10, f = -49.662836816584374
Optimization restart 6/10, f = -49.662836816650795
Optimization restart 7/10, f = -49.66283681659555
Optimization restart 8/10, f = -49.66283681665598
Optimization restart 9/10, f = -49.66283680842721
Optimization restart 10/10, f = -49.66283681638681


Model: GP regression
Objective: -49.66283681665598
Number of Parameters: 3
Number of Optimization Parameters: 3

GP_regression.           value                constraints  priors
rbf.variance             7.888318502546391    +ve
rbf.lengthscale          1.9225164300398427   +ve
Gaussian_noise.variance  0.01149434553171313  +ve


Out[17]:

We see that the RBF kernel cannot generalize well for the absolute-value function: it performs poorly when extrapolating beyond the range of the inputs. In short, the RBF kernel and its variants work very well for interpolating smooth functions, but they may not generalize beyond the lengthscale.
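This reversion to the prior mean beyond the lengthscale can be seen directly from the posterior-mean formula $k_*^\top (K + \sigma_n^2 I)^{-1} y$. The following pure-NumPy sketch (fixed hyperparameters, helper names of our own) predicts at one point inside the data range and one far outside it; the distant prediction collapses toward the zero prior mean instead of following $|x|$:

```python
import numpy as np

def rbf(A, B, var=1.0, ls=1.0):
    # squared-exponential kernel matrix for 1-D inputs
    return var * np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * ls ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, 100)
y = np.abs(X) + rng.normal(0, 0.1, 100)

K = rbf(X, X) + 0.01 * np.eye(len(X))     # noise variance 0.01
Xs = np.array([0.0, 20.0])                # inside the data vs. far outside
mu = rbf(Xs, X) @ np.linalg.solve(K, y)   # posterior mean (zero prior mean)
```

`mu[1]` is essentially zero even though the true value is $|20| = 20$: with no data within several lengthscales of $x = 20$, the posterior simply falls back to the prior.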
