GP Kernels

There are generally two types of kernels: stationary kernels, which depend only on the difference $x - x'$ between the inputs, and non-stationary kernels, which depend on the input locations themselves.

Kernels, as we have seen in the earlier articles, are an essential part of a GP. They are also loosely called covariance functions, and they measure the similarity between two points. They can be defined using almost any mathematical machinery, as long as they obey some rules which will be discussed later.
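Briefly, the key rule (which we return to later) is that a valid kernel must produce a symmetric, positive semi-definite Gram matrix: for any finite set of points $x_1, \dots, x_n$ and any vector $v \in \mathbb{R}^n$,

$$ K_{ij} = k(x_i, x_j), \qquad K = K^\top, \qquad v^\top K v \ge 0 $$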

In addition, a kernel should exhibit the property that it returns a larger value when two points are close and a smaller value when they are far apart. We intentionally do not define close and far, as that is the responsibility of the kernel.
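As a quick illustration of this property, here is a minimal sketch (not part of the notebook cells below) using the standard RBF (squared exponential) form, one of the kernels compared later in this article; the function name rbf and the lengthscale parameter are just illustrative choices:

import numpy as np

def rbf(x, y, lengthscale=1.0):
    # Larger when x and y are close, smaller when they are far apart.
    sqdist = np.sum((np.asarray(x) - np.asarray(y))**2)
    return np.exp(-0.5 * sqdist / lengthscale**2)

rbf([0.0, 1.0], [0.1, 1.0])   # close points   -> value near 1
rbf([0.0, 1.0], [5.0, -3.0])  # distant points -> value near 0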

A simple kernel can simply be defined by cosine similarity: $$ k(x,y) = \frac{x\cdot y}{|x||y|} $$

Such a kernel measures similarity via the angle between the two vectors in Euclidean (L2) space. We test this with two points: $x=[0, 1]$ and $y=[1, 0]$. These two points are not similar (they are orthogonal), so they should get a low value; from the cosine equation we expect the value $0$.

In [1]:
import sys
import os
sys.path.append(os.getcwd() + '/..')
from plotly_config import *

from IPython.display import display, Math, Latex
import GPy
GPy.plotting.change_plotting_library('plotly')
import numpy as np
import chart_studio.plotly as py
import plotly
import plotly.graph_objects as go
import plotly.io as pio
from plotly import tools
from plotly.subplots import make_subplots
import itertools

In [2]:
# The kernels we compare below, each defined over a 1-dimensional input.
kerns = [GPy.kern.Exponential(1), GPy.kern.RBF(1), GPy.kern.Matern32(1), GPy.kern.Matern52(1), 
         GPy.kern.Brownian(1), GPy.kern.Linear(1), GPy.kern.PeriodicExponential(1), GPy.kern.Bias(1), 
         GPy.kern.Cosine(1)]

# Cosine similarity kernel, written as a lambda over 1-D vectors.
kernel = lambda x,y: np.dot(x,y)/np.sqrt(np.dot(x,x))/np.sqrt(np.dot(y,y))
kernel(x=[0, 1],y=[1, 0])
Out[2]:
0.0

If instead the two points are $x=[0, 1]$ and $y=[0, 1]$, then the output of the kernel should be high; in our case we expect it to be $1$.

In [3]:
kernel(x=[0, 1],y=[0, 1])
Out[3]:
1.0

We emphasise that a kernel can output any value, based on whatever similarity measure it uses. We do not constrain kernels to any particular value that they must return when two points are exactly similar.
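For example, a plain dot-product (linear) kernel returns the squared norm of the point at identical inputs, which is generally not $1$. A minimal numpy sketch (the name linear is just illustrative):

import numpy as np

# Linear kernel k(x, y) = x . y, written in plain numpy for illustration.
linear = lambda x, y: float(np.dot(x, y))

linear([0, 2], [0, 2])  # identical points, yet the kernel returns 4.0, not 1.0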

In this section, we give an overview of the different kernels used in practice. We list them in the same sequence in which we plot them:

  1. Exponential Kernel (Matern kernel $\nu=1/2$)
  2. RBF Kernel (Matern kernel as $\nu \to \infty$)
  3. Matern Kernel $\nu=3/2$
  4. Matern Kernel $\nu=5/2$
  5. Brownian Kernel
  6. Linear Kernel
  7. Periodic Exponential Kernel
  8. Bias Kernel
  9. Cosine Kernel

We will compare across the different kernels and highlight the key differences between them. For the more complicated or noteworthy kernels, we will discuss their results separately in another article; otherwise, we describe the kernel in this article.
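For reference, the textbook forms of the Matern-family kernels in this list, written in terms of the distance $r = |x - x'|$ with unit variance and lengthscale (GPy's implementations should agree with these up to the variance and lengthscale hyperparameters):

$$ \begin{aligned} k_{\text{Exp}}(r) &= \exp(-r) \\ k_{\text{RBF}}(r) &= \exp\left(-\tfrac{1}{2} r^2\right) \\ k_{\text{Matern32}}(r) &= \left(1 + \sqrt{3}\,r\right)\exp\left(-\sqrt{3}\,r\right) \\ k_{\text{Matern52}}(r) &= \left(1 + \sqrt{5}\,r + \tfrac{5}{3} r^2\right)\exp\left(-\sqrt{5}\,r\right) \end{aligned} $$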

In [4]:
fig = make_subplots(rows=3, cols=3, subplot_titles=[ k.name for k in kerns ],
                         shared_xaxes=True,
                         vertical_spacing=0.05)
plot_idx = itertools.product([1,2,3], repeat=2)
for k, plot_id in zip(kerns, plot_idx):
    # Evaluate each kernel k(x, x') against the fixed point x' = 1 over x in [-5, 5].
    L_space = np.linspace(-5, 5, 500)[:, np.newaxis]
    trace1 = go.Scatter(x=L_space.reshape(-1), y=k.K(L_space,np.array([[1]])).reshape(-1), name=k.name)
    fig.append_trace(trace1, *plot_id)

fig['layout'].update(showlegend=False, title='Plots of different Kernels', height=1000, autosize=True)
py.iplot(fig, filename='kernel_comparison')
Out[4]:
In [5]:
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x_k = x[:, np.newaxis]
y_k = y[:, np.newaxis]

fig = make_subplots(rows=3, cols=3, subplot_titles=[ k.name for k in kerns ],
                         vertical_spacing=0.05,
                         specs=[[{'is_3d': True}] * 3] * 3)
plot_idx = itertools.product([1,2,3], repeat=2)
for k, plot_id in zip(kerns, plot_idx):
    # k.K returns the full covariance matrix between the two sets of points.
    z = k.K(y_k, x_k)

    # Add one covariance surface per kernel to the subplot grid.
    fig.append_trace(dict(type='surface', x=x, y=y, z=z, showscale=False), *plot_id)

fig['layout'].update(showlegend=False, title='Plots of different Kernels in 3D', height=1000, autosize=True)
py.iplot(fig, filename='kernel_comparison_3D')
Out[5]:

In the sections below, we illustrate using a GP for predictions. We plot the upper and lower confidence bounds together with the mean, as well as the points at which we evaluated the oracle.
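For reference, the code below implements the standard GP posterior equations via a Cholesky factorisation. With training inputs $X$, noisy observations $y$, noise variance $\sigma_n^2$ (the variable s below), and test inputs $X_*$:

$$ \begin{aligned} \mu_* &= K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} y \\ \Sigma_* &= K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*) \end{aligned} $$

The shaded region in the plot is the mean $\mu_*$ plus or minus one posterior standard deviation (the square root of the diagonal of $\Sigma_*$).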

In [6]:
# This is the true unknown function we are trying to approximate
f = lambda x: np.sin(0.9*x).flatten()
#f = lambda x: (0.25*(x**2)).flatten()


# Define the kernel
def kernel(a, b):
    """ GP squared exponential kernel """
    kernelParameter = 0.1
    sqdist = np.sum(a**2,1).reshape(-1,1) + np.sum(b**2,1) - 2*np.dot(a, b.T)
    return np.exp(-.5 * (1/kernelParameter) * sqdist)

N = 10         # number of training points.
n = 1000         # number of test points.
s = 0.00005    # noise variance.

# Sample some input points and noisy versions of the function evaluated at
# these points. 
X = np.random.uniform(-5, 5, size=(N,1))
y = f(X) + s*np.random.randn(N)

K = kernel(X, X)
L = np.linalg.cholesky(K + s*np.eye(N))

# points we're going to make predictions at.
Xtest = np.linspace(-5, 5, n).reshape(-1,1)

# compute the mean at our test points.
Lk = np.linalg.solve(L, kernel(X, Xtest))
mu = np.dot(Lk.T, np.linalg.solve(L, y))

# compute the variance at our test points.
K_ = kernel(Xtest, Xtest)
s2 = np.diag(K_) - np.sum(Lk**2, axis=0)
s = np.sqrt(s2)   # posterior standard deviation (this reuses the name s; the noise term is no longer needed)

fig = go.Figure()
fig.add_trace(go.Scatter(x=Xtest.flat, y=mu+s,
    fill=None,
    mode='lines',
    line_color='grey',
    name='GP Upper Bound'
    ))
fig.add_trace(go.Scatter(
    x=Xtest.flat,
    y=mu-s,
    fill='tonexty',
    mode='lines',
    line_color='grey',
    name='GP Lower Bound'))
fig.add_trace(go.Scatter(x=Xtest.flat, y=mu,
    mode='lines',
    name='GP Mean'
    ))
fig.add_trace(go.Scatter(x=Xtest.flat, y=f(Xtest),
    mode='lines',
    name='Ground Truth'
    ))
fig.add_trace(go.Scatter(
    x=X.reshape(-1),
    y=y,
    mode='markers',
    name='Oracle Evaluations'
))
fig['layout'].update(title='Mean prediction from GP', autosize=True, xaxis_title='x',
                   yaxis_title='y' )
py.iplot(fig, filename='GP_illustration')
Out[6]:
In [7]:
fig = go.Figure()
# Draw 10 sample functions from the GP prior at the test points (1e-6 jitter for numerical stability).
L = np.linalg.cholesky(K_ + 1e-6*np.eye(n))
f_prior = np.dot(L, np.random.normal(size=(n,10)))
for ea in f_prior.T:
    fig.add_trace(go.Scatter(x=Xtest.flat, y=ea,
        mode='lines', showlegend=False
        ))

fig['layout'].update(title='Functions drawn from GP prior', autosize=True, xaxis_title='x',yaxis_title='y' )
py.iplot(fig, filename='GP_Prior')
Out[7]:
In [8]:
fig = go.Figure()
# Draw 10 sample functions from the GP posterior: its covariance is K_ - Lk^T Lk (plus jitter).
L = np.linalg.cholesky(K_ + 1e-6*np.eye(n) - np.dot(Lk.T, Lk))
f_post = mu.reshape(-1,1) + np.dot(L, np.random.normal(size=(n,10)))
fig.add_trace(go.Scatter(x=Xtest.flat, y=mu+s,
    fill=None,
    mode='lines',
    line_color='grey',
    name='GP Upper Bound'
    ))
fig.add_trace(go.Scatter(
    x=Xtest.flat,
    y=mu-s,
    fill='tonexty',
    mode='lines',
    line_color='grey',
    name='GP Lower Bound'))
for ea in f_post.T:
    fig.add_trace(go.Scatter(x=Xtest.flat, y=ea,
        mode='lines', showlegend=False
        ))

fig['layout'].update(title='Functions drawn from GP posterior', autosize=True, xaxis_title='x',yaxis_title='y' )
py.iplot(fig, filename='GP_Posterior')
Out[8]: