# Shae's Ramblings

Stuff I find slightly meaningful and close to achievable

## Plotting radial functions in Python

This post is about examples of radial functions in the plane: the goal is to plot them using matplotlib and numpy or equivalently jax.numpy.

## Definition

For any point $p = (x,y)$ in the plane, a radial function $\varphi$ is a function which only depends on $p$'s distance to the origin: $$\varphi(p) = \varphi(x,y) = \varphi(\lVert p \rVert) = \varphi\left(\sqrt{x^2 + y^2}\right) = \varphi ( r )$$
where $r = \lVert p \rVert$ usually denotes the distance to the origin.

## Plotting contours in Python

Plotting contours can be done in several ways; here we'll use bilinear interpolation of the function values on a square grid via `plt.imshow`. First, we construct the grid:

```python
import jax.numpy as jnp

num_points = 1000
xgrid = ygrid = jnp.linspace(-5, 5, num_points)
meshx, meshy = jnp.meshgrid(xgrid, ygrid)
```


Then, to check we can easily plot a given function $\varphi$, we first start with the simplest possible radial function, the squared norm $\varphi ( r ) = r^2$:
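Evaluating $\varphi(r) = r^2$ on the grid is a one-liner (a minimal sketch; numpy is used here in place of `jax.numpy`, and the grid is rebuilt so the snippet is self-contained):

```python
import numpy as np

num_points = 1000
xgrid = ygrid = np.linspace(-5, 5, num_points)
meshx, meshy = np.meshgrid(xgrid, ygrid)

# squared distance to the origin: phi(r) = r^2 = x^2 + y^2
zgrid = meshx**2 + meshy**2
```

The resulting `zgrid` array is what gets handed to the plotting routine below.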

We're using the colormap called “viridis”: low values of $\varphi$ are dark blue, and high values get lighter, going from light blue through blue-green and green to yellow.

The code snippet used to generate the contour plot is quite short:

```python
import jax.numpy as jnp
import matplotlib.pyplot as plt
from matplotlib import cm

def contour_levels_plot(zgrid, num_levels=30, ax=None):
    if ax is None:
        ax = plt
    vmin = jnp.min(zgrid)
    vmax = jnp.max(zgrid)
    levels = jnp.linspace(vmin, vmax, num_levels)
    # filled background: bilinearly interpolated heatmap of the z-values
    ax.imshow(zgrid, interpolation='bilinear', origin='lower',
              vmin=vmin, vmax=vmax, cmap=cm.viridis, aspect='auto')
    # overlay contour lines at evenly spaced levels
    ax.contour(zgrid, origin='lower', linewidths=1.0,
               colors="#C8A1A1", levels=levels)
    ax.axis('off')
```

Our goal is to get an approximate picture of the behavior of $\varphi$ in the plane, so not having exact $z$-values is not a problem. For readers who need the precise $x,y,z$ values with $z=\varphi(x,y)$, there is another method.
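One way to keep exact coordinates (a sketch, not the post's original code) is to pass the mesh arrays to `plt.contour` directly: the contours are then drawn in data coordinates rather than pixel indices, so each level corresponds to a known $z$-value:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

num_points = 200
xgrid = ygrid = np.linspace(-5, 5, num_points)
meshx, meshy = np.meshgrid(xgrid, ygrid)
zgrid = meshx**2 + meshy**2

# Passing meshx/meshy keeps the contours in (x, y) data coordinates,
# unlike imshow, which only sees pixel indices.
cs = plt.contour(meshx, meshy, zgrid, levels=10)
```

`cs.levels` then holds the exact $z$-value of each drawn contour line.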

## Cauchy-like function

Let's first plot the function $\varphi ( r ) = \dfrac{1}{1 + r^2}$.

This function corresponds to the Cauchy distribution, or to the derivative of the arc-tangent. We expect large radii $r \gg 1$ to be squashed close to $0$, while small radii $r \to 0$ yield values close to $\sup_{r \geq 0} \varphi(r) = 1$.
The plot confirms our intuitive understanding:
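The $z$-values behind this plot take one line to compute (sketch with numpy in place of `jax.numpy`, grid rebuilt for self-containment):

```python
import numpy as np

xgrid = np.linspace(-5, 5, 1000)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r2 = meshx**2 + meshy**2

# Cauchy-like radial function: phi(r) = 1 / (1 + r^2)
zgrid = 1.0 / (1.0 + r2)
```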

The rational quadratic (RQ) kernel is fairly popular in applied sciences, statistics, and machine learning. Its expression is:

$$\varphi ( r ) = \sigma \left(1 + \dfrac{r^2}{2\gamma \sigma^2}\right)^{-\gamma}$$

It's not hard to recognize a composition of simple functions: a rescaling $r\mapsto r/(\sigma\sqrt{2\gamma})$, our Cauchy-like function $r\mapsto (1 + r^2)^{-1} = y$, an exponentiation $y \mapsto y^{\gamma}$, and a final scaling by $\sigma$.

We'll visualize the RQ kernel for $\gamma = 0.2$, $\sigma=2.4$:
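Its evaluation on the grid follows the formula directly (sketch; numpy in place of `jax.numpy`, with the $\gamma = 0.2$, $\sigma = 2.4$ values from above):

```python
import numpy as np

gamma, sigma = 0.2, 2.4
xgrid = np.linspace(-5, 5, 1000)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r2 = meshx**2 + meshy**2

# rational quadratic kernel: sigma * (1 + r^2 / (2 * gamma * sigma^2))^(-gamma)
zgrid = sigma * (1.0 + r2 / (2.0 * gamma * sigma**2)) ** (-gamma)
```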

We can see the mass is more spread out instead of being concentrated around $0$. Lowering $\sigma$ or increasing $\gamma$ concentrates the mass increasingly around $0$.

## Gaussian kernel

A famous bivariate function in statistics and machine learning is the Gaussian kernel $\varphi ( r ) = \exp\left(-\dfrac{r^2}{2\sigma^2}\right)$. It is, up to the factor $\sigma$, the limit of the RQ kernel as $\gamma \to \infty$. Let's plot the Gaussian kernel for $\sigma = 2.4$ below:
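The grid evaluation is analogous to the previous kernels (sketch with numpy; the bandwidth form $\exp(-r^2/(2\sigma^2))$ is used so that $\sigma = 2.4$ actually enters the formula):

```python
import numpy as np

sigma = 2.4
xgrid = np.linspace(-5, 5, 1000)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r2 = meshx**2 + meshy**2

# Gaussian kernel with bandwidth sigma
zgrid = np.exp(-r2 / (2.0 * sigma**2))
```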

## Periodic kernels

Any function of $r = \lVert p \rVert$ can be used. To model a periodic function, one can use $\varphi ( r ) = \sin ( r )$. The resulting plot is below:
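Computing $\sin(r)$ on the grid produces concentric rings of period $2\pi$ around the origin (sketch, numpy in place of `jax.numpy`):

```python
import numpy as np

xgrid = np.linspace(-5, 5, 1000)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r = np.sqrt(meshx**2 + meshy**2)

# periodic radial function: rings of period 2*pi around the origin
zgrid = np.sin(r)
```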

## Other classical kernels

Many different combinations can be constructed using elementary functions of $r = \lVert p \rVert$. For example, the logarithmic $r\mapsto \log ( 1 + r^2)$ or the polynomial $r\mapsto (1 + r^2)^d$. One can add more complexity by adding kernels $\alpha_1\varphi_1 + \alpha_2\varphi_2$ or multiplying $\varphi_1 \varphi_2$.
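These combinations are all elementwise operations on the grid (sketch; the exponent $d = 3$ and the mixing weights are arbitrary choices of mine, not values from the post):

```python
import numpy as np

xgrid = np.linspace(-5, 5, 500)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r2 = meshx**2 + meshy**2
r = np.sqrt(r2)

log_k = np.log1p(r2)                    # logarithmic: log(1 + r^2)
poly_k = (1.0 + r2) ** 3                # polynomial, with d = 3
summed = 1.0 * log_k + 0.5 * np.sin(r)  # weighted sum of two kernels
product = log_k * np.sin(r)             # product of two kernels
```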

## Spline kernels

Suppose now $\varphi$ is a cubic spline, which we implement using SciPy's `UnivariateSpline` from the `scipy.interpolate` submodule.

Fitting $\varphi$ to $15$ random $y$-knots gives us the following radial function:
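A minimal sketch of this construction (the seed and knot placement are my own illustrative choices; `s=0` makes the cubic spline interpolate the 15 random $y$-knots exactly):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(42)           # arbitrary seed
knot_r = np.linspace(0, 5, 15)            # radii of the 15 knots
knot_y = rng.normal(size=15)              # 15 random y-knots
phi = UnivariateSpline(knot_r, knot_y, k=3, s=0)  # interpolating cubic spline

xgrid = np.linspace(-5, 5, 200)
meshx, meshy = np.meshgrid(xgrid, xgrid)
r = np.sqrt(meshx**2 + meshy**2)
zgrid = phi(r.ravel()).reshape(r.shape)   # radial spline evaluated on the grid
```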

A fairly common use of radial functions is to first map the input $p=(x,y)$ into a new domain using mappings $\psi = (\psi_1, \psi_2)$.
The radial function $\varphi$ acts on the transformed domain: $z = \varphi( \lVert \psi ( p ) \rVert )$ and produces a highly non-radial function from a radial function. An example can be seen below, using both $\varphi, \psi_i$ as cubic splines:
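One simple way to realize this (a sketch, not the post's exact construction) is to warp each coordinate with its own random cubic spline before taking the norm:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

xs = np.linspace(-5, 5, 15)

def random_spline(seed):
    # cubic spline interpolating 15 random y-knots (seed is arbitrary)
    rng = np.random.default_rng(seed)
    return UnivariateSpline(xs, rng.normal(size=15), k=3, s=0)

phi, psi1, psi2 = random_spline(0), random_spline(1), random_spline(2)

xgrid = np.linspace(-5, 5, 200)
meshx, meshy = np.meshgrid(xgrid, xgrid)

# warp each coordinate, then apply the radial spline to the warped norm:
# z = phi(|| (psi1(x), psi2(y)) ||)
p1 = psi1(meshx.ravel()).reshape(meshx.shape)
p2 = psi2(meshy.ravel()).reshape(meshy.shape)
zgrid = phi(np.sqrt(p1**2 + p2**2).ravel()).reshape(meshx.shape)
```

Since the warps break rotational symmetry, the level sets are no longer circles.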

An application of such transforms is the learning of kernel functions: a usually radial kernel
$$k(x,y) = \varphi ( \lVert x - y \rVert )$$
measures the similarity between two points $x,y$. In order to learn data-driven similarities, the kernel $k(\cdot, \cdot)$ is first composed with a non-linear map $\psi$, giving the new similarity $\tilde{k}(x,y) = k(\psi(x), \psi(y))$. An example of a data-driven kernel can be found in the 2017 NeurIPS MMD-GAN paper, where the mapping $\psi$ is a neural network.