# 1D Convolutions

*by dokuDoku*

## What is a 1-Dimensional Convolution

Formally, a *continuous* 1-dimensional convolution is defined as the function $y(x)$ such that

$$ y(x) = \int_{a=-\infty}^{\infty} z(a) w(x-a)\,da \rightarrow y(x) = (z * w)(x)$$

where $z(x)$ is the *input*, $w(x)$ is the *kernel*, and the output $y(x)$ is commonly referred to as the *feature map*. We are interested in *discrete convolutions*, since they are a building block for *convolutional neural network layers*. A discrete 1-dimensional convolution is defined as

$$ y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)w(x-a) $$

which follows directly from the continuous definition.

## Polynomial Multiplication

To prove the properties of discrete convolutions, we will show that discrete convolutions are equivalent to **polynomial multiplication** and leverage the properties of the latter. Given $f(x) = \alpha_0 + \alpha_1 x + \dots + \alpha_n x^n$ and $g(x) = \beta_0 + \beta_1 x + \dots + \beta_n x^n$, we can say

$$ h(x) = f(x) g(x) = (\alpha_0 \beta_0)x^0 + (\alpha_0 \beta_1 + \alpha_1 \beta_0) x^1 + \dots + (\alpha_n \beta_n)x^{2n} $$

$$ h(x) = \sum_{i=0}^{2n} \Big(\sum_{j+k=i} \alpha_j \beta_k\Big)x^i = \sum_{i=0}^{2n} \Big(\sum_{j=0}^{i} \alpha_j \beta_{i-j}\Big)x^i $$

taking $\alpha_j = \beta_k = 0$ for indices outside $[0,n]$. We will show, through the definition of discrete convolution $y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)w(x-a)$, that this is equivalent to calculating the coefficients of the polynomial product. Suppose that $z(x)$ and $w(x)$ are defined by the following sequences, corresponding respectively to the coefficients of $f(x)$ and $g(x)$ (assuming both have the same degree $n$; the proof also holds when the degrees of $f(x)$ and $g(x)$ do not match, with $z(i)$ and $w(j)$ denoting the $i$th and $j$th coefficients of $f(x)$ and $g(x)$ respectively):

$z(x) = \alpha_0, \alpha_1, \dots, \alpha_n$

$w(x) = \beta_0, \beta_1, \dots,
\beta_n$, with $z(x) = w(x) = 0$ for $x \notin [0,n]$. Note that since $f(x)$ and $g(x)$ are both of degree $n$, $z(x)$ and $w(x)$ can only be non-zero for $x \in [0,n]$, which is why the definition above only includes indices from $0$ to $n$. These are nonetheless infinite sequences, which is why we state $z(x) = w(x) = 0$ for $x \notin [0,n]$.

Then $(z*w)(x) = \sum_{a=0}^n z(a)w(x-a)$ gives the following for $x \in [0,n]$:

$(z*w)(0) = \alpha_0 \beta_0$

$(z*w)(1) = \alpha_0 \beta_1 + \alpha_1 \beta_0$

$(z*w)(2) = \alpha_0 \beta_2 + \alpha_1 \beta_1 + \alpha_2 \beta_0$

...

$(z*w)(n) = \sum_{a=0}^n z(a)w(n-a)$

We can observe the following pattern:

$(z*w)(x) = \sum_{a=0}^n z(a)w(x-a) = \sum_{a=0}^n \alpha_a \beta_{x-a}$

This is exactly the formula $\sum_{j=0}^i \alpha_j \beta_{i-j}$ for the $i$th coefficient of the product polynomial $h(x)=f(x)g(x)$ stated previously. Since the two definitions are precisely the same, polynomial multiplication is equivalent to convolution of the coefficient sequences. I.e. *convolving the coefficients of two polynomials produces exactly the coefficients of their product*, proving that 1D discrete convolution is equivalent to polynomial multiplication.

Polynomial multiplication gives us the following properties, which carry over to convolutions:

- Associativity
- Identity
- Distributivity
- Commutativity

The last one is the most important, since it states that $(z*w)(x) = (w*z)(x)$, which makes sense because $f(x)g(x) = g(x)f(x)$.

## Convolutions as Toeplitz Matrices

We are interested in how to easily represent the discrete convolution for machine learning. Similar to how linear layers are of the form $y=Mx+b$ (see [Fitting Simple Linear Model blog post](https://www.noteblogdoku.com/blog/fitting-a-simple-linear-model-using-gradient-descent)), perhaps there is a way we can represent convolutions as matrix multiplications.
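Before deriving that matrix form, the polynomial-multiplication equivalence and the commutativity property above can be sanity-checked numerically. Here is a quick sketch using numpy (the coefficient values are arbitrary examples):

```python
import numpy as np

# Coefficient sequences, lowest-degree coefficient first:
# f(x) = 1 + 2x + 3x^2  and  g(x) = 4 + 5x + 6x^2  (arbitrary example values)
z = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# Discrete convolution of the coefficient sequences...
conv = np.convolve(z, w)

# ...matches the coefficients of the product polynomial h(x) = f(x)g(x).
prod = np.polynomial.polynomial.polymul(z, w)
assert np.allclose(conv, prod)

# Commutativity: (z * w)(x) = (w * z)(x).
assert np.array_equal(conv, np.convolve(w, z))

print(conv)  # coefficients [4, 13, 28, 27, 18] of h(x), degrees 0..4
```

Note that two degree-2 inputs produce 5 output coefficients, matching the $2n+1$ count derived below.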
We will derive how the convolution $y(x) = (w*z)(x)$ can be represented by $y = T(w)z$, where $T(w)$ is the Toeplitz matrix of the vector $w$. To start, let's write out the system of equations the convolution gives us. Since we know convolutions are polynomial multiplication, for $w, z \in \mathbb{R}^{n+1}$ this corresponds to multiplying two degree-$n$ polynomials (coefficient count starts at 0, i.e. $x^0$). The highest term of this polynomial multiplication is $w_n z_n x^{2n}$, giving **$2n+1$** total coefficients, meaning that $y \in \mathbb{R}^{2n+1}$.

$y(0) = w(0)z(0)$

$y(1) = w(0)z(1) + w(1)z(0)$

$y(2) = w(0)z(2) + w(1)z(1) + w(2)z(0)$

...

$y(n) = w(0)z(n) + w(1)z(n-1) + \dots + w(n)z(0)$

$y(n+1) = w(1)z(n) + w(2)z(n-1) + \dots + w(n)z(1)$

...

$y(2n) = w(n)z(n)$

This is a linear system of equations, which we can represent as a matrix.

$$ \begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_n \\ \vdots \\ y_{2n}\\ \end{bmatrix} = \begin{bmatrix} w_0 & 0 & \cdots & 0 \\ w_1 & w_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ w_n & w_{n-1} & \cdots & w_0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix} \times \begin{bmatrix} z_{0} \\ z_{1} \\ \vdots \\ z_{n}\\ \end{bmatrix} $$

As we can see, this is a matrix in which each diagonal holds a constant value, which is by definition a **Toeplitz matrix**. Here $T(w) \in \mathbb{R}^{(2n+1) \times (n+1)}$, since there are $2n+1$ equations with $n+1$ inputs.

**Note** that since this is a *shift-invariant system*, we expected it to be representable by a Toeplitz matrix: the kernel does not change with respect to the input vector.

### What about $y(x) = (w*z)(x)$ with $w \in \mathbb{R}^m$ & $z \in \mathbb{R}^n$?
**Note** that $T(w)$ has $n$ columns from $z \in \mathbb{R}^{n}$ and $m+n-1$ rows: if $w \in \mathbb{R}^{m}$ and $z \in \mathbb{R}^{n}$, then $w$ represents a polynomial of degree $m-1$ (coefficients indexed $0$ to $m-1$) and $z$ a polynomial of degree $n-1$ (coefficients indexed $0$ to $n-1$), so the highest-order term in the product is $x^{(m-1)+(n-1)} = x^{m+n-2}$; including the $x^0$ term, that gives a total of $m+n-1$ terms. Thus $T(w) \in \mathbb{R}^{(m+n-1) \times n}$.

*NOTE* what this implies: we should take the Toeplitz matrix of the *larger vector* to make the Toeplitz matrix smaller (skinnier). If $n \ll m$, then $T(w) \in \mathbb{R}^{(m+n-1) \times n}$ has far fewer entries than $T(z) \in \mathbb{R}^{(m+n-1) \times m}$.

## Example: 1D Discrete Convolution

Let the input signal be

$$ z = [1,\,2,\,3,\,4] $$

and the kernel be

$$ w = [2,\,1] $$

We use the discrete convolution definition

$$ y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)\,w(x-a). $$

Assume $z(a)=0$ and $w(a)=0$ outside their index ranges. Since the input has length 4 and the kernel has length 2, the output length is $4 + 2 - 1 = 5$. Therefore $x = 0,1,2,3,4$.
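The definition can also be translated directly into code. The following minimal sketch (a naive double loop that treats out-of-range indices as zero, not an optimized routine) computes the same example:

```python
# Direct implementation of y(x) = sum_a z(a) w(x - a),
# with out-of-range indices of z and w treated as zero.
def conv1d(z, w):
    n, m = len(z), len(w)
    y = [0] * (n + m - 1)  # output length is n + m - 1
    for x in range(len(y)):
        for a in range(n):
            if 0 <= x - a < m:  # w(x - a) is zero outside [0, m)
                y[x] += z[a] * w[x - a]
    return y

print(conv1d([1, 2, 3, 4], [2, 1]))  # [2, 5, 8, 11, 4]
```

Each iteration of the outer loop mirrors one of the hand computations below.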
### Compute Each Output Value

$x = 0$:

$y(0) = z(0)w(0) = 1 \cdot 2 = 2$

$x = 1$:

$\begin{aligned} y(1) &= z(0)w(1) + z(1)w(0) \\ &= 1 \cdot 1 + 2 \cdot 2 \\ &= 1 + 4 \\ &= 5 \end{aligned}$

$x = 2$:

$\begin{aligned} y(2) &= z(1)w(1) + z(2)w(0) \\ &= 2 \cdot 1 + 3 \cdot 2 \\ &= 2 + 6 \\ &= 8 \end{aligned}$

$x = 3$:

$\begin{aligned} y(3) &= z(2)w(1) + z(3)w(0) \\ &= 3 \cdot 1 + 4 \cdot 2 \\ &= 3 + 8 \\ &= 11 \end{aligned}$

$x = 4$:

$y(4) = z(3)w(1) = 4 \cdot 1 = 4$

Note that this is equivalent to the Toeplitz matrix multiplication

$$ y = T(w)z $$

such that

$$ y = \begin{bmatrix} w_0 & 0 & 0 & 0 \\ w_1 & w_0 & 0 & 0 \\ 0 & w_1 & w_0 & 0 \\ 0 & 0 & w_1 & w_0 \\ 0 & 0 & 0 & w_1 \\ \end{bmatrix} \begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \\ \end{bmatrix} $$

The final result is

$$ z * w = [2,\,5,\,8,\,11,\,4] $$
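The Toeplitz form of this example can be checked numerically as well. The helper below is a minimal sketch (the name `toeplitz_conv_matrix` is made up for this post, not a library routine): it stacks shifted copies of $w$ into the $(m+n-1) \times n$ matrix derived earlier, then multiplies by $z$.

```python
import numpy as np

# Build T(w) for a kernel w of length m and an input of length n:
# shape (m + n - 1, n), with w running down each successive column.
def toeplitz_conv_matrix(w, n):
    m = len(w)
    T = np.zeros((m + n - 1, n))
    for col in range(n):
        T[col:col + m, col] = w
    return T

z = np.array([1, 2, 3, 4])
w = np.array([2, 1])

T = toeplitz_conv_matrix(w, len(z))
y = T @ z  # y = [2, 5, 8, 11, 4], matching the hand computation

# The matrix product agrees with numpy's built-in convolution.
assert np.array_equal(y, np.convolve(z, w))
```

With $m = 2$ and $n = 4$, `T` has shape $(5, 4)$, exactly the $\mathbb{R}^{(m+n-1) \times n}$ shape derived above.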