# 2D Convolutions Framed as Toeplitz Matrices

*by dokuDoku*

## Definition of a 2D Convolution

A 2D convolution is the two-dimensional analogue of a 1D convolution: instead of convolving two sequences, we convolve two matrices. I explain how a 1D convolution is polynomial multiplication, and how to phrase it as a Toeplitz matrix multiplication, in the [1D Convolutions blog post](https://www.noteblogdoku.com/blog/1d-convolutions).

### 1D Discrete Convolution

The **1-dimensional discrete convolution** of two sequences $z[a]$ and $w[a]$, represented by the column vectors $z \in \mathbb{R}^{n \times 1}$ and $w \in \mathbb{R}^{m \times 1}$, is defined as

$$y[x] = (z * w)[x] = \sum_{i=-\infty}^{\infty} z[i]\, w[x-i]$$

We interpret both sequences as zero outside their index ranges:

$z[i] = 0$ when $i \notin [0, n-1]$

$w[i] = 0$ when $i \notin [0, m-1]$

Then the convolution reduces to a finite sum:

$$y[x] = (z * w)[x] = \sum_{i=0}^{n-1} z[i]\, w[x-i]$$

where $w[x-i] = 0$ whenever $x-i \notin [0, m-1]$. The resultant vector has shape $y \in \mathbb{R}^{(n+m-1) \times 1}$ with $x \in [0, n+m-2]$.

### 2D Discrete Convolution

We extend this definition to two dimensions with matrices $z, k$ ($k$ denoting the kernel, $z$ the input image), where $z \in \mathbb{R}^{m \times n}$ and $k \in \mathbb{R}^{r \times s}$. The output matrix $o$ will have shape $o \in \mathbb{R}^{(m+r-1) \times (n+s-1)}$, which follows if we treat each dimension of the matrices as its own 1D convolution.

The **general definition** of the 2D discrete convolution of two sequences $z[i,j]$ and $k[i,j]$ is

$$ o[p,q] = (z * k)[p,q] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} z[i,j] \, k[p-i, q-j].
$$

For the **finite dimensional** case (matrices), assume $z$ and $k$ are zero outside their index ranges:

$z[i,j] = 0$ when $i \notin [0,m-1]$ or $j \notin [0,n-1]$

$k[i,j] = 0$ when $i \notin [0,r-1]$ or $j \notin [0,s-1]$

Then the convolution reduces to the finite double sum

$$ o[p,q] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} z[i,j] \, k[p-i, q-j] $$

where $k[p-i, q-j] = 0$ whenever $p-i \notin [0, r-1]$ or $q-j \notin [0, s-1]$. The resulting matrix satisfies $o \in \mathbb{R}^{(m+r-1) \times (n+s-1)}$ with indices $p \in [0, m+r-2]$ and $q \in [0, n+s-2]$. This output is the **full convolution** with implicit zero padding, i.e. *any overlap counts*.

## Convolutions as Toeplitz Matrices

Please read the [1D Convolutions blog post](https://www.noteblogdoku.com/blog/1d-convolutions) to see how 1-dimensional convolutions can be reframed as a Toeplitz matrix multiplication. We now extend this to two dimensions.

First, flatten the input image $z$ by stacking its columns, so that $vec(z) \in \mathbb{R}^{mn \times 1}$ with $vec(z) = [z_{0,0}, z_{1,0}, \dots, z_{m-1,0}, z_{0,1}, z_{1,1}, \dots, z_{m-1,n-1}]^T$. Similarly, vectorize the output matrix $o$ so that $vec(o) = [o_{0,0}, o_{1,0}, \dots, o_{0,1}, \dots, o_{(m+r-2),(n+s-2)}]^T$. To represent the convolution as a matrix multiplication we need a matrix $f(k)$, a function of the kernel, of shape $f(k) \in \mathbb{R}^{(m+r-1)(n+s-1) \times mn}$:

$$ \underbrace{\operatorname{vec}(o)}_{\mathbb{R}^{(m+r-1)(n+s-1)\times 1}} = \underbrace{ \begin{bmatrix} \ast & & & \\ & \ast & & \\ & & \ddots & \\ & & & \ast \end{bmatrix} }_{\mathbb{R}^{(m+r-1)(n+s-1)\times mn}} \underbrace{\operatorname{vec}(z)}_{\mathbb{R}^{mn\times 1}} $$

I.e. we are finding $f(k)$ such that $vec(o) = f(k)\, vec(z)$, representing the 2D convolution of kernel $k$ and image $z$. We expect $f(k)$ to be block Toeplitz (with Toeplitz blocks) since 2D convolutions are **shift invariant**.
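To make the finite double sum concrete, here is a direct (deliberately unoptimized) NumPy sketch of the full 2D convolution; the array values and the name `conv2d_full` are placeholders of my choosing, not from the post:

```python
import numpy as np

def conv2d_full(z, k):
    """Full 2D convolution o[p,q] = sum_{i,j} z[i,j] * k[p-i, q-j],
    treating z and k as zero outside their index ranges."""
    m, n = z.shape
    r, s = k.shape
    o = np.zeros((m + r - 1, n + s - 1))
    for p in range(m + r - 1):
        for q in range(n + s - 1):
            for i in range(m):
                for j in range(n):
                    # k[p-i, q-j] is implicitly zero when out of range.
                    if 0 <= p - i < r and 0 <= q - j < s:
                        o[p, q] += z[i, j] * k[p - i, q - j]
    return o

# Example: 3x3 image, 2x2 kernel -> (3+2-1) x (3+2-1) = 4x4 output.
z = np.arange(9.0).reshape(3, 3)
k = np.array([[1.0, -1.0], [0.5, 2.0]])
o = conv2d_full(z, k)
assert o.shape == (4, 4)
```

Note that the corners of the output involve a single element of overlap, e.g. $o[0,0] = z[0,0]\,k[0,0]$, exactly the "any overlap counts" behavior of the full convolution.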
### Decomposing 2D Convolution as a Series of 1D Convolutions

To recall: image $z$ has size $z \in \mathbb{R}^{m \times n}$, kernel $k$ has size $k \in \mathbb{R}^{r \times s}$, and the output is $o \in \mathbb{R}^{(m+r-1)\times (n+s-1)}$.

As a warmup, here is an example of a 2D convolution. In the image below, a kernel $k \in \mathbb{R}^{2 \times 2}$ is convolved against an image $z \in \mathbb{R}^{3 \times 3}$. We are **only** showing the result of overlapping the kernel with the upper-right corner of $z$, which produces the upper-right entry of the output (outlined in red).

**<center>Figure 1: Example of 2D Convolution of One Overlap Starting at Upper Left Corner Showing Intermediate Step</center>**

Note that in the result the kernel is **flipped along both the horizontal and vertical axes**. We are using the *mathematical* definition of convolution, **not** the deep-learning *cross-correlation* definition. The definition is $o[p,q] = (z * k)[p,q] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} z[i,j] \, k[p-i, q-j]$.

We did not show the full output matrix above; the figure only illustrates how the kernel overlaps with the image. With the mathematical definition we only require one element of overlap, not complete overlap, so the output matrix is $o \in \mathbb{R}^{(m+r-1)\times (n+s-1)}$, where the shape follows from taking a 1D convolution along each dimension of the kernel & image.

In the image below we show an example of rephrasing this 2D convolution as a matrix multiplication, i.e.

$$ o[p,q] = (z*k)[p,q] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} z[i,j] \, k[p-i, q-j] \rightarrow vec(o) = f(k)\, vec(z) $$

**<center>Figure 2: 2D Convolution $\rightarrow$ Matrix Multiplication Example</center>**

### Example Showing 2D Convolution as Sum of 1D Convolutions

We can express a 2D convolution as a sum of 1D convolutions.
This intuitively makes sense given the diagrams above, where we sum the 1D convolutions of corresponding pairs of columns (at a specific position) to obtain each entry. Explicitly, given

$$o[p,q] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} z[i,j] \, k[p-i, q-j] $$

we decompose this into convolutions across the column vectors of $k$ and $z$:

$$o[p,q] = \sum_{j=0}^{n-1}\left(\sum_{i=0}^{m-1} z[i,j]\,k[p-i,q-j]\right) = \sum_{j=0}^{n-1}(z[:,j]*k[:,q-j])[p]$$

Looking at each term, for example $\sum_{i=0}^{m-1} z[i,0]\,k[p-i,q]$, this is a **1D convolution**; a 2D convolution is therefore a sum of 1D convolutions of column vectors, as seen in Figure 1.

Let's find a column vector of $o$ as a sum of 1D convolutions. Suppose that $o \in \mathbb{R}^{4 \times 4}$, $k \in \mathbb{R}^{2 \times 2}$, and $z \in \mathbb{R}^{3 \times 3}$. Keep in mind the vectorized versions $vec(o) \in \mathbb{R}^{16 \times 1}$, $f(k) \in \mathbb{R}^{16\times 9}$, $vec(z) \in \mathbb{R}^{9 \times 1}$. For this example let's find $o[:,1]$:

**<center> Figure 3: Statement of Problem to find $o[:,1]$ given the 2D Convolution of Kernel K with Image Z </center>**

Note that $f(k)$ has a width of 9, corresponding to the length of the vectorized image. The first 3 (or in general $m$, from $z \in \mathbb{R}^{m \times n}$) columns of $f(k)$ correspond to the first column vector of $z$, the next 3 to the second column vector of $z$, etc.

We know from the 2D convolution equation (illustrated by the example) that $o[p,q] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} z[i,j] \, k[p-i, q-j] = \sum_{j=0}^{n-1} (z[:,j]*k[:,q-j])[p]$; in this case

$o[p,1] = \sum_{i=0}^{2} \sum_{j=0}^{2} z[i,j] \, k[p-i, 1-j]$

$o[p,1] = \sum_{j=0}^{2} \sum_{i=0}^{2} z[i,j] \, k[p-i, 1-j]$

$o[p,1] = \sum_{j=0}^{2} (z[:,j] * k[:, 1-j])[p]$

i.e. take the $p^{th}$ element of the sum of 1D convolutions of $z[:,j]$ and $k[:,1-j]$ to get $o[p,1]$. Furthermore, we can derive the values of $f(k)$ for rows 5 through 8 (inclusive) by finding $o[l,1]$ for $l \in [0,3]$:
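This decomposition is easy to check numerically. A minimal NumPy sketch (the values of `z` and `k` are placeholders; any 3×3 image and 2×2 kernel work) computes $o[:,1]$ as a sum of 1D convolutions and compares it against the full 2D convolution:

```python
import numpy as np

z = np.arange(1.0, 10.0).reshape(3, 3)   # image, m = n = 3 (placeholder values)
k = np.array([[1.0, 2.0], [3.0, 4.0]])   # kernel, r = s = 2

m, n = z.shape
r, s = k.shape

# o[:,1] via the decomposition: sum over j of z[:,j] * k[:,1-j],
# where kernel columns with index outside [0, s-1] are zero.
q = 1
col = np.zeros(m + r - 1)
for j in range(n):
    if 0 <= q - j < s:
        col += np.convolve(z[:, j], k[:, q - j])  # np.convolve is "full" by default

# Reference: full 2D convolution via scatter-add of z[i,j] * k.
o = np.zeros((m + r - 1, n + s - 1))
for i in range(m):
    for j in range(n):
        o[i:i + r, j:j + s] += z[i, j] * k

assert np.allclose(col, o[:, 1])
```

Only $j = 0$ and $j = 1$ contribute here, since $k[:, 1-j]$ is out of range for $j = 2$, mirroring the zero terms in the sum above.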
$$o[:,1] = \sum_{j=0}^{2} (z[:,j] * k[:, 1-j])$$

### Deriving Block Toeplitz Structure

As a refresher, we have matrices $z, k$ ($k$ denoting the kernel, $z$ the input image), with $z \in \mathbb{R}^{m \times n}$ and $k \in \mathbb{R}^{r \times s}$. The output matrix $o$ has shape $o \in \mathbb{R}^{(m+r-1) \times (n+s-1)}$. Additionally, recall the definition of a 1D convolution

$$ o[x] = (z * w)[x] = \sum_{i=0}^{n-1} z[i]\, w[x-i] $$

To expose the block Toeplitz structure, we build up the output column by column, applying the 1D convolution definition down the rows while holding the columns fixed:

$$o[p,q] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} z[i,j] \, k[p-i, q-j] $$

$$o[p,q] = \sum_{j=0}^{n-1} \left(\sum_{i=0}^{m-1} z[i,j] \, k[p-i, q-j]\right) $$

$$o[p,q] = \sum_{j=0}^{n-1} (z[:,j]*k[:,q-j])[p] $$

Define $z_j$ and $k_{q-j}$ as column vectors of $z$ and $k$ respectively, i.e. $z_j = z[:,j] \in \mathbb{R}^m$ and $k_{q-j} = k[:,q-j] \in \mathbb{R}^r$. Let $T(k_{q-j}) \in \mathbb{R}^{(m+r-1)\times m}$ be the Toeplitz matrix generated by kernel column $k_{q-j}$. Then we have, from the [1D Convolutions blog post](https://www.noteblogdoku.com/blog/1d-convolutions),

$$ (z[:,j]*k[:,q-j]) = T(k_{q-j})\, z_j$$

Substituting into $o[:,q] = \sum_{j=0}^{n-1} (z[:,j]*k[:,q-j])$ gives

$$o[:,q] = \sum_{j=0}^{n-1} T(k_{q-j})\, z_j$$

with each $T(k_i) \in \mathbb{R}^{(m+r-1) \times m}$. Since $z_j \in \mathbb{R}^m$ and $k_{q-j} \in \mathbb{R}^r$, their convolution has length $m+r-1$, so $o[:,q] \in \mathbb{R}^{m+r-1}$; this follows because convolution is polynomial multiplication (coefficient vectors of lengths $m$ and $r$, i.e. polynomials of degrees $m-1$ and $r-1$, multiply to a degree $m+r-2$ polynomial with $m+r-1$ coefficients).

Recall that in the formulation $vec(o) = f(k)\, vec(z)$ (with $o \in \mathbb{R}^{(m+r-1) \times (n+s-1)}$), each vector $z_j \in \mathbb{R}^{m \times 1}$ and there are $n$ such vectors in $z$.
Thus each set of $m$ columns in $f(k)$ corresponds to one column vector of $z$, and we can write

$$ o[:,q] = [T(k_q), T(k_{q-1}), \dots, T(k_{q-(n-1)})] \begin{bmatrix} z_{0} \\ z_{1} \\ \vdots \\ z_{n-1} \\ \end{bmatrix} $$

Each output column $o[:,q]$ is produced by a block row of Toeplitz matrices generated by shifted kernel columns, with out-of-range kernel columns set to zero, i.e. $k_t = 0$ when $t \notin [0,s-1]$. Stacking all output columns ($q \in [0, n+s-2]$) produces

$$ vec(o) = \begin{bmatrix} T(k_0) & T(k_{-1}) & \cdots & T(k_{-(n-1)})\\ T(k_1) & T(k_{0}) & \cdots & T(k_{1-(n-1)}) \\ \vdots & \vdots & \ddots & \vdots \\ T(k_{n+s-2}) & T(k_{n+s-3}) & \cdots & T(k_{s-1}) \\ \end{bmatrix} vec(z) $$

---

### Toeplitz Block Size Relating to Rows & Columns of Toeplitz Blocks in $f(k)$

Since $o \in \mathbb{R}^{(m+r-1)\times (n+s-1)}$ and $T(k_i) \in \mathbb{R}^{(m+r-1) \times m}$, there are $n+s-1$ block rows of Toeplitz blocks in $f(k)$: $vec(o) \in \mathbb{R}^{(m+r-1)(n+s-1) \times 1}$ and $\frac{(m+r-1)(n+s-1)}{m+r-1} = n+s-1$. Additionally, since $vec(z) \in \mathbb{R}^{mn \times 1}$ and each Toeplitz block has width $m$, there are $\frac{mn}{m} = n$ block columns of Toeplitz blocks in $f(k)$. With $n$ block columns and $n+s-1$ block rows, this gives $f(k) \in \mathbb{R}^{(n+s-1)(m+r-1) \times nm}$, matching the sizes of the output $vec(o)$ and input $vec(z)$.
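The block analysis above can be verified numerically. The sketch below (with NumPy; `toeplitz_1d`, `conv_matrix`, and the random test values are mine, not from the post) builds each block $T(k_t) \in \mathbb{R}^{(m+r-1)\times m}$, places block $T(k_{a-b})$ at block row $a$, block column $b$, and checks $vec(o) = f(k)\, vec(z)$ against a direct 2D convolution, using column-major (Fortran-order) vectorization to match the column-stacking convention in the text:

```python
import numpy as np

def toeplitz_1d(kcol, m):
    """(m+r-1) x m Toeplitz matrix T with T @ x == np.convolve(x, kcol)."""
    r = len(kcol)
    T = np.zeros((m + r - 1, m))
    for c in range(m):
        T[c:c + r, c] = kcol  # each column is kcol shifted down by one
    return T

def conv_matrix(k, m, n):
    """Build f(k): the (m+r-1)(n+s-1) x mn doubly block Toeplitz matrix.
    Block (a, b) is T(k_{a-b}), zero when a-b is outside [0, s-1]."""
    r, s = k.shape
    H = m + r - 1            # height of each block / output column
    W = n + s - 1            # number of block rows = output columns
    f = np.zeros((H * W, m * n))
    for a in range(W):       # block rows
        for b in range(n):   # block columns
            t = a - b
            if 0 <= t < s:
                f[a * H:(a + 1) * H, b * m:(b + 1) * m] = toeplitz_1d(k[:, t], m)
    return f

# Illustrative 3x3 image and 2x2 kernel (random placeholder values).
rng = np.random.default_rng(0)
z = rng.standard_normal((3, 3))
k = rng.standard_normal((2, 2))
m, n = z.shape
r, s = k.shape

# Direct full 2D convolution for reference (scatter-add of z[i,j] * k).
o = np.zeros((m + r - 1, n + s - 1))
for i in range(m):
    for j in range(n):
        o[i:i + r, j:j + s] += z[i, j] * k

f = conv_matrix(k, m, n)
# Column-major vec() matches the column-stacking convention above.
assert np.allclose(f @ z.flatten(order="F"), o.flatten(order="F"))
```

For the 3×3 image and 2×2 kernel, `f` comes out $16 \times 9$, agreeing with the $f(k) \in \mathbb{R}^{16 \times 9}$ shape from the earlier worked example.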
---

The final structure of our matrix $f(k)$ is

$$ f(k) = \begin{bmatrix} T(k_0) & 0 & 0 & \cdots & 0 \\ T(k_1) & T(k_0) & 0 & \cdots & 0 \\ T(k_2) & T(k_1) & T(k_0) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ T(k_{s-1}) & T(k_{s-2}) & \cdots & T(k_0) & 0 \\ 0 & T(k_{s-1}) & \cdots & T(k_1) & T(k_0) \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & T(k_{s-1}) \end{bmatrix} $$

### Example

The following image shows a concrete example of how the Toeplitz blocks for an output vector combine to give the final result. Note that this is for one block row only, but it extends easily to the others; the main point is to show how a sum of 1D convolutions gives a 2D convolution, which is the core of the block Toeplitz structure for 2D convolutions.

**<center> Figure 4: Explanatory Example showing Toeplitz Blocks for 1D Convolutions Building Output Vector. Note $k[1,-1]$, $k[2,-1]$ are not in $k$, hence that block is all 0's </center>**