# 1D Convolutions

*by dokuDoku*

## What is a 1-Dimensional Convolution

Formally, a *continuous* 1-dimensional convolution is defined as the function $y(x)$ such that

$$ y(x) = \int_{a=-\infty}^{\infty} z(a) w(x-a)\,da \rightarrow y(x) = (z * w)(x)$$

where $z(x)$ is the *input*, $w(x)$ is the *kernel*, and the output $y(x)$ is commonly referred to as the *feature map*. We are interested in *discrete convolutions*, since they are a building block for *convolutional neural network layers*. A discrete 1-dimensional convolution is defined as

$$ y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)w(x-a) $$

which follows directly from the continuous definition.

## Polynomial Multiplication

To prove the properties of discrete convolutions, we will show that discrete convolutions are equivalent to **polynomial multiplication** and leverage the properties of the latter. Given $f(x) = \alpha_0 + \alpha_1 x + \dots + \alpha_n x^n$ and $g(x) = \beta_0 + \beta_1 x + \dots + \beta_n x^n$, we can say

$$ h(x) = f(x) g(x) = (\alpha_0 \beta_0)x^0 + (\alpha_0 \beta_1 + \alpha_1 \beta_0) x^1 + \dots + (\alpha_n \beta_n)x^{2n} $$

$$ h(x) = \sum_{i=0}^{2n} \Big(\sum_{j+k=i} \alpha_j \beta_k\Big)x^i = \sum_{i=0}^{2n} \Big(\sum_{j=0}^{i} \alpha_j \beta_{i-j}\Big)x^i $$

taking $\alpha_j = \beta_k = 0$ for indices outside $[0,n]$. We will show, through the definition of discrete convolution $y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)w(x-a)$, that this is equivalent to calculating the coefficients of the polynomial product. Suppose that $z(x)$ and $w(x)$ are defined by the following sequences, corresponding respectively to the coefficients of $f(x)$ and $g(x)$ (assuming both have the same degree $n$; the proof also holds when the degrees of $f(x)$ and $g(x)$ do not match, with $z(i)$ and $w(j)$ denoting the $i$th and $j$th coefficients of $f(x)$ and $g(x)$ respectively):

$z(x) = \alpha_0, \alpha_1, \dots, \alpha_n$

$w(x) = \beta_0, \beta_1, \dots,
\beta_n$, with $z(x) = w(x) = 0$ for $x \notin [0,n]$. Note that since $f(x)$ and $g(x)$ are both of degree $n$, $z(x)$ and $w(x)$ can only be non-zero for $x \in [0,n]$, which is why the definition above only includes indices from $0$ to $n$. These are nonetheless infinite sequences, which is why we state $z(x) = w(x) = 0$ for $x \notin [0,n]$.

Then $(z*w)(x) = \sum_{a=0}^n z(a)w(x-a)$ gives the following for $x \in [0,n]$:

$(z*w)(0) = \alpha_0 \beta_0$

$(z*w)(1) = \alpha_0 \beta_1 + \alpha_1 \beta_0$

$(z*w)(2) = \alpha_0 \beta_2 + \alpha_1 \beta_1 + \alpha_2 \beta_0$

...

$(z*w)(n) = \sum_{a=0}^n z(a)w(n-a)$

We can observe the following pattern:

$(z*w)(x) = \sum_{a=0}^n z(a)w(x-a) = \sum_{a=0}^n \alpha_a \beta_{x-a}$

This is exactly the formula $\sum_{j=0}^i \alpha_j \beta_{i-j}$ for the $i$th coefficient of the product polynomial $h(x)=f(x)g(x)$ stated previously. Since the two definitions are precisely the same, polynomial multiplication is equivalent to convolution of the coefficient sequences. I.e. *convolving the coefficients of two polynomials produces exactly the coefficients of their product*, proving that 1D discrete convolution is equivalent to polynomial multiplication.

Polynomial multiplication gives us the following properties, which carry over to convolutions:

- Associativity
- Identity
- Distributivity
- Commutativity

The last one is the most important, since it states that $(z*w)(x) = (w*z)(x)$, which makes sense because $f(x)g(x) = g(x)f(x)$.

## Convolutions as Toeplitz Matrices

We are interested in how to easily represent the discrete convolution for machine learning. Similar to how linear layers are of the form $y=Mx+b$ (see [Fitting Simple Linear Model blog post](https://www.noteblogdoku.com/blog/fitting-a-simple-linear-model-using-gradient-descent)), perhaps there is a way we can represent convolutions as matrix multiplications.
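Before deriving that matrix form, the polynomial-multiplication equivalence and the commutativity property above can be sanity-checked numerically. Here is a quick sketch using numpy (the coefficient values are arbitrary examples):

```python
import numpy as np

# Coefficient sequences, lowest-degree coefficient first:
# f(x) = 1 + 2x + 3x^2  and  g(x) = 4 + 5x + 6x^2  (arbitrary example values)
z = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# Discrete convolution of the coefficient sequences...
conv = np.convolve(z, w)

# ...matches the coefficients of the product polynomial h(x) = f(x)g(x).
prod = np.polynomial.polynomial.polymul(z, w)
assert np.allclose(conv, prod)

# Commutativity: (z * w)(x) = (w * z)(x).
assert np.array_equal(conv, np.convolve(w, z))

print(conv)  # coefficients [4, 13, 28, 27, 18] of h(x), degrees 0..4
```

Note that two degree-2 inputs produce 5 output coefficients, matching the $2n+1$ count derived below.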
We will derive how the convolution $y(x) = (w*z)(x)$ can be represented by $y = T(w)z$, where $T(w)$ is the Toeplitz matrix of the vector $w$. To start, let's write out the system of equations the convolution gives us. Since we know convolutions are polynomial multiplication, for $w, z \in \mathbb{R}^{n+1}$ this corresponds to multiplying two degree-$n$ polynomials (coefficient count starts at 0, i.e. $x^0$). The highest term of this polynomial multiplication is $w_n z_n x^{2n}$, giving **$2n+1$** total coefficients, meaning that $y \in \mathbb{R}^{2n+1}$.

$y(0) = w(0)z(0)$

$y(1) = w(0)z(1) + w(1)z(0)$

$y(2) = w(0)z(2) + w(1)z(1) + w(2)z(0)$

...

$y(n) = w(0)z(n) + w(1)z(n-1) + \dots + w(n)z(0)$

$y(n+1) = w(1)z(n) + w(2)z(n-1) + \dots + w(n)z(1)$

...

$y(2n) = w(n)z(n)$

This is a linear system of equations, which we can represent as a matrix.

$$ \begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_n \\ \vdots \\ y_{2n}\\ \end{bmatrix} = \begin{bmatrix} w_0 & 0 & \cdots & 0 \\ w_1 & w_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ w_n & w_{n-1} & \cdots & w_0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix} \times \begin{bmatrix} z_{0} \\ z_{1} \\ \vdots \\ z_{n}\\ \end{bmatrix} $$

As we can see, this is a matrix in which each diagonal holds a constant value, which is by definition a **Toeplitz matrix**. Here $T(w) \in \mathbb{R}^{(2n+1) \times (n+1)}$, since there are $2n+1$ equations with $n+1$ inputs.

**Note** that since this is a *shift-invariant system*, we expected it to be representable by a Toeplitz matrix: the kernel does not change with respect to the input vector.

### What about $y(x) = (w*z)(x)$ with $w \in \mathbb{R}^m$ & $z \in \mathbb{R}^n$?
**Note** that $T(w)$ has $n$ columns from $z \in \mathbb{R}^{n}$ and $m+n-1$ rows: if $w \in \mathbb{R}^{m}$ and $z \in \mathbb{R}^{n}$, then $w$ represents a polynomial of degree $m-1$ (coefficients indexed $0$ to $m-1$) and $z$ a polynomial of degree $n-1$ (coefficients indexed $0$ to $n-1$), so the highest-order term in the product is $x^{(m-1)+(n-1)} = x^{m+n-2}$; including the $x^0$ term, that gives a total of $m+n-1$ terms. Thus $T(w) \in \mathbb{R}^{(m+n-1) \times n}$.

*NOTE* what this implies: we should take the Toeplitz matrix of the *larger vector* to make the Toeplitz matrix smaller (skinnier). If $n \ll m$, then $T(w) \in \mathbb{R}^{(m+n-1) \times n}$ has far fewer entries than $T(z) \in \mathbb{R}^{(m+n-1) \times m}$.

## Example: 1D Discrete Convolution

Let the input signal be

$$ z = [1,\,2,\,3,\,4] $$

and the kernel be

$$ w = [2,\,1] $$

We use the discrete convolution definition

$$ y(x) = (z * w)(x) = \sum_{a=-\infty}^{\infty} z(a)\,w(x-a). $$

Assume $z(a)=0$ and $w(a)=0$ outside their index ranges. Since the input has length 4 and the kernel has length 2, the output length is $4 + 2 - 1 = 5$. Therefore $x = 0,1,2,3,4$.
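The definition can also be translated directly into code. The following minimal sketch (a naive double loop that treats out-of-range indices as zero, not an optimized routine) computes the same example:

```python
# Direct implementation of y(x) = sum_a z(a) w(x - a),
# with out-of-range indices of z and w treated as zero.
def conv1d(z, w):
    n, m = len(z), len(w)
    y = [0] * (n + m - 1)  # output length is n + m - 1
    for x in range(len(y)):
        for a in range(n):
            if 0 <= x - a < m:  # w(x - a) is zero outside [0, m)
                y[x] += z[a] * w[x - a]
    return y

print(conv1d([1, 2, 3, 4], [2, 1]))  # [2, 5, 8, 11, 4]
```

Each iteration of the outer loop mirrors one of the hand computations below.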
### Compute Each Output Value

$x = 0$:

$y(0) = z(0)w(0) = 1 \cdot 2 = 2$

$x = 1$:

$\begin{aligned} y(1) &= z(0)w(1) + z(1)w(0) \\ &= 1 \cdot 1 + 2 \cdot 2 \\ &= 1 + 4 \\ &= 5 \end{aligned}$

$x = 2$:

$\begin{aligned} y(2) &= z(1)w(1) + z(2)w(0) \\ &= 2 \cdot 1 + 3 \cdot 2 \\ &= 2 + 6 \\ &= 8 \end{aligned}$

$x = 3$:

$\begin{aligned} y(3) &= z(2)w(1) + z(3)w(0) \\ &= 3 \cdot 1 + 4 \cdot 2 \\ &= 3 + 8 \\ &= 11 \end{aligned}$

$x = 4$:

$y(4) = z(3)w(1) = 4 \cdot 1 = 4$

Note that this is equivalent to the Toeplitz matrix multiplication

$$ y = T(w)z $$

such that

$$ y = \begin{bmatrix} w_0 & 0 & 0 & 0 \\ w_1 & w_0 & 0 & 0 \\ 0 & w_1 & w_0 & 0 \\ 0 & 0 & w_1 & w_0 \\ 0 & 0 & 0 & w_1 \\ \end{bmatrix} \begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \\ \end{bmatrix} $$

The final result is

$$ z * w = [2,\,5,\,8,\,11,\,4] $$
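The Toeplitz form of this example can be checked numerically as well. The helper below is a minimal sketch (the name `toeplitz_conv_matrix` is made up for this post, not a library routine): it stacks shifted copies of $w$ into the $(m+n-1) \times n$ matrix derived earlier, then multiplies by $z$.

```python
import numpy as np

# Build T(w) for a kernel w of length m and an input of length n:
# shape (m + n - 1, n), with w running down each successive column.
def toeplitz_conv_matrix(w, n):
    m = len(w)
    T = np.zeros((m + n - 1, n))
    for col in range(n):
        T[col:col + m, col] = w
    return T

z = np.array([1, 2, 3, 4])
w = np.array([2, 1])

T = toeplitz_conv_matrix(w, len(z))
y = T @ z  # y = [2, 5, 8, 11, 4], matching the hand computation

# The matrix product agrees with numpy's built-in convolution.
assert np.array_equal(y, np.convolve(z, w))
```

With $m = 2$ and $n = 4$, `T` has shape $(5, 4)$, exactly the $\mathbb{R}^{(m+n-1) \times n}$ shape derived above.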