# Derivatives of Functions vs. Derivatives of Functionals

*by dokuDoku Math*

## What is Calculus of Variations?

In regular calculus we find $f'(x)$ or $\frac{\partial f(x)}{\partial x}$ to get the rate of change of a function. This is usually done to find the max or min of said function $f(x)$, candidate points which typically satisfy $f'(x) = 0$. Such a point is called the *optimum* of our problem, denoted $x_{opt}$.

But what if, instead of finding a specific point $x_{opt}$, we want to find the optimal function in a family of functions, $f_{opt}$? In this setting our input is no longer a point, but a function. This means we need a notion of a derivative with respect to a function, not just a number.

This is the problem Calculus of Variations solves: we find the stationary function of a functional (a function of functions, see [Fun with Functionals](https://www.noteblogdoku.com/blog/fun-with-functionals)). A stationary function $f_{opt}$ in our family of functions is one at which the derivative of the functional $F[f]$ equals zero.

### Derivation of Functional Derivative

Recall that in regular calculus we perturb the input $x$ by $dx$ to form the differential:

$$ df = f(x+dx) - f(x) = f'(x)dx + o(dx) $$

Let's more formally rephrase $dx$ as $\epsilon h$, where $h$ is the direction of our perturbation ($-1$ or $1$) and $\epsilon$ is its scale:

$$ df = f(x+\epsilon h) - f(x) = \epsilon\, df_x(h) + o(\epsilon) $$

where $df_x$ is the linear map (the differential) and $df_x(h)$ is its application to the direction $h$.

For our functional we equivalently need to perturb the input function $f$ by some function direction $\eta$ with magnitude $\epsilon$, as we can see in the image at the top of the blog post. The image shows an arbitrary function $\eta(x)$ that is then scaled by $\epsilon \in [-1,1]$ to perturb the original function $y(x)$ in the image.
Here $\eta(x)$ plays the role of a direction in function space, analogous to a vector $h \in \mathbb{R}^n$. It tells us how to perturb the function at every point $x$. More explicitly, to perturb $f$ by $\epsilon \eta$:

$$ f_{\epsilon} = f + \epsilon \eta $$

We will derive something similar in form to the one-variable case above. Fixing the functions $f$ and $\eta$, define the following one-variable function:

$$ \phi(\epsilon) := F[f + \epsilon \eta] $$

Using ordinary calculus on $\phi(\epsilon)$:

$$ \phi(\epsilon) - \phi(0) = \phi'(0)\,\epsilon + o(\epsilon) $$

Substituting back $\phi(\epsilon) := F[f + \epsilon \eta]$, we get:

$$ F[f + \epsilon \eta] - F[f] = \epsilon\, \phi'(0) + o(\epsilon) $$

Note that:

$$ \phi'(0) = \left.\frac{d}{d\epsilon} \phi(\epsilon)\right|_{\epsilon=0} = \left.\frac{d}{d\epsilon} F[f + \epsilon \eta]\right|_{\epsilon=0} $$

Using the functional chain rule (see **Appendix**), we have:

$$ \frac{d}{d\epsilon} F[f + \epsilon \eta] = dF[f + \epsilon \eta]\left(\frac{d}{d\epsilon} (f + \epsilon \eta)\right) = dF[f + \epsilon \eta](\eta) $$

Evaluating at $\epsilon = 0$, we obtain:

$$ \left.\frac{d}{d\epsilon} F[f + \epsilon \eta]\right|_{\epsilon=0} = dF[f](\eta) $$

Therefore:

$$ F[f + \epsilon \eta] - F[f] = \epsilon\, dF[f](\eta) + o(\epsilon) $$

Since $o(\epsilon)$ vanishes faster than $\epsilon$, to first order we can ignore it:

$$ F[f + \epsilon \eta] - F[f] \approx \epsilon\, dF[f](\eta) $$

Here $dF[f](\eta)$ is the differential of the functional $F$ at $f$ applied to the perturbation direction $\eta$. It plays the same role as $f'(x)h$ in ordinary calculus. So just as stationary points in ordinary calculus satisfy $f'(x) = 0$, in the calculus of variations stationary functions satisfy $dF[f](\eta) = 0$ for all perturbation functions $\eta$.
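The reduction to the one-variable function $\phi(\epsilon)$ is easy to try numerically. Below is a minimal sketch in Python; the functional $F[f] = \int_0^1 \sin(f(x))\,dx$, the grid, and the particular $f$ and $\eta$ are all illustrative assumptions, not anything fixed by the derivation:

```python
import numpy as np

# Grid on [0, 1] and a simple trapezoidal-rule integrator.
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

def integrate(g):
    return dx * (g.sum() - 0.5 * (g[0] + g[-1]))

# Illustrative functional F[f] = integral of sin(f(x)) dx over [0, 1].
def F(f):
    return integrate(np.sin(f))

f = x**2              # base function (arbitrary choice)
eta = np.cos(3 * x)   # perturbation direction (arbitrary choice)

phi = lambda eps: F(f + eps * eta)

# The difference quotient (phi(eps) - phi(0)) / eps settles to a limit
# as eps -> 0; that limit is phi'(0) = dF[f](eta).
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(eps, (phi(eps) - phi(0.0)) / eps)

# For this particular F, differentiating under the integral sign gives
# dF[f](eta) = integral of cos(f(x)) * eta(x) dx, so we can compare:
analytic = integrate(np.cos(f) * eta)
print("dF[f](eta) =", analytic)
```

The printed quotients stabilize as $\epsilon$ shrinks, which is exactly the statement that the $o(\epsilon)$ remainder is negligible to first order.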
Since $dF[f](\eta)$ is linear in $\eta$, we can represent it as an inner product:

$$ dF[f](\eta) = \int \frac{\delta F}{\delta f}(x)\,\eta(x)\,dx $$

This defines the functional derivative $\frac{\delta F}{\delta f}(x)$ as the function that represents the linear map $dF[f]$. For a clearer overview, see the **Calculus and Calculus of Variations Equivalence Table**, where $F[f]$ is the *functional $F$ with input function $f$* and $f(x)$ is the function $f$ with vector input $x \in \mathbb{R}^n$.

### Calculus and Calculus of Variations Equivalence Table

|      | Calculus | Calculus of Variations |
|------|----------|------------------------|
| 1 | Points | Functions |
| 2 | Stationary points $\frac{\partial f(x)}{\partial x} = 0$ | Stationary functions $dF[f](\eta) = 0$ for all $\eta$ |
| 3 | Function $f(x)$ | Functional $F[f]$ |
| 4 | Point perturbation $dx = \epsilon h \in \mathbb{R}$ | Function perturbation $\epsilon \eta(x)$, $\epsilon \in \mathbb{R}$, $\eta(x) \in L^2$ |
| 5 | Derivative $df_x(h) = f'(x)h$ | Differential $dF[f](\eta)$ |
| 6 | $f(x+\epsilon h) - f(x) = \epsilon f'(x)h + o(\epsilon)$ | $F[f+\epsilon \eta] - F[f] = \epsilon\, dF[f](\eta) + o(\epsilon)$ |
| 7 | Gradient $\nabla f(x)$ | Functional derivative $\frac{\delta F}{\delta f}(x)$ |
| 8 | Dot product $\nabla f(x) \cdot h$ | Inner product $\int \frac{\delta F}{\delta f}(x)\eta(x)\,dx$ |
| 9 | $\nabla f(x) = 0$ | $\frac{\delta F}{\delta f}(x) = 0$ |
| 10 | Direction vector $h$ | Direction function $\eta(x)$ |

### Example Functional Derivative

Let

$$ F[f] = \int f(x)^2\, dx $$

Then:

$$ F[f + \epsilon \eta] = \int (f + \epsilon \eta)^2\, dx = \int (f^2 + 2\epsilon f \eta + \epsilon^2 \eta^2)\, dx $$

So:

$$ F[f + \epsilon \eta] - F[f] = \int \left(2\epsilon f(x)\eta(x) + \epsilon^2 \eta(x)^2 \right) dx $$

Factor out $\epsilon$:

$$ = \epsilon \int 2f(x)\eta(x)\, dx + \epsilon^2 \int \eta(x)^2\, dx $$

Now divide by $\epsilon$ and take the limit as $\epsilon \to 0$:

$$ \lim_{\epsilon \to 0} \frac{F[f + \epsilon \eta] - F[f]}{\epsilon} = dF[f](\eta) = \int 2f(x)\eta(x)\, dx $$

Thus, comparing with the general form we derived,

$$ dF[f](\eta) = \int \frac{\delta F}{\delta f}(x)\,\eta(x)\,dx $$

we obtain:

$$ \frac{\delta F}{\delta f}(x) = 2f(x) $$

---

## Appendix

### Little $o$ Notation $o(x)$

In calculus and asymptotic analysis, $o(x)$ is part of little-$o$ notation, which describes how fast one function goes to $0$ relative to another. The core idea is that

$$ f(x) = o(g(x)) \quad \text{as } x \to a $$

if

$$ \lim_{x \to a} \frac{f(x)}{g(x)} = 0 $$

In particular, $f(x) = o(x)$ as $x \to 0$ means that

$$ \frac{f(x)}{x} \to 0 $$

so $f(x)$ goes to $0$ faster than $x$. For example, $x^2 = o(x)$ as $x \to 0$ because $\frac{x^2}{x} = x \to 0$.

### Functional Chain Rule

Recall from [Fun with Functionals](https://www.noteblogdoku.com/blog/fun-with-functionals) that we can heuristically view functions as elements of an infinite-dimensional vector space (e.g. a Hilbert space like $L^2$). Great! Now we are going to motivate the *Functional Chain Rule* using the *Multi-variate Chain Rule* from Vector Calculus. Note that this is not a formal rigorous proof; it is meant to build intuition.

Assume the following for the multi-variate chain rule:

- $z = f(x_1, x_2, \dots, x_n)$ with $z \in \mathbb{R}$
- $x \in \mathbb{R}^n$ with $x_i = x_i(t)$
- $t \in \mathbb{R}$

Using the chain rule we get:

$$ \frac{dz}{dt} = \frac{df}{dx}\frac{dx}{dt} = Df(x(t))\, \frac{dx}{dt} $$

Note that with the numerator-layout Jacobian convention we get the following shapes:

- $\frac{dz}{dt} \in \mathbb{R}$
- $\frac{df}{dx} \in \mathbb{R}^{1 \times n}$
- $\frac{dx}{dt} \in \mathbb{R}^{n \times 1}$

Hence we can see that the derivative acts as a linear map applied to the vector $\frac{dx}{dt}$.
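This shape bookkeeping can be sanity-checked numerically. In the sketch below, the map $f$ and the path $x(t)$ are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative f: R^3 -> R and a path x(t) in R^3 (arbitrary choices).
def f(x):
    return x[0]**2 + np.sin(x[1]) * x[2]

def Df(x):
    # Jacobian of f, a 1 x n row in the numerator-layout convention
    return np.array([[2 * x[0], np.cos(x[1]) * x[2], np.sin(x[1])]])

def path(t):
    return np.array([t, t**2, np.exp(t)])

def dpath(t):
    # dx/dt, an n x 1 column
    return np.array([[1.0], [2 * t], [np.exp(t)]])

t = 0.7
chain = (Df(path(t)) @ dpath(t)).item()  # (1 x n) @ (n x 1) -> scalar dz/dt

# Central finite-difference check of dz/dt
h = 1e-6
fd = (f(path(t + h)) - f(path(t - h))) / (2 * h)
print(chain, fd)  # the two values agree closely
```

The `(1, n) @ (n, 1)` matrix product collapses to a single number, matching the claim that the derivative is a linear map eating the velocity vector $\frac{dx}{dt}$.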
Also notice that we are taking a dot product between $\frac{df}{dx}$ and $\frac{dx}{dt}$; we can rewrite this as the more familiar summation to show this explicitly:

$$ \frac{dz}{dt} = \sum_{i=1}^n \frac{\partial f}{\partial x_i} \frac{dx_i}{dt} $$

Great, now replace the finite-dimensional space $\mathbb{R}^n$ with a function space such as $L^2(\Omega)$. Now, for each $t$, $x(t)$ is a function $s \mapsto x(t,s)$, $F[x]$ is a functional, and $z = F[x(t)]$. To get $\frac{dz}{dt}$, the derivative $dF[x]$ is a linear functional that acts on perturbations of the function $x$; here the perturbation is $\frac{\partial x}{\partial t}$, the derivative of $x$ with respect to $t$ at every point $s$. Since the inner product between functions is defined as an integral, the continuous version of the sum above is:

$$ \frac{dz}{dt} = \frac{dF[x(t)]}{dt} = \int \frac{\delta F}{\delta x}(s)\, \frac{\partial x(t,s)}{\partial t}\, ds $$

We can then see that the *Functional Chain Rule* is:

$$ \frac{d}{dt} F[x(t)] = dF[x(t)]\left(\frac{dx}{dt}\right) $$

where $dF[x(t)]$ is the differential of the functional $F$ at the input function $x(t)$: a linear operator acting on the derivative of the function $x$ with respect to $t$, $\frac{dx}{dt}$.
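As with the finite-dimensional version, this identity can be checked numerically. In the sketch below, the functional $F[x] = \int_0^1 x(s)^2\,ds$ (whose functional derivative $\frac{\delta F}{\delta x} = 2x$ was computed in the example section) and the family $x(t,s) = \sin(ts)$ are illustrative choices:

```python
import numpy as np

# Grid in s on [0, 1] and a trapezoidal-rule integrator.
s = np.linspace(0.0, 1.0, 2001)
ds = s[1] - s[0]

def integrate(g):
    return ds * (g.sum() - 0.5 * (g[0] + g[-1]))

def F(xvals):
    # F[x] = integral of x(s)^2 ds, as in the example section
    return integrate(xvals**2)

x = lambda t: np.sin(t * s)          # family of functions x(t, s)
x_t = lambda t: s * np.cos(t * s)    # partial derivative of x w.r.t. t

t = 0.4
h = 1e-6

# Left side: d/dt F[x(t)] via central finite differences
lhs = (F(x(t + h)) - F(x(t - h))) / (2 * h)

# Right side: integral of (delta F / delta x)(s) * dx/dt (s) ds,
# using the functional derivative delta F / delta x = 2 x
rhs = integrate(2 * x(t) * x_t(t))

print(lhs, rhs)  # the two sides agree closely
```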