# Derivatives of Functions vs. Derivatives of Functionals

*by dokuDoku Math*

## What is Calculus of Variations?

In regular calculus we find $f'(x)$ or $\frac{\partial f(x)}{\partial x}$ to get the rate of change of a function. This is usually done to find the max or min of said function $f(x)$, candidate points which typically satisfy $f'(x) = 0$. Such a point is called the *optimum* of our problem, denoted $x_{opt}$.

But what if, instead of finding a specific point $x_{opt}$, we want to find the optimal function in a family of functions, $f_{opt}$? In this setting our input is no longer a point, but a function. This means we need a notion of a derivative with respect to a function, not just a number.

This is the problem Calculus of Variations solves: we find the stationary function of a functional (a function of functions, see [Fun with Functionals](https://www.noteblogdoku.com/blog/fun-with-functionals)). A stationary function $f_{opt}$ in our family of functions is one at which the derivative of the functional $F[f]$ equals zero.

### Derivation of Functional Derivative

Recall that in regular calculus we perturb the input $x$ by $dx$ to form the differential:

$$ df = f(x+dx) - f(x) = f'(x)dx + o(dx) $$

Let's more formally rephrase $dx$ as $\epsilon h$, where $h$ is the direction of our perturbation ($-1$ or $1$) and $\epsilon$ is its scale:

$$ df = f(x+\epsilon h) - f(x) = \epsilon\, df_x(h) + o(\epsilon) $$

where $df_x$ is the linear map (the differential) and $df_x(h)$ is its application to the direction $h$.

For our functional we equivalently need to perturb the input function $f$ by some function direction $\eta$ with magnitude $\epsilon$, as we can see in the image at the top of the blog post. The image shows an arbitrary function $\eta(x)$ that is then scaled by $\epsilon \in [-1,1]$ to perturb the original function $y(x)$ in the image.
Here $\eta(x)$ plays the role of a direction in function space, analogous to a vector $h \in \mathbb{R}^n$. It tells us how to perturb the function at every point $x$. More explicitly, to perturb $f$ by $\epsilon \eta$:

$$ f_{\epsilon} = f + \epsilon \eta $$

We will derive something similar in form to the one-variable case above. Fixing the functions $f$ and $\eta$, define the following one-variable function:

$$ \phi(\epsilon) := F[f + \epsilon \eta] $$

Using ordinary calculus on $\phi(\epsilon)$:

$$ \phi(\epsilon) - \phi(0) = \phi'(0)\,\epsilon + o(\epsilon) $$

Substituting back $\phi(\epsilon) := F[f + \epsilon \eta]$, we get:

$$ F[f + \epsilon \eta] - F[f] = \epsilon\, \phi'(0) + o(\epsilon) $$

Note that:

$$ \phi'(0) = \left.\frac{d}{d\epsilon} \phi(\epsilon)\right|_{\epsilon=0} = \left.\frac{d}{d\epsilon} F[f + \epsilon \eta]\right|_{\epsilon=0} $$

Using the functional chain rule (see **Appendix**), we have:

$$ \frac{d}{d\epsilon} F[f + \epsilon \eta] = dF[f + \epsilon \eta]\left(\frac{d}{d\epsilon} (f + \epsilon \eta)\right) = dF[f + \epsilon \eta](\eta) $$

Evaluating at $\epsilon = 0$, we obtain:

$$ \left.\frac{d}{d\epsilon} F[f + \epsilon \eta]\right|_{\epsilon=0} = dF[f](\eta) $$

Therefore:

$$ F[f + \epsilon \eta] - F[f] = \epsilon\, dF[f](\eta) + o(\epsilon) $$

Since $o(\epsilon)$ vanishes faster than $\epsilon$, to first order we can ignore it:

$$ F[f + \epsilon \eta] - F[f] \approx \epsilon\, dF[f](\eta) $$

Here $dF[f](\eta)$ is the differential of the functional $F$ at $f$ applied to the perturbation direction $\eta$. It plays the same role as $f'(x)h$ in ordinary calculus. So just as stationary points in ordinary calculus satisfy $f'(x) = 0$, in the calculus of variations stationary functions satisfy $dF[f](\eta) = 0$ for all perturbation functions $\eta$.
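The reduction to the one-variable function $\phi(\epsilon)$ is easy to try numerically. Below is a minimal sketch in Python; the functional $F[f] = \int_0^1 \sin(f(x))\,dx$, the grid, and the particular $f$ and $\eta$ are all illustrative assumptions, not anything fixed by the derivation:

```python
import numpy as np

# Grid on [0, 1] and a simple trapezoidal-rule integrator.
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

def integrate(g):
    return dx * (g.sum() - 0.5 * (g[0] + g[-1]))

# Illustrative functional F[f] = integral of sin(f(x)) dx over [0, 1].
def F(f):
    return integrate(np.sin(f))

f = x**2              # base function (arbitrary choice)
eta = np.cos(3 * x)   # perturbation direction (arbitrary choice)

phi = lambda eps: F(f + eps * eta)

# The difference quotient (phi(eps) - phi(0)) / eps settles to a limit
# as eps -> 0; that limit is phi'(0) = dF[f](eta).
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(eps, (phi(eps) - phi(0.0)) / eps)

# For this particular F, differentiating under the integral sign gives
# dF[f](eta) = integral of cos(f(x)) * eta(x) dx, so we can compare:
analytic = integrate(np.cos(f) * eta)
print("dF[f](eta) =", analytic)
```

The printed quotients stabilize as $\epsilon$ shrinks, which is exactly the statement that the $o(\epsilon)$ remainder is negligible to first order.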
Since $dF[f](\eta)$ is linear in $\eta$, we can represent it as an inner product:

$$ dF[f](\eta) = \int \frac{\delta F}{\delta f}(x)\,\eta(x)\,dx $$

This defines the functional derivative $\frac{\delta F}{\delta f}(x)$ as the function that represents the linear map $dF[f]$. For a clearer overview, see the **Calculus and Calculus of Variations Equivalence Table**, where $F[f]$ is the *functional $F$ with input function $f$* and $f(x)$ is the function $f$ with vector input $x \in \mathbb{R}^n$.

### Calculus and Calculus of Variations Equivalence Table

|      | Calculus | Calculus of Variations |
|------|----------|------------------------|
| 1 | Points | Functions |
| 2 | Stationary points $\frac{\partial f(x)}{\partial x} = 0$ | Stationary functions $dF[f](\eta) = 0$ for all $\eta$ |
| 3 | Function $f(x)$ | Functional $F[f]$ |
| 4 | Point perturbation $dx = \epsilon h \in \mathbb{R}$ | Function perturbation $\epsilon \eta(x)$, $\epsilon \in \mathbb{R}$, $\eta(x) \in L^2$ |
| 5 | Derivative $df_x(h) = f'(x)h$ | Differential $dF[f](\eta)$ |
| 6 | $f(x+\epsilon h) - f(x) = \epsilon f'(x)h + o(\epsilon)$ | $F[f+\epsilon \eta] - F[f] = \epsilon\, dF[f](\eta) + o(\epsilon)$ |
| 7 | Gradient $\nabla f(x)$ | Functional derivative $\frac{\delta F}{\delta f}(x)$ |
| 8 | Dot product $\nabla f(x) \cdot h$ | Inner product $\int \frac{\delta F}{\delta f}(x)\eta(x)\,dx$ |
| 9 | $\nabla f(x) = 0$ | $\frac{\delta F}{\delta f}(x) = 0$ |
| 10 | Direction vector $h$ | Direction function $\eta(x)$ |

### Example Functional Derivative

Let

$$ F[f] = \int f(x)^2\, dx $$

Then:

$$ F[f + \epsilon \eta] = \int (f + \epsilon \eta)^2\, dx = \int (f^2 + 2\epsilon f \eta + \epsilon^2 \eta^2)\, dx $$

So:

$$ F[f + \epsilon \eta] - F[f] = \int \left(2\epsilon f(x)\eta(x) + \epsilon^2 \eta(x)^2 \right) dx $$

Factor out $\epsilon$:

$$ = \epsilon \int 2f(x)\eta(x)\, dx + \epsilon^2 \int \eta(x)^2\, dx $$

Now divide by $\epsilon$ and take the limit as $\epsilon \to 0$:

$$ \lim_{\epsilon \to 0} \frac{F[f + \epsilon \eta] - F[f]}{\epsilon} = dF[f](\eta) = \int 2f(x)\eta(x)\, dx $$

Thus, comparing with the general form we derived,

$$ dF[f](\eta) = \int \frac{\delta F}{\delta f}(x)\,\eta(x)\,dx $$

we obtain:

$$ \frac{\delta F}{\delta f}(x) = 2f(x) $$

---

## Appendix

### Little $o$ Notation $o(x)$

In calculus and asymptotic analysis, $o(x)$ is part of little-$o$ notation, which describes how fast one function goes to $0$ relative to another. The core idea is that

$$ f(x) = o(g(x)) \quad \text{as } x \to a $$

if

$$ \lim_{x \to a} \frac{f(x)}{g(x)} = 0 $$

In particular, $f(x) = o(x)$ as $x \to 0$ means that

$$ \frac{f(x)}{x} \to 0 $$

so $f(x)$ goes to $0$ faster than $x$. For example, $x^2 = o(x)$ as $x \to 0$ because $\frac{x^2}{x} = x \to 0$.

### Functional Chain Rule

Recall from [Fun with Functionals](https://www.noteblogdoku.com/blog/fun-with-functionals) that we can heuristically view functions as elements of an infinite-dimensional vector space (e.g. a Hilbert space like $L^2$). Great! Now we are going to motivate the *Functional Chain Rule* using the *Multi-variate Chain Rule* from Vector Calculus. Note that this is not a formal rigorous proof; it is meant to build intuition.

Assume the following for the multi-variate chain rule:

- $z = f(x_1, x_2, \dots, x_n)$ with $z \in \mathbb{R}$
- $x \in \mathbb{R}^n$ with $x_i = x_i(t)$
- $t \in \mathbb{R}$

Using the chain rule we get:

$$ \frac{dz}{dt} = \frac{df}{dx}\frac{dx}{dt} = Df(x(t))\, \frac{dx}{dt} $$

Note that with the numerator-layout Jacobian convention we get the following shapes:

- $\frac{dz}{dt} \in \mathbb{R}$
- $\frac{df}{dx} \in \mathbb{R}^{1 \times n}$
- $\frac{dx}{dt} \in \mathbb{R}^{n \times 1}$

Hence we can see that the derivative acts as a linear map applied to the vector $\frac{dx}{dt}$.
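This shape bookkeeping can be sanity-checked numerically. In the sketch below, the map $f$ and the path $x(t)$ are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative f: R^3 -> R and a path x(t) in R^3 (arbitrary choices).
def f(x):
    return x[0]**2 + np.sin(x[1]) * x[2]

def Df(x):
    # Jacobian of f, a 1 x n row in the numerator-layout convention
    return np.array([[2 * x[0], np.cos(x[1]) * x[2], np.sin(x[1])]])

def path(t):
    return np.array([t, t**2, np.exp(t)])

def dpath(t):
    # dx/dt, an n x 1 column
    return np.array([[1.0], [2 * t], [np.exp(t)]])

t = 0.7
chain = (Df(path(t)) @ dpath(t)).item()  # (1 x n) @ (n x 1) -> scalar dz/dt

# Central finite-difference check of dz/dt
h = 1e-6
fd = (f(path(t + h)) - f(path(t - h))) / (2 * h)
print(chain, fd)  # the two values agree closely
```

The `(1, n) @ (n, 1)` matrix product collapses to a single number, matching the claim that the derivative is a linear map eating the velocity vector $\frac{dx}{dt}$.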
Also notice that we are taking a dot product between $\frac{df}{dx}$ and $\frac{dx}{dt}$; we can rewrite this as the more familiar summation to show this explicitly:

$$ \frac{dz}{dt} = \sum_{i=1}^n \frac{\partial f}{\partial x_i} \frac{dx_i}{dt} $$

Great, now replace the finite-dimensional space $\mathbb{R}^n$ with a function space such as $L^2(\Omega)$. Now, for each $t$, $x(t)$ is a function $s \mapsto x(t,s)$, $F[x]$ is a functional, and $z = F[x(t)]$. To get $\frac{dz}{dt}$, the derivative $dF[x]$ is a linear functional that acts on perturbations of the function $x$; here the perturbation is $\frac{\partial x}{\partial t}$, the derivative of $x$ with respect to $t$ at every point $s$. Since the inner product between functions is defined as an integral, the continuous version of the sum above is:

$$ \frac{dz}{dt} = \frac{dF[x(t)]}{dt} = \int \frac{\delta F}{\delta x}(s)\, \frac{\partial x(t,s)}{\partial t}\, ds $$

We can then see that the *Functional Chain Rule* is:

$$ \frac{d}{dt} F[x(t)] = dF[x(t)]\left(\frac{dx}{dt}\right) $$

where $dF[x(t)]$ is the differential of the functional $F$ at the input function $x(t)$: a linear operator acting on the derivative of the function $x$ with respect to $t$, $\frac{dx}{dt}$.
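As with the finite-dimensional version, this identity can be checked numerically. In the sketch below, the functional $F[x] = \int_0^1 x(s)^2\,ds$ (whose functional derivative $\frac{\delta F}{\delta x} = 2x$ was computed in the example section) and the family $x(t,s) = \sin(ts)$ are illustrative choices:

```python
import numpy as np

# Grid in s on [0, 1] and a trapezoidal-rule integrator.
s = np.linspace(0.0, 1.0, 2001)
ds = s[1] - s[0]

def integrate(g):
    return ds * (g.sum() - 0.5 * (g[0] + g[-1]))

def F(xvals):
    # F[x] = integral of x(s)^2 ds, as in the example section
    return integrate(xvals**2)

x = lambda t: np.sin(t * s)          # family of functions x(t, s)
x_t = lambda t: s * np.cos(t * s)    # partial derivative of x w.r.t. t

t = 0.4
h = 1e-6

# Left side: d/dt F[x(t)] via central finite differences
lhs = (F(x(t + h)) - F(x(t - h))) / (2 * h)

# Right side: integral of (delta F / delta x)(s) * dx/dt (s) ds,
# using the functional derivative delta F / delta x = 2 x
rhs = integrate(2 * x(t) * x_t(t))

print(lhs, rhs)  # the two sides agree closely
```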