The Euler–Lagrange Equation, Derived by dokuDoku Math

Please read [Fun with Functionals](https://www.noteblogdoku.com/blog/fun-with-functionals) for a general introduction to functionals, and [Derivatives of Functionals](https://www.noteblogdoku.com/blog/derivatives-of-functions-vs-derivatives-of-functionals) for how to perturb a function (in function space) that is fed into a functional. Both concepts are used here.

## Euler–Lagrange Derivation

The Euler–Lagrange equation is used to find the extremal functions at which a functional is stationary. Concretely, suppose we have an input function $y = f(x)$ to the functional

$$ I[y] = \int_{x_1}^{x_2} F(x,y,y')\, dx$$

We want to find the function $y$ for which $I[y]$ is stationary. Note that $y(x)$ is called an *extremal*: just as in single-variable calculus an *extremum* is a max or min *point*, an *extremal* is a max or min *function*. Problems of this form are common enough that the expression $F(x,y,y')$ is called the *Lagrangian*.

### Derivation

$$ I[y] = \int_{x_1}^{x_2} F(x,y,y')\, dx$$

Suppose that $y(x)$ makes $I[y]$ stationary ($y$ is an extremal function) and satisfies the boundary conditions $y(x_1) = y_1$, $y(x_2) = y_2$. We introduce a function $\eta(x)$ with $\eta(x_1) = \eta(x_2) = 0$: an arbitrary direction in function space along which to perturb $y(x)$, with magnitude $\epsilon \in \mathbb{R}$. We also assume all functions have continuous second derivatives. The Euler–Lagrange equation is derived specifically for the fixed-endpoint problem of the calculus of variations, and it is exactly this context that makes the condition $\eta(x_1) = \eta(x_2) = 0$ required. We define $y_{\epsilon}$ as the perturbation of $y$ by $\epsilon \eta$

$$y_{\epsilon} = y + \epsilon \eta$$

Since $\eta(x_1) = \eta(x_2) = 0$, $y_{\epsilon}$ satisfies the same boundary conditions as $y$: $y_{\epsilon}(x_1) = y(x_1)$ and $y_{\epsilon}(x_2) = y(x_2)$.
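The perturbation setup can be sketched numerically. A minimal example (the concrete choices $y(x) = x^2$ and $\eta(x) = \sin(\pi x)$ are illustrative assumptions, not from the post), showing that adding $\epsilon \eta$ leaves the boundary values untouched:

```python
import numpy as np

# Perturb y(x) by eps * eta(x), where eta vanishes at both endpoints.
# y(x) = x^2 and eta(x) = sin(pi * x) are assumed example functions.
x1, x2 = 0.0, 1.0
x = np.linspace(x1, x2, 1001)

y = x**2                      # candidate function y(x)
eta = np.sin(np.pi * x)       # eta(x1) = eta(x2) = 0
eps = 0.1
y_eps = y + eps * eta         # perturbed function y_eps = y + eps * eta

# The perturbation preserves the boundary conditions.
print(np.isclose(y_eps[0], y[0]), np.isclose(y_eps[-1], y[-1]))  # True True
```

Any admissible variation built this way stays inside the fixed-endpoint problem, which is what lets the boundary term vanish later in the derivation.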
Note that since $y(x)$ is defined to be an extremal, the functional $I[y + \epsilon \eta]$ is stationary at $\epsilon = 0$. Thus,

$$\frac{d}{d\epsilon} I[y + \epsilon \eta]|_{\epsilon=0} = 0$$

Equivalently, if $I$ is differentiable and $y$ is a stationary point, then for any admissible variation $\eta$,

$$I[y + \epsilon \eta] = I[y] + o(\epsilon) \quad \text{as } \epsilon \to 0$$

Starting with the stationarity condition, we have

$$\frac{d}{d\epsilon} I[y + \epsilon \eta]|_{\epsilon=0} = \frac{d}{d\epsilon}|_{\epsilon=0} \int_{x_1}^{x_2} F(x,y_{\epsilon},y'_{\epsilon})dx = 0$$

Since $F$ has continuous derivatives, we may differentiate under the integral sign:

$$ 0 = \int_{x_1}^{x_2} \frac{\partial}{\partial \epsilon}|_{\epsilon=0} F(x,y_{\epsilon},y'_{\epsilon})dx $$

Invoking the chain rule, we get

$$ 0 = \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} \frac{\partial y_{\epsilon}}{\partial \epsilon} + \frac{\partial F}{\partial y'_{\epsilon}}\frac{\partial y'_{\epsilon}}{\partial \epsilon} \right]_{\epsilon=0} dx $$

Recall $y_{\epsilon}(x) = y(x) + \epsilon \eta(x)$, giving us

$$\frac{\partial y_{\epsilon}}{\partial \epsilon} = \eta(x)$$

and $y'_{\epsilon}(x) = y'(x) + \epsilon \eta'(x)$, giving us

$$\frac{\partial y'_{\epsilon}}{\partial \epsilon} = \eta'(x)$$

Thus our previous statement simplifies to

$$ \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} \eta(x) + \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx = \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} \eta(x) \right]_{\epsilon = 0} dx + \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx $$

Now we are going to simplify $\int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx$ with integration by parts to get rid of the $\eta'(x)$.
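The stationarity condition $\frac{d}{d\epsilon} I[y + \epsilon \eta]|_{\epsilon=0} = 0$ we started from can be sanity-checked with finite differences. A sketch under assumed choices (the Lagrangian $F = (y')^2$, whose extremals satisfy $y'' = 0$, and the variation $\eta = \sin(\pi x)$ are mine, not the post's):

```python
import numpy as np

# For F = (y')^2 the extremals are straight lines, so y(x) = x should
# make I[y + eps * eta] stationary at eps = 0.
x = np.linspace(0.0, 1.0, 2001)
y = x                          # a straight line: extremal of F = (y')^2
eta = np.sin(np.pi * x)        # admissible variation, eta(0) = eta(1) = 0

def trapezoid(f, x):
    """Trapezoidal rule for the integral of samples f over grid x."""
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))

def I(eps):
    """I[y + eps * eta] with F(x, y, y') = (y')^2."""
    dy = np.gradient(y + eps * eta, x)
    return trapezoid(dy**2, x)

# Central difference approximates dI/d(eps) at eps = 0.
h = 1e-5
dI = (I(h) - I(-h)) / (2 * h)
print(abs(dI) < 1e-6)          # True: I is stationary at eps = 0
```

Swapping in a non-extremal $y$ (say $y = x^2$ with the same endpoints) makes the same finite difference clearly nonzero, which is exactly what the derivation predicts.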
Using the general formula

$$ \int u\, dv = uv - \int v\, du $$

we set:

- $u = \frac{\partial F}{\partial y'_{\epsilon}}$
- $du = \frac{d}{dx}\frac{\partial F}{\partial y'_{\epsilon}} dx$
- $dv = \eta'(x)dx$
- $v = \eta(x)$

Thus we get

$$ \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx = \left[ \frac{\partial F}{\partial y'_{\epsilon}}\eta(x)\Big|_{x_1}^{x_2} - \int_{x_1}^{x_2} \frac{d}{dx} \frac{\partial F}{\partial y'_{\epsilon}}\eta(x)dx \right]_{\epsilon=0}$$

Since $\eta(x_1) = \eta(x_2) = 0$, the boundary term vanishes:

$$ \frac{\partial F}{\partial y'_{\epsilon}}\eta(x)\Big|_{x_1}^{x_2} = 0$$

Thus

$$ \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx = - \int_{x_1}^{x_2} \left[ \frac{d}{dx} \frac{\partial F}{\partial y'_{\epsilon}}\eta(x) \right]_{\epsilon=0} dx$$

Circling back to the previous equation, we have

$$ \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} \eta(x) \right]_{\epsilon = 0} dx + \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y'_{\epsilon}} \eta'(x) \right]_{\epsilon =0}dx $$

$$ = \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} \eta(x) \right]_{\epsilon = 0} dx - \int_{x_1}^{x_2} \left[ \frac{d}{dx} \frac{\partial F}{\partial y'_{\epsilon}}\eta(x) \right]_{\epsilon=0} dx $$

$$ = \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y_{\epsilon}} - \frac{d}{dx}\frac{\partial F}{\partial y'_{\epsilon}}\right]_{\epsilon=0} \eta(x)dx = 0$$

Evaluating at $\epsilon = 0$ gives $y_{\epsilon} = y$ and $y'_{\epsilon} = y'$, so this simplifies to

$$ \int_{x_1}^{x_2} \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}\right] \eta(x)dx = 0$$

Now we use the Fundamental Lemma of the Calculus of Variations: if $\int_{x_1}^{x_2} g(x)\eta(x)dx = 0$ for every admissible $\eta$, then $g(x) = 0$.
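The integration-by-parts step can also be verified numerically: with $\eta$ vanishing at both endpoints, $\int u\,\eta'\,dx$ should equal $-\int u'\,\eta\,dx$. A sketch with assumed stand-in functions (not from the post):

```python
import numpy as np

# With eta(x1) = eta(x2) = 0 the boundary term uv| vanishes, so
# integral(u * eta') = -integral(u' * eta).
x = np.linspace(0.0, 1.0, 20001)
u = x**2                          # stand-in for dF/dy'
eta = np.sin(np.pi * x)           # vanishes at both endpoints

def trapezoid(f, x):
    """Trapezoidal rule for the integral of samples f over grid x."""
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))

lhs = trapezoid(u * np.gradient(eta, x), x)     # integral of u * eta'
rhs = -trapezoid(np.gradient(u, x) * eta, x)    # -integral of u' * eta
print(np.isclose(lhs, rhs, atol=1e-6))          # True
```

If $\eta$ did not vanish at the endpoints, the two sides would differ by exactly the boundary term $u\eta|_{x_1}^{x_2}$, which is why the fixed-endpoint condition matters.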
To make this integral vanish for every arbitrary $\eta(x)$, the bracketed term must be identically zero, giving us

$$ \frac{\partial F}{\partial y}-\frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

We have derived the Euler–Lagrange equation, which characterizes the stationary functions of the functional $I[y]$ via a necessary condition that any extremal $y$ must satisfy.

## Minimize the Arc Length Functional

Let's do a classic example that directly shows how minimizing a functional yields the minimal path. Suppose the arc length functional is defined by

$$ S[y] = \int_{x_1}^{x_2} \sqrt{1 + (y')^2}dx, \quad F(y,y') = \sqrt{1 + (y')^2}$$

Using the Euler–Lagrange result

$$ \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

we know that $\frac{\partial F}{\partial y} = 0$ since $F$ has no $y$ term, so we need $\frac{d}{dx}\frac{\partial F}{\partial y'} = 0$.

$$ \frac{\partial F}{\partial y'} = \frac{1}{2} (1+(y')^2)^{-\frac{1}{2}} \cdot 2y' = \frac{y'}{\sqrt{1+(y')^2}}$$

For $\frac{d}{dx}\frac{\partial F}{\partial y'}$ to be zero, $\frac{\partial F}{\partial y'}$ must equal some constant $C$. Hence we get

$$\frac{y'}{\sqrt{1+(y')^2}} = C$$

Solving for $y'$ we get the following. Note that we're going to denote constants by $C_i$, where $i$ is just an arbitrary numbering, since it's easier to write $C_i$ than to track how the previous constants combine (frankly, we don't care what the constant equals anyway).

$$\frac{y'}{\sqrt{1+(y')^2}} = C$$

$$y' = C \sqrt{1+(y')^2} \rightarrow (y')^2 = C_1(1+(y')^2), \quad C_1 = C^2$$

$$(y')^2 - C_1(y')^2 = C_1 \rightarrow C_2(y')^2 = C_1, \quad C_2 = 1 - C_1$$

$$(y')^2 = C_3 \rightarrow y' = \sqrt{C_3} = C_4$$

Thus the slope $y' = \frac{dy}{dx}$ is equal to some constant.
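As a numerical sanity check of this conclusion, the straight line between two fixed endpoints should have the smallest discretized arc length among curves perturbed away from it. A sketch with assumed endpoints $y(0) = 0$, $y(1) = 2$ (my own choices for illustration):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)

def trapezoid(f, x):
    """Trapezoidal rule for the integral of samples f over grid x."""
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))

def arc_length(y):
    """S[y] = integral of sqrt(1 + (y')^2) dx."""
    return trapezoid(np.sqrt(1 + np.gradient(y, x)**2), x)

line = 2 * x                       # straight line between the endpoints
bump = np.sin(np.pi * x)           # perturbation vanishing at endpoints

lengths = [arc_length(line + eps * bump) for eps in (0.0, 0.2, 0.5)]
print(np.isclose(lengths[0], np.sqrt(5), atol=1e-4))  # length of the line
print(lengths[0] < lengths[1] < lengths[2])           # line is shortest
```

The unperturbed length matches the exact value $\sqrt{1^2 + 2^2} = \sqrt{5}$, and every perturbation with the same endpoints only makes the curve longer.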
This means that the extremal function $y$ is a straight line: recall the basic equation $y = mx + b$, where $m$ is a constant slope, which is exactly what we have here.

## KL-Divergence

In a previous blog post, [KL-Divergence and Cross Entropy](https://www.noteblogdoku.com/blog/kl-divergence-and-cross-entropy), we had a pretty lengthy derivation of $\frac{\delta D_{KL}(P||Q)}{\delta Q}$, where $P$ is our ground-truth distribution and $Q$ is our predicted distribution, with

$$ D_{KL}(P||Q) = \int P(x)(\log P(x) - \log Q(x))dx$$

Let's take another look at this using some of the tooling we developed here. Note that for the KL-Divergence we have the constraints

$$\int Q(x)dx = 1, \quad Q(x) \geq 0\ \forall x$$

To find the $Q$ that minimizes $D_{KL}(P||Q)$ (assume we know this has a minimum, not a maximum), we are solving

$$\min_Q D_{KL}(P||Q) \quad \text{subject to } \int Q(x)dx = 1$$

or, with Lagrange multipliers, stating this as the functional $I[Q]$

$$I[Q] = D_{KL}(P||Q) + \lambda \left(\int Q(x)dx -1\right)$$

$$ = \int P(x)(\log P(x) - \log Q(x))dx + \lambda \left(\int Q(x)dx -1\right)$$

$$ = \int \left( P(x)(\log P(x) - \log Q(x)) + \lambda Q(x) \right) dx -\lambda$$

$$I[Q] = \int \left( P(x) \log\frac{P(x)}{Q(x)} + \lambda Q(x) \right) dx -\lambda$$

Great, now we can see this is in the form $I[y] = \int_{x_1}^{x_2} F(x,y,y') dx$, with

$$ F(x,Q) = P(x) \log\left(\frac{P(x)}{Q(x)}\right) + \lambda Q(x)$$

From the Euler–Lagrange equation $\frac{\partial F}{\partial y}-\frac{d}{dx}\frac{\partial F}{\partial y'} = 0$ we have

$$ \frac{\partial F}{\partial Q} - \frac{d}{dx} \frac{\partial F}{\partial Q'} = 0$$

Since $F$ does not depend on $Q'$, we have

$$ \frac{\partial F}{\partial Q'} = 0$$

$$ \frac{\partial F}{\partial Q} = \frac{\partial}{\partial Q} \left( P(x) \log\left(\frac{P(x)}{Q(x)}\right) + \lambda Q(x) \right)$$

$$ \frac{\partial F}{\partial Q} = \frac{\partial}{\partial Q} \left( P(x)\log P(x) - P(x)\log Q(x) + \lambda Q(x) \right)$$

$$\frac{\partial F}{\partial Q} = \frac{\partial}{\partial Q} \left(- P(x)\log Q(x) + \lambda Q(x)
\right)$$

$$\frac{\partial F}{\partial Q} = - \frac{P(x)}{Q(x)} + \lambda$$

The stationary condition becomes

$$- \frac{P(x)}{Q(x)} + \lambda = 0 \rightarrow Q(x) = \frac{P(x)}{\lambda}$$

To find $\lambda$, use the fact that $\int P(x)dx = \int Q(x)dx = 1$ and integrate both sides of $Q(x) = \frac{P(x)}{\lambda}$

$$ \int Q(x) dx = \int \frac{P(x)}{\lambda} dx $$

$$ 1 = \frac{1}{\lambda} \cdot 1 \rightarrow \lambda = 1$$

Thus the stationary condition is $Q(x) = P(x)$, which makes sense: to minimize $D_{KL}(P||Q)$ you would want $P = Q$.

For the unconstrained KL functional $J[Q]$, since the integrand depends on $Q$ but not $Q'$, the Euler–Lagrange operator reduces to $\frac{\partial F}{\partial Q}$, which in this case equals the functional derivative. Note, however, that $\frac{\partial F}{\partial Q} = \frac{-P(x)}{Q(x)} + \lambda$ is the functional derivative of the *constrained* functional. We can see this since

$$ J[Q] = D_{KL}(P||Q) = \int P(x) \log\left(\frac{P(x)}{Q(x)}\right) dx$$

Since this integrand also depends on $Q$ but not $Q'$, the functional derivative of the unconstrained functional is

$$ \frac{\delta J}{\delta Q} = \frac{\partial}{\partial Q}\left(P(x) \log\frac{P(x)}{Q(x)}\right) = \frac{-P(x)}{Q(x)}$$

In other words, the $\lambda$ term does not change the gradient of $D_{KL}$ itself; it changes the gradient of the Lagrangian functional used for constrained optimization.
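The stationary condition $Q = P$ is easy to sanity-check in the discrete case, where the KL integral becomes a sum. A sketch with assumed example distributions (my own choices for illustration):

```python
import numpy as np

def kl(p, q):
    """D_KL(P||Q) = sum of p * (log p - log q) for discrete distributions."""
    return np.sum(p * (np.log(p) - np.log(q)))

p = np.array([0.5, 0.3, 0.2])   # ground-truth distribution P
q = np.array([0.4, 0.4, 0.2])   # a different valid distribution Q

print(np.isclose(kl(p, p), 0.0))   # True: minimum attained at Q = P
print(kl(p, q) > 0)                # True: Gibbs' inequality
```

Any valid $Q \neq P$ gives a strictly positive divergence, consistent with the Euler–Lagrange analysis picking out $Q(x) = P(x)$ as the minimizer.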