
4 Subdifferentials of Finite Convex Functions

4.1 The Subdifferential: Definitions and Interpretations

4.1.1 First definition: Directional derivative

Definition 4.1.1 (Directional Derivative)

The directional derivative of \(f\) at \(x\) in the direction \(d\) is

\[ f'(x,d):=\lim _{t\downarrow 0}\frac{f(x+td)-f(x)}{t}=\inf _{t{\gt}0}\frac{f(x+td)-f(x)}{t}. \tag {1.1.2} \]
Proposition 4.1.1

For fixed \(x\), the function \(f'(x,\cdot )\) is finite sublinear.

Proof

Let \(d_1,d_2\in \mathbb {R}^n\), and positive \(\alpha _1,\alpha _2\) with \(\alpha _1+\alpha _2=1\). From the convexity of \(f\):

\[ \begin{aligned} f\bigl(x+t(\alpha _1d_1+\alpha _2d_2)\bigr)-f(x)&=f\bigl(\alpha _1(x+td_1)+\alpha _2(x+td_2)\bigr)-\alpha _1f(x)-\alpha _2f(x)\\ &\le \alpha _1\bigl[f(x+td_1)-f(x)\bigr]+\alpha _2\bigl[f(x+td_2)-f(x)\bigr] \end{aligned} \]

for all \(t\). Dividing by \(t{\gt}0\) and letting \(t\downarrow 0\), we obtain

\[ f'(x,\alpha _1d_1+\alpha _2d_2)\le \alpha _1f'(x,d_1)+\alpha _2f'(x,d_2) \]

which establishes the convexity of \(f'\) with respect to \(d\). Its positive homogeneity is clear: for \(\lambda {\gt}0\)

\[ f'(x,\lambda d)=\lim _{t\downarrow 0}\lambda \frac{f(x+\lambda td)-f(x)}{\lambda t} =\lambda \lim _{\tau \downarrow 0}\frac{f(x+\tau d)-f(x)}{\tau } =\lambda f'(x,d). \]

Finally suppose \(\| d\| =1\). As a finite convex function, \(f\) is Lipschitz continuous around \(x\) (Theorem B.3.1.2); in particular there exist \(\varepsilon {\gt}0\) and \(L{\gt}0\) such that

\[ |f(x+td)-f(x)|\le Lt\quad \text{for }0\le t\le \varepsilon . \]

Hence, \(|f'(x,d)|\le L\) and we conclude with positive homogeneity:

\[ |f'(x,d)|\le L\| d\| \qquad \text{for all }d\in \mathbb {R}^n. \tag {1.1.5} \]
Definition 4.1.2 (Subdifferential I)

The subdifferential \(\partial f(x)\) of \(f\) at \(x\) is the nonempty compact convex set of \(\mathbb {R}^n\) whose support function is \(f'(x,\cdot )\), i.e.

\[ \partial f(x):=\{ s\in \mathbb {R}^n:\; \langle s,d\rangle \le f'(x,d)\ \text{ for all } d\in \mathbb {R}^n\} .\tag {1.1.6} \]

A vector \(s\in \partial f(x)\) is called a subgradient of \(f\) at \(x\).
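
To make these definitions concrete, here is a small numerical sketch (our own illustration, not part of the text): for the convex function \(f(x)=|x^1|+\| x\| ^2\) on \(\mathbb {R}^2\), the subdifferential at the origin is the segment \([-1,1]\times \{ 0\} \), so (1.1.6) gives \(f'(0,d)=|d^1|\), and the difference quotients of (1.1.2) decrease to that value as \(t\downarrow 0\).

```python
# Sketch: difference quotients of f(x) = |x_1| + ||x||^2 at x = 0 decrease,
# as t goes down to 0, to the support function of the subdifferential [-1,1] x {0}.
import numpy as np

def f(x):
    return abs(x[0]) + float(np.dot(x, x))

x = np.zeros(2)
d = np.array([0.3, -0.7])

for t in [1.0, 1e-1, 1e-2, 1e-3]:
    print(f"t={t:g}  quotient={(f(x + t * d) - f(x)) / t:.6f}")   # -> 0.3

print("f'(x,d) from (1.1.6):", abs(d[0]))   # sup of <s,d> over s in [-1,1] x {0}
```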

Proposition 4.1.2

The finite sublinear function \(d\mapsto \sigma (d):=f'(x,d)\) satisfies

\[ \sigma '(0,\delta )=f'(x,\delta )\qquad \text{for all }\delta \in \mathbb {R}^n; \tag {1.1.8} \]
\[ \sigma (\delta )=\sigma (0)+\sigma '(0,\delta )=\sigma '(0,\delta )\qquad \text{for all }\delta \in \mathbb {R}^n; \tag {1.1.9} \]
\[ \partial \sigma (0)=\partial f(x). \tag {1.1.10} \]
Proof

Because \(\sigma \) is positively homogeneous and \(\sigma (0)=0\),

\[ \frac{\sigma (t\delta )-\sigma (0)}{t}=\sigma (\delta )=f'(x,\delta )\qquad \text{for all }t{\gt}0. \]

This implies immediately (1.1.8) and (1.1.9). Then (1.1.10) follows, since a nonempty compact convex set is uniquely determined by its support function.

4.1.2 Second definition: Minorization by affine functions

Definition 4.1.3 (Subdifferential II)

The subdifferential of \(f\) at \(x\) is the set of vectors \(s\in \mathbb {R}^n\) satisfying

\[ f(y)\ge f(x)+\langle s,y-x\rangle \qquad \text{for all }y\in \mathbb {R}^n. \tag {1.2.1} \]
Theorem 4.1.1

Definitions 4.1.2 and 4.1.3 of the subdifferential are equivalent.

Proof

Let \(s\) satisfy (1.1.6), i.e.

\[ \langle s,d\rangle \le f'(x,d)\qquad \text{for all }d\in \mathbb {R}^n. \tag {1.2.2} \]

The second equality in (1.1.2) makes it clear that (1.2.2) is equivalent to

\[ \langle s,d\rangle \le \frac{f(x+td)-f(x)}{t}\qquad \text{for all }d\in \mathbb {R}^n\text{ and }t{\gt}0. \tag {1.2.3} \]

As \(d\) ranges over \(\mathbb {R}^n\) and \(t\) over \(\mathbb {R}_+^*\), the point \(y:=x+td\) ranges over all of \(\mathbb {R}^n\), and we see that (1.2.3) is just (1.2.1).

4.1.3 Geometric constructions and interpretations

Proposition 4.1.3
  1. A vector \(s\in \mathbb {R}^n\) is a subgradient of \(f\) at \(x\) if and only if \((s,-1)\in \mathbb {R}^n\times \mathbb {R}\) is normal to \(\operatorname {epi}f\) at \((x,f(x))\). In other words:

    \[ N_{\operatorname {epi}f}(x,f(x))=\{ (\lambda s,-\lambda ): s\in \partial f(x),\ \lambda \ge 0\} . \]
  2. The tangent cone to the set \(\operatorname {epi}f\) at \((x,f(x))\) is the epigraph of the directional-derivative function \(d\mapsto f'(x,d)\):

    \[ T_{\operatorname {epi}f}(x,f(x))=\{ (d,r): r\ge f'(x,d)\} . \]
Proof

(i) Apply Definition A.5.2.3 to see that \((s,-1)\in N_{\operatorname {epi}f}(x,f(x))\) means

\[ \langle s,y-x\rangle +(-1)[r-f(x)]\le 0\qquad \text{for all }y\in \mathbb {R}^n\text{ and }r\ge f(y) \]

and the equivalence with (1.2.1) is clear. The formula follows since the set of normals forms a cone containing the origin.

(ii) The tangent cone to \(\operatorname {epi}f\) is the polar of the above normal cone, i.e. the set of \((d,r)\in \mathbb {R}^n\times \mathbb {R}\) such that

\[ \langle \lambda s,d\rangle +(-\lambda )r\le 0\qquad \text{for all }s\in \partial f(x)\text{ and }\lambda \ge 0. \]

Barring the trivial case \(\lambda =0\), we divide by \(\lambda {\gt}0\) to obtain

\[ r\ge \max \{ \langle s,d\rangle : s\in \partial f(x)\} =f'(x,d). \]
Lemma 4.1.1

For the convex function \(f:\mathbb {R}^n\to \mathbb {R}\) and the sublevel-set \(S_{f(x)}:=\{ y:\; f(y)\le f(x)\} \) of (1.3.1), we have

\[ T_{S_{f(x)} }(x)\subset \{ d:\; f'(x,d)\le 0\} . \tag {1.3.2} \]
Proof

Take arbitrary \(y\in S_{f(x)},\ t{\gt}0\), and set \(d:=t(y-x)\). Then, using the second equality in (1.1.2),

\[ 0\ge t[f(y)-f(x)]=\frac{f(x+d/t)-f(x)}{1/t}\ge f'(x,d). \]

So we have proved

\[ \mathbb {R}^+[S_{f(x)}-x]\subset \{ d:\; f'(x,d)\le 0\} \tag {1.3.3} \]

(note: the case \(d=0\) is covered since \(0\in S_{f(x)}-x\)).

Because \(f'(x,\cdot )\) is a closed (indeed continuous) function, the righthand set in (1.3.3) is closed. Knowing that \(T_{S_{f(x)}}(x)\) is the closure of the lefthand side in (1.3.3) (Proposition A.5.2.1), we deduce the result by taking the closure of both sides in (1.3.3).

Proposition 4.1.4

Let \(g:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose that \(g(x_0){\lt}0\) for some \(x_0\in \mathbb {R}^n\). Then

\[ \operatorname {cl}\{ z: g(z){\lt}0\} =\{ z: g(z)\le 0\} ,\tag {1.3.4} \]
\[ \{ z: g(z){\lt}0\} =\operatorname {int}\{ z: g(z)\le 0\} .\tag {1.3.5} \]

It follows that

\[ \operatorname {bd}\{ z: g(z)\le 0\} =\{ z: g(z)=0\} .\tag {1.3.6} \]
Proof

Because \(g\) is (lower semi-) continuous, the inclusion “\(\subset \)” automatically holds in (1.3.4). Conversely, let \(\bar z\) be arbitrary with \(g(\bar z)\le 0\) and, for \(k{\gt}0\), set

\[ z_k:=\tfrac {1}{k}x_0+(1-\tfrac {1}{k})\bar z. \]

By convexity of \(g\), \(g(z_k)\le \tfrac {1}{k}g(x_0)+(1-\tfrac {1}{k})g(\bar z){\lt}0\); since \(z_k\to \bar z\), (1.3.4) is established by letting \(k\to +\infty \).

Now, take the interior of both sides in (1.3.4). The “int cl” on the left is actually an “int” (Proposition A.2.1.8), and this “int” can be dropped because \(g\) is (upper semi-) continuous, so the lefthand set is already open: (1.3.5) is established.

Theorem 4.1.2

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose \(0\notin \partial f(x)\). Then, \(S_{f(x)}\) being the sublevel-set (1.3.1),

\[ T_{S_{f(x)}}(x)=\{ d\in \mathbb {R}^n:\; f'(x,d)\le 0\} \tag {1.3.7} \]
\[ \operatorname {int}\big[T_{S_{f(x)}}(x)\big]=\{ d\in \mathbb {R}^n:\; f'(x,d){\lt}0\} \neq \varnothing .\tag {1.3.8} \]
Proof

From the very definition (1.1.6), our assumption means that \(f'(x,d){\lt}0\) for some \(d\), and (1.1.2) then implies that \(f(x+td){\lt}f(x)\) for \(t{\gt}0\) small enough: our \(d\) is of the form \((x+td-x)/t\) with \(x+td\in S_{f(x)}\) and we have proved

\[ \{ d:\; f'(x,d){\lt}0\} \subset \mathbb {R}^+[S_{f(x)}-x]\subset T_{S_{f(x)}}(x).\tag {1.3.9} \]

Now, we can apply (1.3.4) with \(g=f'(x,\cdot )\):

\[ \operatorname {cl}\{ d:\; f'(x,d){\lt}0\} =\{ d:\; f'(x,d)\le 0\} , \]

so (1.3.7) is proved by closing the sets in (1.3.9) and using (1.3.2). Finally, take the interior of both sides in (1.3.7) and apply (1.3.5) with \(g=f'(x,\cdot )\) to prove (1.3.8).

Theorem 4.1.3

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose \(0\notin \partial f(x)\). Then a direction \(d\) is normal to \(S_{f(x)}\) at \(x\) if and only if \(d=ts\) for some \(t\ge 0\) and some \(s\in \partial f(x)\):

\[ N_{S_{f(x)}}(x)=\mathbb {R}^+ \partial f(x). \]
Proof

Write (1.3.7) as

\[ \begin{aligned} T_{S_{f(x)}}(x)&=\{ d\in \mathbb {R}^n:\langle s,d\rangle \le 0\ \text{for all }s\in \partial f(x)\} \\ &=\{ d\in \mathbb {R}^n:\langle \lambda s,d\rangle \le 0\ \text{for all }\lambda \ge 0\ \text{and }s\in \partial f(x)\} =[\mathbb {R}^+\partial f(x)]^\circ . \end{aligned} \]

The result follows by taking the polar cone of both sides, and observing that the assumption implies closedness of \(\mathbb {R}^+\partial f(x)\) (Proposition A.1.4.7):

\[ N_{S_{f(x)}}(x)=\operatorname {cl}[\mathbb {R}^+\partial f(x)]=\mathbb {R}^+\partial f(x). \]

4.2 Local Properties of the Subdifferential

4.2.1 First-order developments

Lemma 4.2.1

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and \(x\in \mathbb {R}^n\). For any \(\varepsilon {\gt}0\), there exists \(\delta {\gt}0\) such that \(\| h\| \le \delta \) implies

\[ \big|f(x+h)-f(x)-f'(x,h)\big|\le \varepsilon \| h\| . \tag {2.1.1} \]
Proof

Suppose for contradiction that there is \(\varepsilon {\gt}0\) and a sequence \((h_k)\) with \(\| h_k\| =:t_k\le 1/k\) such that

\[ |f(x+h_k)-f(x)-f'(x,h_k)|{\gt}\varepsilon t_k\qquad \text{for }k=1,2,\dots \]

Extracting a subsequence if necessary, assume that \(h_k/t_k\to d\) for some \(d\) of norm \(1\). Then take a local Lipschitz constant \(L\) of \(f\) (see Remark 1.1.3) and expand:

\[ \begin{aligned} \varepsilon t_k&{\lt}|f(x+h_k)-f(x)-f'(x,h_k)|\\ &\le |f(x+h_k)-f(x+t_kd)|+|f(x+t_kd)-f(x)-f'(x,t_kd)|+|f'(x,t_kd)-f'(x,h_k)|\\ &\le 2L\| h_k-t_kd\| +|f(x+t_kd)-f(x)-t_kf'(x,d)|. \end{aligned} \]

Divide by \(t_k{\gt}0\) and pass to the limit to obtain the contradiction \(\varepsilon \le 0\).

Corollary 4.2.1

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. At any \(x\),

\[ f(x+h)=f(x)+\langle s,h\rangle +o(\| h\| ) \]

whenever one of the following equivalent properties holds:

\[ s\in F_{\partial f(x)}(h)\iff h\in N_{\partial f(x)}(s)\iff s=p_{\partial f(x)}(s+h). \]
Corollary 4.2.2

If the convex \(f\) is (Gâteaux) differentiable at \(x\), its only subgradient at \(x\) is its gradient \(\nabla f(x)\). Conversely, if \(\partial f(x)\) contains only one element \(s\), then \(f\) is (Fréchet) differentiable at \(x\), with \(\nabla f(x)=s\).

Proposition 4.2.1

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and denote by \(F_{\partial f(x)}(d):=\{ s\in \partial f(x):\; \langle s,d\rangle =f'(x,d)\} \) the face of \(\partial f(x)\) exposed by \(d\). For all \(x\) and \(d\) in \(\mathbb {R}^n\), we have

\[ F_{\partial f(x)}(d)=\partial [f'(x,\cdot )](d). \]
Proof

If \(s\in \partial f(x)\) then \(f'(x,d')\ge \langle s,d'\rangle \) for all \(d'\in \mathbb {R}^n\), simply because \(f'(x,\cdot )\) is the support function of \(\partial f(x)\). If, in addition, \(\langle s,d\rangle =f'(x,d)\), we get

\[ f'(x,d')\ge f'(x,d)+\langle s,d'-d\rangle \qquad \text{for all }d'\in \mathbb {R}^n \tag {2.1.4} \]

which proves the inclusion \(F_{\partial f(x)}(d)\subset \partial [f'(x,\cdot )](d)\).

Conversely, let \(s\) satisfy (2.1.4). Set \(d'':=d'-d\) and deduce from subadditivity

\[ f'(x,d)+f'(x,d'')\ge f'(x,d')\ge f'(x,d)+\langle s,d''\rangle \qquad \text{for all }d''\in \mathbb {R}^n \]

which implies \(f'(x,\cdot )\ge \langle s,\cdot \rangle \), hence \(s\in \partial f(x)\). Also, putting \(d'=0\) in (2.1.4) shows that \(\langle s,d\rangle \ge f'(x,d)\). Altogether, we have \(s\in F_{\partial f(x)}(d)\).

Definition 4.2.1

A point \(x\) at which \(\partial f(x)\) has more than one element — i.e. at which \(f\) is not differentiable — is called a kink (or corner-point) of \(f\).

4.2.2 Minimality conditions

Theorem 4.2.1

For \(f:\mathbb {R}^n\to \mathbb {R}\) convex, the following three properties are equivalent:

  1. \(f\) is minimized at \(x\) over \(\mathbb {R}^n\), i.e., \(f(y)\ge f(x)\) for all \(y\in \mathbb {R}^n\);

  2. \(0\in \partial f(x)\);

  3. \(f'(x,d)\ge 0\) for all \(d\in \mathbb {R}^n\).

Proof

The equivalence (i) \(\Leftrightarrow \) (ii) [resp. (ii) \(\Leftrightarrow \) (iii)] is obvious from (1.2.1) [resp. (1.1.6)].
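
A quick numerical sanity check of the equivalence (our own example; it uses the sum rule of §4.4.1 below): for \(f(x)=|x-1|+|x+1|\) on \(\mathbb {R}\), \(\partial f(1)=[-1,1]+\{ 1\} =[0,2]\ni 0\), so \(x=1\) must minimize \(f\) globally.

```python
# Sketch: 0 lies in the subdifferential [0,2] of f at x = 1, so Theorem 4.2.1
# predicts that x = 1 is a global minimizer; we confirm on a grid.
import numpy as np

f = lambda x: np.abs(x - 1) + np.abs(x + 1)
ys = np.linspace(-5.0, 5.0, 10001)
assert np.all(f(ys) >= f(1.0) - 1e-12)
print("grid minimum:", f(ys).min(), "  f(1) =", f(1.0))
```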

4.2.3 Mean-value theorems

Lemma 4.2.2

The subdifferential of \(\varphi \) defined by (2.3.1), i.e. of \(\varphi (t):=f(x_t)\) with \(x_t:=x+t(y-x)\), is

\[ \partial \varphi (t)=\{ \langle s,y-x\rangle :\; s\in \partial f(x_t)\} \]

or, more symbolically:

\[ \partial \varphi (t)=\langle \partial f(x_t),\, y-x\rangle . \]
Proof

In terms of right- and left-derivatives (see Theorem 0.6.3), we have

\[ D_{+}\varphi (t)=\lim _{\tau \downarrow 0}\frac{f(x_t+\tau (y-x))-f(x_t)}{\tau }=f'(x_t,y-x), \]
\[ D_{-}\varphi (t)=\lim _{\tau \uparrow 0}\frac{f(x_t+\tau (y-x))-f(x_t)}{\tau }=-f'(x_t,-(y-x)); \]

so, knowing that

\[ f'(x_t,y-x)=\max _{s\in \partial f(x_t)}\langle s,y-x\rangle , \]
\[ -\, f'(x_t,-(y-x))=\min _{s\in \partial f(x_t)}\langle s,y-x\rangle , \]

we obtain \(\partial \varphi (t)=[D_{-}\varphi (t),D_{+}\varphi (t)]=\{ \langle s,y-x\rangle : s\in \partial f(x_t)\} .\)

Theorem 4.2.2

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. Given two points \(x\neq y\) in \(\mathbb {R}^n\), there exist \(t\in ]0,1[\) and \(s\in \partial f(x_t)\) such that

\[ f(y)-f(x)=\langle s,\, y-x\rangle . \tag {2.3.2} \]

In other words,

\[ f(y)-f(x)\in \bigcup _{t\in ]0,1[}\langle \partial f(x_t),\, y-x\rangle . \]
Proof

Start from the function \(\varphi \) of (2.3.1) and, as usual in this context, consider the auxiliary function

\[ \psi (t):=\varphi (t)-\varphi (0)-t[\varphi (1)-\varphi (0)], \]

which is clearly convex. Computing directional derivatives easily gives \(\partial \psi (t)=\partial \varphi (t)-[\varphi (1)-\varphi (0)]\). Now \(\psi \) is continuous on \([0,1]\) and has been constructed so that \(\psi (0)=\psi (1)=0\), so it is minimal at some \(t\in ]0,1[\): at this \(t\), \(0\in \partial \psi (t)\) (Theorem 4.2.1). In view of Lemma 4.2.2, this means that there is \(s\in \partial f(x_t)\) such that

\[ \langle s,\, y-x\rangle =\varphi (1)-\varphi (0)=f(y)-f(x). \]
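
The following sketch works the theorem out on a concrete case (our own example): for \(f=|\cdot |\), \(x=-1\) and \(y=2\), the required slope is \(s=1/3\), which belongs to \(\partial f(x_t)\) only at the kink \(x_t=0\), i.e. for \(t=1/3\).

```python
# Sketch: Theorem 4.2.2 for f = |.|, x = -1, y = 2.  We need t in ]0,1[ and
# s in the subdifferential of f at x_t with f(y) - f(x) = <s, y-x>, i.e. 3s = 1.
# Away from 0 that subdifferential is {-1} or {+1}, so s = 1/3 forces x_t = 0.
f = lambda u: abs(u)
x, y = -1.0, 2.0
s = (f(y) - f(x)) / (y - x)   # the slope the theorem must produce: 1/3
t = (0.0 - x) / (y - x)       # the unique t with x_t = 0
assert 0.0 < t < 1.0 and -1.0 <= s <= 1.0   # s lies in [-1,1], the subdifferential at 0
print(f"t = {t:.4f}, s = {s:.4f}, <s, y-x> = {s * (y - x):.4f} = f(y) - f(x)")
```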
Theorem 4.2.3

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. For \(x,y\in \mathbb {R}^n\),

\[ f(y)-f(x)=\int _0^1\langle \partial f(x_t),\, y-x\rangle \, dt. \]
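
The formula can be checked by quadrature on the example used above (ours): for \(f=|\cdot |\), \(x=-1\), \(y=2\), the integrand \(\langle \partial f(x_t),y-x\rangle \) is single-valued for almost every \(t\) and integrates to \(f(y)-f(x)=1\).

```python
# Sketch: the integrand equals 3*sign(-1 + 3t) for a.e. t in [0,1]; its mean
# over a fine uniform grid approximates the integral, i.e. f(y) - f(x) = 1.
import numpy as np

x, y = -1.0, 2.0
ts = np.linspace(0.0, 1.0, 300001)
integrand = np.sign(x + ts * (y - x)) * (y - x)
print(integrand.mean(), "vs", abs(y) - abs(x))   # ~1.0 vs 1.0
```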

4.3 First Examples

4.4 Calculus Rules with Subdifferentials

4.4.1 Positive combinations of functions

Theorem 4.4.1

Let \(f_1,f_2\) be two convex functions from \(\mathbb {R}^n\) to \(\mathbb {R}\) and \(t_1,t_2\) be positive. Then

\[ \partial (t_1 f_1 + t_2 f_2)(x) = t_1\partial f_1(x) + t_2\partial f_2(x) \qquad \text{for all }x\in \mathbb {R}^n. \tag {4.1.1} \]
Proof

Apply Theorem C.3.3.2(i): \(t_1\partial f_1(x)+t_2\partial f_2(x)\) is a compact convex set whose support function is

\[ t_1 f_1'(x,\cdot ) + t_2 f_2'(x,\cdot ). \tag {4.1.2} \]

On the other hand, the support function of \(\partial (t_1 f_1 + t_2 f_2)(x)\) is by definition the directional derivative \((t_1 f_1 + t_2 f_2)'(x,\cdot )\) which, from elementary calculus, is just (4.1.2). Therefore the two (compact convex) sets in (4.1.1) coincide, since they have the same support function.

4.4.2 Pre-composition with an affine mapping

Theorem 4.4.2

Let \(A:\mathbb {R}^n\to \mathbb {R}^m\) be an affine mapping ( \(Ax = A_0x + b\), with \(A_0\) linear and \(b\in \mathbb {R}^m\)) and let \(g\) be a finite convex function on \(\mathbb {R}^m\). Then

\[ \partial (g\circ A)(x)=A_0^*\partial g(Ax)\qquad \text{for all }x\in \mathbb {R}^n. \tag {4.2.1} \]
Proof

Form the difference quotient giving rise to \((g\circ A)'(x,d)\) and use the relation \(A(x+td)=Ax+tA_0d\) to obtain

\[ (g\circ A)'(x,d)=g'(Ax,A_0d)\qquad \text{for all }d\in \mathbb {R}^n. \]

From Proposition C.3.3.3, the righthand side in the above equality is the support function of the convex compact set \(A_0^*\partial g(Ax)\).
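
A numerical sanity check of (4.2.1) (our own construction, not from the text): with \(g=\| \cdot \| _1\) on \(\mathbb {R}^3\), the vector \(s=\operatorname{sign}(Ax)\) is a subgradient of \(g\) at \(Ax\), and \(A_0^*s\) should then satisfy the subgradient inequality (1.2.1) for \(g\circ A\).

```python
# Sketch: if s is in the subdifferential of g at Ax, then A0^T s must satisfy
# the subgradient inequality for h = g o A at x; we test it on random points.
import numpy as np

rng = np.random.default_rng(0)
A0 = rng.standard_normal((3, 2))
b = rng.standard_normal(3)

g = lambda y: float(np.abs(y).sum())
h = lambda x: g(A0 @ x + b)          # h = g o A, with A x = A0 x + b

x = rng.standard_normal(2)
s_g = np.sign(A0 @ x + b)            # a subgradient of the l1-norm at Ax
s_h = A0.T @ s_g                     # candidate subgradient of h at x, per (4.2.1)

for _ in range(1000):
    y = x + 3.0 * rng.standard_normal(2)
    assert h(y) >= h(x) + s_h @ (y - x) - 1e-10
print("subgradient inequality (1.2.1) verified for A0^T s")
```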

4.4.3 Post-composition with increasing convex function of several variables

Theorem 4.4.3

Let the convex functions \(f_1,\dots ,f_m\), the mapping \(F:=(f_1,\dots ,f_m)\) and the increasing convex function \(g\) be defined as above. For all \(x\in \mathbb {R}^n\),

\[ \partial (g\circ F)(x)=\big\{ \sum _{i=1}^m \rho ^i s_i :\; (\rho ^1,\dots ,\rho ^m)\in \partial g(F(x)), \; s_i\in \partial f_i(x)\ \text{ for } i=1,\dots ,m\big\} . \tag {4.3.1} \]
Proof

[Step 0] Our aim is to show the formula via support functions; hence we need to establish the convexity and compactness of the righthand side in (4.3.1) – call it \(S\). Boundedness and closedness are easy, coming from the fact that a subdifferential (be it \(\partial g\) or \(\partial f_i\)) is bounded and closed. As for convexity, pick two points in \(S\) and form their convex combination

\[ s=\alpha \sum _{i=1}^m \rho ^i s_i + (1-\alpha )\sum _{i=1}^m \rho ^{\prime i} s_i' = \sum _{i=1}^m\big[\alpha \rho ^i s_i + (1-\alpha )\rho ^{\prime i}s_i'\big], \]

where \(\alpha \in [0,1]\). Remember that each \(\rho ^i\) and \(\rho ^{\prime i}\) is nonnegative and the above sum can be restricted to those terms such that \(\rho ^{\prime \prime i}:=\alpha \rho ^i+(1-\alpha )\rho ^{\prime i}{\gt}0\). Then we write each such term as

\[ \rho ^{\prime \prime i}\big[\frac{\alpha \rho ^i}{\rho ^{\prime \prime i}} s_i + \frac{(1-\alpha )\rho ^{\prime i}}{\rho ^{\prime \prime i}} s_i'\big]. \]

It suffices to observe that \((\rho ^{\prime \prime 1},\dots ,\rho ^{\prime \prime m})\in \partial g(F(x))\) by convexity of \(\partial g(F(x))\), while the bracketed expression is a convex combination of \(s_i\) and \(s_i'\), hence lies in \(\partial f_i(x)\); thus \(s\in S\).

[Step 1] Now let us compute the support function \(\sigma _S\) of \(S\). For \(d\in \mathbb {R}^n\), we denote by \(F'(x,d)\in \mathbb {R}^m\) the vector whose components are \(f_i'(x,d)\) and we proceed to prove

\[ \sigma _S(d)=g' (F(x),F'(x,d)). \tag {4.3.2} \]

For any \(s=\sum _{i=1}^m \rho ^i s_i\in S\), we write \(\langle s,d\rangle \) as

\[ \sum _{i=1}^m \rho ^i\langle s_i,d\rangle \le \sum _{i=1}^m \rho ^i f_i'(x,d) \le g'(F(x),F'(x,d)) ; \tag {4.3.3} \]

the first inequality uses \(\rho ^i\ge 0\) and the definition \(f_i'(x,\cdot )=\sigma _{\partial f_i(x)}\); the second uses \(\rho =(\rho ^1,\dots ,\rho ^m)\in \partial g(F(x))\) and the definition \(g'(F(x),\cdot )=\sigma _{\partial g(F(x))}\).

On the other hand, the compactness of \(\partial g(F(x))\) implies the existence of an \(m\)-tuple \((\bar\rho ^i)\in \partial g(F(x))\) such that

\[ g'(F(x),F'(x,d))=\sum _{i=1}^m\bar{\rho }^i f_i'(x,d), \]

and the compactness of each \(\partial f_i(x)\) yields likewise an \(\bar{s}_i\in \partial f_i(x)\) such that

\[ f_i'(x,d)=\langle \bar{s}_i,d\rangle \quad \text{for }i=1,\dots ,m. \]

Altogether, we have exhibited an \(\bar{s}=\sum _{i=1}^m\bar{\rho }^i\bar{s}_i\in S\) such that equality holds in (4.3.3), so (4.3.2) is established.

[Step 2] It remains to prove that the support function (4.3.2) is really the directional derivative \((g\circ F)'(x,d)\). For \(t{\gt}0\), expand \(F(x+td)\), use the fact that \(g\) is locally Lipschitzian, and then expand \(g(F(x+td))\):

\[ g(F(x+td))=g(F(x)+tF'(x,d)+o(t))=g(F(x)+tF'(x,d))+o(t) = g(F(x))+tg'(F(x),F'(x,d))+o(t). \]

From there, it follows

\[ (g\circ F)'(x,d):=\lim _{t\downarrow 0}\frac{g(F(x+td))-g(F(x))}{t}=g'(F(x),F'(x,d)). \quad \square \]
Corollary 4.4.1

Let \(f_1,\dots ,f_m\) be \(m\) convex functions from \(\mathbb {R}^n\) to \(\mathbb {R}\) and define

\[ f := \max \{ f_1,\dots ,f_m\} . \]

Denoting by \(I(x) := \{ i : f_i(x) = f(x)\} \) the active index-set, we have

\[ \partial f(x) = \operatorname {co}\Big(\bigcup _{i\in I(x)}\partial f_i(x)\Big). \tag {4.3.4} \]
Proof

Take \(g(y)=\max \{ y^{1},\dots ,y^{m}\} \), whose subdifferential was computed in (3.7): \(\{ e_{i}\} \) denoting the canonical basis of \(\mathbb {R}^{m}\),

\[ \partial g(y)=\operatorname {co}\{ e_{i} : i\ \text{such that}\ y^{i}=g(y)\} . \]

Then, using the notation of Theorem 4.4.3, we write \(\partial g(F(x))\) as

\[ \left\{ (\rho ^{1},\dots ,\rho ^{m}):\ \rho ^{i}=0\ \text{for}\ i\notin I(x),\ \rho ^{i}\ge 0\ \text{for}\ i\in I(x),\ \sum _{i=1}^{m}\rho ^{i}=1\right\} , \]

and (4.3.1) gives

\[ \partial f(x)=\left\{ \sum _{i\in I(x)}\rho ^{i}\partial f_{i}(x):\ \rho ^{i}\ge 0\ \text{for}\ i\in I(x),\ \sum _{i\in I(x)}\rho ^{i}=1\right\} . \]

Remembering Example A.1.3.5, it suffices to recognize in the above expression the convex hull announced in (4.3.4).
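
Here is a short numerical sketch of (4.3.4) (our own example): \(f=\max (f_1,f_2)\) with \(f_i(x)=\| x-c_i\| ^2\); both pieces are active on the axis equidistant from the two centers, and the predicted \(f'(x,d)=\max _{i\in I(x)}\langle \nabla f_i(x),d\rangle \) matches a difference quotient.

```python
# Sketch: f = max(f1, f2), f_i(x) = ||x - c_i||^2, c_1 = (1,0), c_2 = (-1,0).
# At x = (0, 0.5) both pieces are active; (4.3.4) gives the subdifferential as
# co{grad f_1(x), grad f_2(x)}, whose support function we compare with a quotient.
import numpy as np

c = np.array([[1.0, 0.0], [-1.0, 0.0]])
f = lambda x: max(float(np.sum((x - ci) ** 2)) for ci in c)

x = np.array([0.0, 0.5])
d = np.array([0.4, -1.0])
grads = [2.0 * (x - ci) for ci in c]             # gradients of the active pieces

pred = max(float(g @ d) for g in grads)           # support function value
t = 1e-7
quot = (f(x + t * d) - f(x)) / t
print("predicted:", pred, "  quotient:", quot)    # both ~ -0.2
```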

4.4.4 Supremum of convex functions

Lemma 4.4.1

With the notation (4.4.1), (4.4.2),

\[ \partial f(x)\supset \overline{\operatorname {co}}\{ \partial f_j(x):\; j\in J(x)\} . \tag {4.4.3} \]
Proof

Take \(j\in J(x)\) and \(s\in \partial f_j(x)\); from the definition (1.2.1) of the subdifferential,

\[ f(y)\ge f_j(y)\ge f_j(x)+\langle s,y-x\rangle \qquad \text{for all }y\in \mathbb R^n, \]

so \(\partial f(x)\) contains \(\partial f_j(x)\). Being closed and convex, it also contains the closed convex hull appearing in (4.4.3).

Theorem 4.4.4

With the notation (4.4.1), (4.4.2), assume that \(J\) is a compact set (in some metric space), on which the functions \(j\mapsto f_j(x)\) are upper semi-continuous for each \(x\in \mathbb {R}^n\). Then

\[ \partial f(x)=\operatorname {co}\Big(\bigcup _{j\in J(x)}\partial f_j(x)\Big). \tag {4.4.4} \]
Proof

[Step 0] Our assumptions make \(J(x)\) nonempty and compact. Denote by \(S\) the union \(\bigcup _{j\in J(x)}\partial f_j(x)\) in (4.4.4); because of (4.4.3), \(S\) is bounded; let us check that it is closed. Take a sequence \((s_k)\subset S\) converging to \(s\); to each \(s_k\), we associate some \(j_k\in J(x)\) such that \(s_k\in \partial f_{j_k}(x)\), i.e.

\[ f_{j_k}(y)\ge f_{j_k}(x)+\langle s_k,y-x\rangle \qquad \text{for all }y\in \mathbb {R}^n. \]

Let \(k\to \infty \); extract a subsequence so that \(j_k\to j\in J(x)\) (each \(f_{j_k}(x)=f(x)\), and upper semi-continuity of \(f_{(\cdot )}(x)\) forces \(f_j(x)=f(x)\), i.e. \(j\in J(x)\)); by upper semi-continuity of the function \(f_{(\cdot )}(y)\), we obtain

\[ f_j(y)\ge \limsup f_{j_k}(y)\ge f_j(x)+\langle s,y-x\rangle \qquad \text{for all }y\in \mathbb {R}^n, \]

which shows \(s\in \partial f_j(x)\subset S\). Altogether, \(S\) is compact and its convex hull is also compact (Theorem A.1.4.3).

In view of Lemma 4.4.1, it suffices to prove the “\(\subset \)”-inclusion in (4.4.4); for this, we will establish the corresponding inequality between support functions which, in view of the calculus rule C.3.3.2(ii), is: for all \(d\in \mathbb {R}^n\),

\[ f'(x,d)\le \sigma _S(d)=\sup \{ f_j'(x,d):\; j\in J(x)\} . \tag {4.4.5} \]

[Step 1] Let \(\varepsilon {\gt}0\); from the definition (1.1.2) of \(f'(x,d)\),

\[ \frac{f(x+td)-f(x)}{t}{\gt}f'(x,d)-\varepsilon \qquad \text{for all }t{\gt}0. \tag {4.4.6} \]

For \(t{\gt}0\), set

\[ J_t:=\Big\{ j\in J:\; \frac{f_j(x+td)-f(x)}{t}\ge f'(x,d)-\varepsilon \Big\} . \]

The definition of \(f(x+td)\) as a supremum shows with (4.4.6) that \(J_t\) is nonempty. Because \(J\) is compact and \(f_{(\cdot )}(x+td)\) is upper semi-continuous, \(J_t\) is compact. Observe that \(J_t\) is a superlevel-set of the function

\[ t\in \,]0,+\infty [\;\longmapsto \;\frac{f_j(x+td)-f_j(x)}{t}+\frac{f_j(x)-f(x)}{t}, \]

which is nondecreasing: the first fraction is the slope of a convex function, and the second fraction has a nonpositive numerator. Thus, \(J_{t_1}\subset J_{t_2}\) for \(0{\lt}t_1\le t_2\).

[Step 2] By compactness, we deduce the existence of some \(j^*\in \bigcap _{t{\gt}0}J_t\) (for each \(\tau \in ]0,t]\), pick some \(j_\tau \in J_\tau \subset J_t\); take a cluster point for \(\tau \downarrow 0\): it is in \(J_t\)). We therefore have

\[ f_{j^*}(x+td)-f(x)\ge t\bigl[f'(x,d)-\varepsilon \bigr]\qquad \text{for all }t{\gt}0, \]

hence \(j^*\in J(x)\) (continuity of the convex function \(f_{j^*}\) for \(t\downarrow 0\)). In this inequality, we can replace \(f(x)\) by \(f_{j^*}(x)\), divide by \(t\) and let \(t\downarrow 0\) to obtain

\[ \sigma _S(d)\ge f'_{j^*}(x,d)\ge f'(x,d)-\varepsilon . \]

Since \(d\in \mathbb {R}^n\) and \(\varepsilon {\gt}0\) were arbitrary, (4.4.5) is established.

Corollary 4.4.2

The notation and assumptions are those of Theorem 4.4.4. Assume also that each \(f_j\) is differentiable; then

\[ \partial f(x)=\operatorname {co}\{ \nabla f_j(x):\; j\in J(x)\} . \]
Corollary 4.4.3

For some compact set \(Y\subset \mathbb {R}^p\), let \(g:\mathbb {R}^n\times Y\to \mathbb {R}\) be a function satisfying the following properties:

  • for each \(x\in \mathbb {R}^n\), \(g(x,\cdot )\) is upper semi-continuous;

  • for each \(y\in Y\), \(g(\cdot ,y)\) is convex and differentiable;

  • the function \(f:=\sup _{y\in Y} g(\cdot ,y)\) is finite-valued on \(\mathbb {R}^n\);

  • at some \(x\in \mathbb {R}^n\), \(g(x,\cdot )\) is maximized at a unique \(y(x)\in Y\).

Then \(f\) is differentiable at this \(x\), and its gradient is

\[ \nabla f(x)=\nabla _x g\bigl(x,y(x)\bigr) \tag {4.4.8} \]

(where \(\nabla _x g(x,y)\) denotes the gradient of the function \(g(\cdot ,y)\) at \(x\)).
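
A minimal check of (4.4.8), on an example of our own choosing: \(g(x,y)=\langle x,y\rangle \) maximized over the Euclidean unit ball \(Y\subset \mathbb {R}^2\) gives \(f(x)=\| x\| \); for \(x\neq 0\) the maximizer \(y(x)=x/\| x\| \) is unique, and the corollary predicts \(\nabla f(x)=\nabla _xg(x,y(x))=y(x)\).

```python
# Sketch: f(x) = max over ||y|| <= 1 of <x,y> equals ||x||; its gradient at
# x != 0 should be the unique maximizer y(x) = x/||x||, per (4.4.8).
import numpy as np

x = np.array([3.0, 4.0])
yx = x / np.linalg.norm(x)                   # unique maximizer y(x)
print("predicted gradient:", yx)             # (0.6, 0.8)

eps = 1e-6                                   # finite-difference comparison
fd = np.array([(np.linalg.norm(x + eps * e) - np.linalg.norm(x - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print("finite differences:", fd)
```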

4.4.5 Image of a function under a linear mapping

Theorem 4.4.5

With the notation (4.5.1), (4.5.2), assume \(A\) is surjective. Let \(x\) be such that \(Y(x)\) is nonempty. Then, for arbitrary \(y\in Y(x)\),

\[ \partial (Ag)(x)=\{ \, s\in \mathbb {R}^n:\; A^*s\in \partial g(y)\, \} =\bigl(A^*\bigr)^{-1}[\partial g(y)] \tag {4.5.3} \]

(and this set is thus independent of the particular optimal \(y\)).

Proof

By definition, \(s\in \partial (Ag)(x)\) if and only if \((Ag)(x')\ge (Ag)(x)+\langle s,x'-x\rangle \) for all \(x'\in \mathbb {R}^n\), which can be rewritten

\[ (Ag)(x')\ge g(y)+\langle s,x'-Ay\rangle \qquad \text{for all }x'\in \mathbb {R}^n \]

where \(y\) is arbitrary in \(Y(x)\). Furthermore, because \(A\) is surjective and by definition of \(Ag\), this last relation is equivalent to

\[ g(y')\ge g(y)+\langle s,Ay'-Ay\rangle = g(y)+\langle A^*s,y'-y\rangle \qquad \text{for all }y'\in \mathbb {R}^m \]

which means that \(A^*s\in \partial g(y)\).

Corollary 4.4.4

Make the assumptions of Theorem 4.4.5. If \(g\) is differentiable at some \(y\in Y(x)\), then \(Ag\) is differentiable at \(x\).

Proof

Surjectivity of \(A\) is equivalent to injectivity of \(A^*\): in (4.5.3), we have an equation in \(s\): \(A^*s=\nabla g(y)\), whose solution is unique, and is therefore \(\nabla (Ag)(x)\).

Corollary 4.4.5

Suppose that the subdifferential of \(g\) in (4.5.4) is associated with a scalar product \(\langle \! \langle \cdot ,\cdot \rangle \! \rangle \) preserving the structure of a product space:

\[ \langle \! \langle (x,y),(x',y')\rangle \! \rangle =\langle x,x'\rangle _n+\langle y,y'\rangle _m \qquad \text{for }x,x'\in \mathbb {R}^n\text{ and }y,y'\in \mathbb {R}^m. \]

At a given \(x\in \mathbb {R}^n\), take an arbitrary \(y\) solving (4.5.4). Then

\[ \partial f(x)=\{ \, s\in \mathbb {R}^n:\, (s,0)\in \partial _{(x,y)}g(x,y)\, \} . \]
Proof

With our notation, \(A^*s=(s,0)\) for all \(s\in \mathbb {R}^n\). It suffices to apply Theorem 4.4.5 (the symbol \(\partial _{(x,y)}g\) is used as a reminder that we are dealing with the subdifferential of \(g\) with respect to the variable \((\cdot ,\cdot )\in \mathbb {R}^n\times \mathbb {R}^m\)).

Corollary 4.4.6

Let \(f_1,f_2:\mathbb {R}^n\to \mathbb {R}\) be two convex functions minorized by a common affine function. For given \(x\), let \((y_1,y_2)\) with \(x=y_1+y_2\) be such that the inf-convolution is exact at \(x\), i.e. \((f_1\mathbin {\square }f_2)(x)=f_1(y_1)+f_2(y_2)\). Then

\[ \partial (f_1\mathbin {\square }f_2)(x)=\partial f_1(y_1)\cap \partial f_2(y_2). \tag {4.5.6} \]
Proof

First observe that \(A^*s=(s,s)\). Also, apply Definition 1.2.1 to see that \((s_1,s_2)\in \partial g(y_1,y_2)\) if and only if \(s_1\in \partial f_1(y_1)\) and \(s_2\in \partial f_2(y_2)\). Then (4.5.6) is just the copy of (4.5.3) in the present context.
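
A numerical illustration of (4.5.6) on the classical Huber example (ours, not from the text): \(f_1(u)=u^2/2\) and \(f_2(u)=|u|\); for \(|x|{\gt}1\) the inf-convolution equals \(|x|-1/2\) and is exact at \(y_1=\operatorname{sign}(x)\), \(y_2=x-\operatorname{sign}(x)\), where \(\partial f_1(y_1)\cap \partial f_2(y_2)=\{ \operatorname{sign}(x)\} \), the slope of \(|x|-1/2\).

```python
# Sketch: the inf-convolution of u^2/2 and |u| (Huber function) at x = 2.3
# equals |x| - 1/2, and its slope there is sign(x) = 1, matching (4.5.6).
import numpy as np

def infconv(x):
    y1 = np.linspace(-5.0, 5.0, 100001)           # grid over the split point
    return float(np.min(0.5 * y1 ** 2 + np.abs(x - y1)))

x = 2.3
print(infconv(x), "vs", abs(x) - 0.5)

eps = 1e-4                                        # slope by central differences
print((infconv(x + eps) - infconv(x - eps)) / (2 * eps), "vs sign(x) =", np.sign(x))
```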

4.5 Further Examples

4.5.1 Largest eigenvalue of a symmetric matrix

4.5.2 Nested optimization

4.5.3 Best approximation of a convex function on a compact interval

Theorem 4.5.1

With the notations (5.3.1), (5.3.2), suppose \(\varphi _{0}\notin H\). A necessary and sufficient condition for \(\bar{x}=(\bar{\xi }^{1},\dots ,\bar{\xi }^{n})\in \mathbb {R}^{n}\) to minimize \(f\) of (5.3.1) is that, for some positive integer \(p\le n+1\), there exist \(p\) points \(t_{1},\dots ,t_{p}\) in \(T\), \(p\) integers \(\varepsilon _{1},\dots ,\varepsilon _{p}\) in \(\{ -1,+1\} \) and \(p\) positive numbers \(\alpha _{1},\dots ,\alpha _{p}\) such that

\[ \sum _{i=1}^{n}\bar{\xi }^{i}\varphi _i(t_k)-\varphi _0(t_k)=\varepsilon _k f(\bar{x})\qquad \text{for }k=1,\dots ,p, \]
\[ \sum _{k=1}^{p}\alpha _k\varepsilon _k\varphi _i(t_k)=0\qquad \text{for }i=1,\dots ,n \]

(or equivalently: \(\displaystyle \sum _{k=1}^{p}\alpha _k\varepsilon _k\psi (t_k)=0\quad \text{for all }\psi \in H\)). \(\square \)

4.6 Subdifferential as a multifunction

4.6.1 Monotonicity property of the subdifferential

Proposition 4.6.1

The subdifferential mapping is monotone in the sense that, for all \(x_1\) and \(x_2\) in \(\mathbb {R}^n\),

\begin{equation} \tag {6.1.1} \langle s_2 - s_1, x_2 - x_1\rangle \ge 0\quad \text{for all } s_i\in \partial f(x_i),\; i=1,2. \end{equation}

Proof

The subgradient inequalities

\[ f(x_2)\ge f(x_1)+\langle s_1, x_2-x_1\rangle \quad \text{for all }s_1\in \partial f(x_1) \]
\[ f(x_1)\ge f(x_2)+\langle s_2, x_1-x_2\rangle \quad \text{for all }s_2\in \partial f(x_2) \]

give the result simply by addition.
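
A randomized check of (6.1.1) (our construction): \(f(x)=\| x\| _1+\| x\| ^2\) is convex, and \(s(x)=\operatorname{sign}(x)+2x\) selects a subgradient of \(f\) at every \(x\) (the convention \(\operatorname{sign}(0)=0\in [-1,1]\) is harmless).

```python
# Sketch: monotonicity of the subdifferential of f(x) = ||x||_1 + ||x||^2,
# tested with the subgradient selection s(x) = sign(x) + 2x on random pairs.
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2 = rng.standard_normal((2, 4))
    s1, s2 = np.sign(x1) + 2 * x1, np.sign(x2) + 2 * x2
    assert float((s2 - s1) @ (x2 - x1)) >= 0.0
print("monotonicity (6.1.1) verified on 1000 random pairs")
```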

Theorem 4.6.1

A necessary and sufficient condition for a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) to be strongly convex with modulus \(c{\gt}0\) on a convex set \(C\) is: for all \(x_1,x_2\) in \(C\),

\begin{equation} \tag {6.1.3} f(x_2)\ge f(x_1)+\langle s,x_2-x_1\rangle +\frac{c}{2}\| x_2-x_1\| ^2\quad \text{for all }s\in \partial f(x_1), \end{equation}

or equivalently

\begin{equation} \tag {6.1.4} \langle s_2-s_1,x_2-x_1\rangle \ge c\| x_2-x_1\| ^2\quad \text{for all }s_i\in \partial f(x_i),\; i=1,2. \end{equation}

Proof

For \(x_1,x_2\) given in \(C\) and \(\alpha \in ]0,1[\), we will use the notation

\[ x^\alpha :=\alpha x_2+(1-\alpha )x_1=x_1+\alpha (x_2-x_1) \]

and we will prove (6.1.3) \(\Rightarrow \) (6.1.2) \(\Rightarrow \) (6.1.4) \(\Rightarrow \) (6.1.3), where (6.1.2) denotes the strong-convexity inequality \(f(x^\alpha )\le \alpha f(x_2)+(1-\alpha )f(x_1)-\frac{c}{2}\alpha (1-\alpha )\| x_2-x_1\| ^2\).

[(6.1.3) \(\Rightarrow \) (6.1.2)] Write (6.1.3) with \(x_1\) replaced by \(x^\alpha \in C\): for \(s\in \partial f(x^\alpha )\),

\[ f(x_2)\ge f(x^\alpha )+\langle s,x_2-x^\alpha \rangle +\frac{c}{2}\| x_2-x^\alpha \| ^2, \]

or equivalently

\[ f(x_2)\ge f(x^\alpha )+(1-\alpha )\langle s,x_2-x_1\rangle +\frac{c}{2}(1-\alpha )^2\| x_2-x_1\| ^2. \]

Likewise,

\[ f(x_1)\ge f(x^\alpha )+\alpha \langle s,x_1-x_2\rangle +\frac{c}{2}\alpha ^2\| x_1-x_2\| ^2. \]

Multiply these last two inequalities by \(\alpha \) and \((1-\alpha )\) respectively, and add to obtain

\[ \alpha f(x_2)+(1-\alpha )f(x_1)\ge f(x^\alpha )+\frac{c}{2}\| x_2-x_1\| ^2\big[\alpha (1-\alpha )^2+(1-\alpha )\alpha ^2\big]. \]

Since \(\alpha (1-\alpha )^2+(1-\alpha )\alpha ^2=\alpha (1-\alpha )\), this is just (6.1.2).

[(6.1.2) \(\Rightarrow \) (6.1.4)] Write (6.1.2) as

\[ \frac{f(x^\alpha )-f(x_1)}{\alpha }+\frac{c}{2}(1-\alpha )\| x_2-x_1\| ^2\le f(x_2)-f(x_1) \]

and let \(\alpha \downarrow 0\) to obtain \(f'(x_1,x_2-x_1)+\tfrac {c}{2}\| x_2-x_1\| ^2\le f(x_2)-f(x_1)\), which implies (6.1.3). Then, copying (6.1.3) with \(x_1\) and \(x_2\) interchanged and adding yields (6.1.4) directly.

[(6.1.4) \(\Rightarrow \) (6.1.3)] Apply Theorem 4.2.3 to the one-dimensional convex function \(\mathbb {R}\ni \alpha \mapsto \varphi (\alpha ):=f(x^\alpha )\):

\begin{equation} \tag {6.1.5} f(x_2)-f(x_1)=\varphi (1)-\varphi (0)=\int _0^1\langle s^\alpha ,x_2-x_1\rangle \, d\alpha \end{equation}

where \(s^\alpha \in \partial f(x^\alpha )\) for \(\alpha \in [0,1]\). Then take \(s_1\) arbitrary in \(\partial f(x_1)\) and apply (6.1.4): \(\langle s^\alpha -s_1,x^\alpha -x_1\rangle \ge c\| x^\alpha -x_1\| ^2\) i.e., using the value of \(x^\alpha \),

\[ \alpha \langle s^\alpha ,x_2-x_1\rangle \ge \alpha \langle s_1,x_2-x_1\rangle +c\alpha ^2\| x_2-x_1\| ^2. \]

The result follows by using this inequality to minorize the integral in (6.1.5).
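
Continuing the example given after Proposition 4.6.1 (an assumption of ours, not from the text): the quadratic term makes \(f(x)=\| x\| _1+\| x\| ^2\) strongly convex with modulus \(c=2\), and (6.1.4) can be checked numerically with that constant.

```python
# Sketch: strong monotonicity (6.1.4) with modulus c = 2 for the subgradients
# s(x) = sign(x) + 2x of f(x) = ||x||_1 + ||x||^2.
import numpy as np

rng = np.random.default_rng(2)
c = 2.0
for _ in range(1000):
    x1, x2 = rng.standard_normal((2, 4))
    s1, s2 = np.sign(x1) + 2 * x1, np.sign(x2) + 2 * x2
    gap = float((s2 - s1) @ (x2 - x1)) - c * float(np.sum((x2 - x1) ** 2))
    assert gap >= -1e-12     # the sign part is itself monotone, so the gap is >= 0
print("strong monotonicity (6.1.4) with c = 2 verified")
```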

Proposition 4.6.2

A necessary and sufficient condition for a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) to be strictly convex on a convex set \(C\) is: for all \(x_1,x_2\in C\) with \(x_2\neq x_1\),

\[ f(x_2){\gt}f(x_1)+\langle s,x_2-x_1\rangle \quad \text{for all } s\in \partial f(x_1) \]

or equivalently

\[ \langle s_2-s_1,x_2-x_1\rangle {\gt}0\quad \text{for all } s_i\in \partial f(x_i),\ i=1,2. \]
Proof

Copy the proof of Theorem 4.6.1 with \(c=0\) and the relevant “\(\ge \)”-signs replaced by strict inequalities. The only delicate point is in the [(6.1.2) \(\Rightarrow \) (6.1.4)] stage: use monotonicity of the difference quotient.

4.6.2 Continuity properties of the subdifferential

Proposition 4.6.3

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. The graph of its subdifferential mapping is closed in \(\mathbb {R}^n\times \mathbb {R}^n\).

Proof

Let \((x_k,s_k)\) be a sequence in \(\operatorname {gr}\partial f\) converging to \((x,s)\in \mathbb {R}^n\times \mathbb {R}^n\). We must prove that \((x,s)\in \operatorname {gr}\partial f\), which is easy. We have for all \(k\)

\[ f(y)\ge f(x_k)+\langle s_k,y-x_k\rangle \quad \text{for all }y\in \mathbb {R}^n; \]

pass to the limit on \(k\), using continuity of \(f\) and of the scalar product.

Proposition 4.6.4

The mapping \(\partial f\) is locally bounded, i.e. the image \(\partial f(B)\) of a bounded set \(B\subset \mathbb {R}^n\) is a bounded set in \(\mathbb {R}^n\).

Proof

For arbitrary \(x\) in \(B\) and \(s\neq 0\) in \(\partial f(x)\), the subgradient inequality implies in particular \(f(x+s/\| s\| )\ge f(x)+\| s\| \). On the other hand, \(f\) is Lipschitz-continuous on the bounded set \(B+B(0,1)\) (Theorem B.3.1.2). Hence \(\| s\| \le L\) for some \(L\).

Theorem 4.6.2

The subdifferential mapping of a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) is outer semi-continuous at any \(x\in \mathbb {R}^n\), i.e.

\[ \forall \varepsilon {\gt}0,\ \exists \delta {\gt}0:\quad y\in B(x,\delta )\implies \partial f(y)\subset \partial f(x)+B(0,\varepsilon ). \tag {6.2.1} \]
Proof

Assume for contradiction that, at some \(x\), there are \(\varepsilon {\gt}0\) and a sequence \((x_k,s_k)_k\) with

\[ x_k\to x\quad \text{for }k\to \infty \qquad \text{and}\qquad s_k\in \partial f(x_k),\ s_k\not\in \partial f(x)+B(0,\varepsilon )\quad \text{for }k=1,2,\dots \tag {6.2.2} \]

A subsequence of the bounded \((s_k)\) (Proposition 4.6.4) converges to some \(s\in \partial f(x)\) (Proposition 4.6.3). This contradicts (6.2.2), which implies \(s\notin \partial f(x)+B(0,\varepsilon /2)\).

Corollary 4.6.1

For \(f:\mathbb {R}^n\to \mathbb {R}\) convex, the function \(f'(\cdot ,d)\) is upper semicontinuous: at all \(x\in \mathbb {R}^n\),

\[ f'(x,d)=\limsup _{y\to x} f'(y,d)\qquad \text{for all }d\in \mathbb {R}^n. \]
Proof

Use Theorem 4.6.2, in conjunction with Proposition C.3.3.7.

Theorem 4.6.3

Let \((f_k)\) be a sequence of (finite) convex functions converging pointwise to \(f:\mathbb {R}^n\to \mathbb {R}\) and let \((x_k)\) converge to \(x\in \mathbb {R}^n\). For any \(\varepsilon {\gt}0\),

\[ \partial f_k(x_k)\subset \partial f(x)+B(0,\varepsilon )\qquad \text{for $k$ large enough.} \]
Proof

Let \(\varepsilon {\gt}0\) be given. Recall (Theorem B.3.1.4) that the pointwise convergence of \((f_k)\) to \(f\) implies its uniform convergence on every compact set of \(\mathbb {R}^n\).

First, we establish boundedness: for \(s_k\neq 0\) arbitrary in \(\partial f_k(x_k)\), we have

\[ f_k(x_k+s_k/\| s_k\| )\ge f_k(x_k)+\| s_k\| . \]

The uniform convergence of \((f_k)\) to \(f\) on \(B(x,2)\) implies for \(k\) large enough

\[ \| s_k\| \le f(x_k+s_k/\| s_k\| )-f(x_k)+\varepsilon , \]

and the Lipschitz property of \(f\) on \(B(x,2)\) ensures that \((s_k)\) is bounded.

Now suppose for contradiction that, for some infinite subsequence, there is some \(s_k\in \partial f_k(x_k)\) which is not in \(\partial f(x)+B(0,\varepsilon )\). Any cluster point of this \((s_k)\) — and there is at least one — is out of \(\partial f(x)+B(0,\varepsilon /2)\). Yet, with \(y\) arbitrary in \(\mathbb {R}^n\), write

\[ f_k(y)\ge f_k(x_k)+\langle s_k,y-x_k\rangle \]

and pass to the limit (on a further subsequence such that \(s_k\to s\)): pointwise [resp. uniform] convergence of \((f_k)\) to \(f\) at \(y\) [resp. around \(x\)], and continuity of the scalar product give \(f(y)\ge f(x)+\langle s,y-x\rangle \). Because \(y\) was arbitrary, we obtain the contradiction \(s\in \partial f(x)\).
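
A concrete look at the theorem (our own example): \(f_k(x)=\sqrt{x^2+1/k^2}\) converges pointwise to \(f=|\cdot |\), \(x_k=1/k\to 0\), and \(\partial f(0)=[-1,1]\); the derivatives \(f_k'(x_k)\), all equal to \(1/\sqrt 2\), indeed stay in this interval.

```python
# Sketch: smooth approximations of |.|; the gradients at x_k = 1/k remain in
# the subdifferential [-1,1] of the limit function at 0, as Theorem 4.6.3 predicts.
import numpy as np

for k in [1, 10, 100, 1000]:
    xk = 1.0 / k
    dfk = xk / np.sqrt(xk ** 2 + 1.0 / k ** 2)    # f_k'(x_k)
    print(k, dfk, " in [-1, 1]:", -1.0 <= dfk <= 1.0)
```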

Corollary 4.6.2

Let \((f_k)\) be a sequence of (finite) differentiable convex functions converging pointwise to the differentiable \(f:\mathbb {R}^n\to \mathbb {R}\). Then \(\nabla f_k\) converges to \(\nabla f\) uniformly on every compact set of \(\mathbb {R}^n\).

Proof

Take \(S\) compact; suppose for contradiction that there exists \(\varepsilon {\gt}0\) and a sequence \((x_k)\subset S\) such that

\[ \| \nabla f_k(x_k)-\nabla f(x_k)\| {\gt}\varepsilon \quad \text{for }k=1,2,\dots \]

Extracting a subsequence if necessary, we may suppose \(x_k\to x\in S\); Theorem 4.6.3 ensures that both \((\nabla f_k(x_k))\) and \((\nabla f(x_k))\) converge to \(\nabla f(x)\), implying the contradiction \(\varepsilon \le 0\).

4.6.3 Subdifferentials and limits of subgradients

Theorem 4.6.4

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. With the notation (6.3.1), \(\partial f(x)=\operatorname {co}\gamma f(x)\) for all \(x\in \mathbb {R}^n\).
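
Assuming (6.3.1) defines \(\gamma f(x)\) as the set of limits of gradients taken at points of differentiability converging to \(x\) (our reading; (6.3.1) is not reproduced here), the function \(f=|\cdot |\) illustrates the statement at \(x=0\): the limit gradients are \(\pm 1\), and their convex hull is \(\partial f(0)=[-1,1]\).

```python
# Sketch: for f = |.|, gradients at differentiable points near 0 are -1 or +1;
# the set of their limits is {-1, +1} and co{-1, +1} = [-1, 1] = subdifferential at 0.
import numpy as np

ys = np.concatenate([np.linspace(-1.0, -1e-6, 50), np.linspace(1e-6, 1.0, 50)])
grads = np.sign(ys)                     # f is differentiable at every y != 0
print("limit gradients:", sorted(set(grads.tolist())), " -> co gives [-1, 1]")
```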

Proposition 4.6.5

Let \(x\) and \(d\neq 0\) be given in \(\mathbb {R}^n\). For any sequence \((t_k,s_k,d_k)\subset \mathbb {R}_+^*\times \mathbb {R}^n\times \mathbb {R}^n\) satisfying

\[ t_k\downarrow 0,\quad s_k\in \partial f(x+t_k d_k),\quad d_k\to d,\qquad \text{for }k=1,2,\dots \]

and any cluster point \(s\) of \((s_k)\), there holds

\[ s\in \partial f(x)\qquad \text{and}\qquad \langle s,d\rangle =f'(x,d). \]
Proof

The first property comes from the results in §4.6.2. For the second, use the monotonicity of \(\partial f\):

\[ 0\le \langle s_k-s',x+t_k d_k-x\rangle = t_k\langle s_k-s',d_k\rangle \qquad \text{for all }s'\in \partial f(x). \]

Divide by \(t_k{\gt}0\) and pass to the limit to get \(f'(x,d)\le \langle s,d\rangle \). The converse inequality being trivial, the proof is complete.