4 Subdifferentials of Finite Convex Functions
4.1 The Subdifferential: Definitions and Interpretations
4.1.1 First definition: Directional derivative
(Directional Derivative) The directional derivative of \(f\) at \(x\) in the direction \(d\) is
\[ f'(x,d):=\lim _{t\downarrow 0}\frac {f(x+td)-f(x)}{t}=\inf _{t>0}\frac {f(x+td)-f(x)}{t} \tag{1.1.2} \]
(the difference quotient is nondecreasing in \(t\), whence the second equality).
For fixed \(x\), the function \(f'(x,\cdot )\) is finite sublinear.
Let \(d_1,d_2\in \mathbb {R}^n\), and positive \(\alpha _1,\alpha _2\) with \(\alpha _1+\alpha _2=1\). From the convexity of \(f\):
\[ f\bigl(x+t(\alpha _1d_1+\alpha _2d_2)\bigr)\le \alpha _1f(x+td_1)+\alpha _2f(x+td_2) \]
for all \(t\). Subtracting \(f(x)\) from both sides, dividing by \(t>0\) and letting \(t\downarrow 0\), we obtain
\[ f'(x,\alpha _1d_1+\alpha _2d_2)\le \alpha _1f'(x,d_1)+\alpha _2f'(x,d_2), \]
which establishes the convexity of \(f'\) with respect to \(d\). Its positive homogeneity is clear: for \(\lambda >0\)
\[ f'(x,\lambda d)=\lim _{t\downarrow 0}\lambda \,\frac {f(x+\lambda t\,d)-f(x)}{\lambda t}=\lambda f'(x,d). \]
Finally suppose \(\| d\| =1\). As a finite convex function, \(f\) is Lipschitz continuous around \(x\) (Theorem B.3.1.2); in particular there exist \(\varepsilon >0\) and \(L>0\) such that
\[ |f(x+td)-f(x)|\le Lt\quad \text{for all } t\in \,]0,\varepsilon ]. \]
Hence \(|f'(x,d)|\le L\) and we conclude with positive homogeneity:
\[ |f'(x,d)|\le L\| d\| \quad \text{for all } d\in \mathbb {R}^n. \]
The subdifferential \(\partial f(x)\) of \(f\) at \(x\) is the nonempty compact convex set of \(\mathbb {R}^n\) whose support function is \(f'(x,\cdot )\), i.e.
\[ \partial f(x):=\{ s\in \mathbb {R}^n : \langle s,d\rangle \le f'(x,d)\ \text{for all } d\in \mathbb {R}^n\} . \tag{1.1.6} \]
A vector \(s\in \partial f(x)\) is called a subgradient of \(f\) at \(x\).
The finite sublinear function \(d\mapsto \sigma (d):=f'(x,d)\) satisfies
Because \(\sigma \) is positively homogeneous and \(\sigma (0)=0\),
This implies immediately (1.1.8) and (1.1.9). Then (1.1.10) follows from uniqueness of the supported set.
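As a concrete illustration, take \(f=|\cdot |\) on \(\mathbb {R}\). At \(x=0\), the difference quotient equals \(|d|\) for every \(t>0\), so
\[ f'(0,d)=|d|=\max \{ sd : -1\le s\le 1\} , \]
and Definition 1.1.4 yields \(\partial f(0)=[-1,1]\); at \(x\neq 0\), \(f\) is smooth and \(\partial f(x)=\{ \operatorname {sgn}x\} \).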
4.1.2 Second definition: Minorization by affine functions
(Subdifferential II) The subdifferential of \(f\) at \(x\) is the set of vectors \(s\in \mathbb {R}^n\) satisfying
\[ f(y)\ge f(x)+\langle s,y-x\rangle \quad \text{for all } y\in \mathbb {R}^n. \tag{1.2.1} \]
Definitions 1.1.4 and 1.2.1 are equivalent.
Let \(s\) satisfy (1.1.6), i.e.
\[ \langle s,d\rangle \le f'(x,d)\quad \text{for all } d\in \mathbb {R}^n. \tag{1.2.2} \]
The second equality in (1.1.2) makes it clear that (1.2.2) is equivalent to
\[ \langle s,td\rangle \le f(x+td)-f(x)\quad \text{for all } d\in \mathbb {R}^n \text{ and } t>0. \tag{1.2.3} \]
As \(d\) ranges over \(\mathbb {R}^n\) and \(t\) over \(\mathbb {R}_+^*\), \(y:=x+td\) ranges over \(\mathbb {R}^n\), and we see that (1.2.3) is just (1.2.1).
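On the above example \(f=|\cdot |\), the equivalence can be checked by hand: the slopes \(s\) of the affine functions minorizing \(|\cdot |\) and exact at the origin are characterized by
\[ |y|\ge sy\quad \text{for all } y\in \mathbb {R}\quad \Longleftrightarrow \quad s\in [-1,1], \]
which is again \(\partial f(0)\) as computed from Definition 1.1.4.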
4.1.3 Geometric constructions and interpretations
A vector \(s\in \mathbb {R}^n\) is a subgradient of \(f\) at \(x\) if and only if \((s,-1)\in \mathbb {R}^n\times \mathbb {R}\) is normal to \(\operatorname {epi}f\) at \((x,f(x))\). In other words:
\[ N_{\operatorname {epi}f}(x,f(x))=\{ (\lambda s,-\lambda ): s\in \partial f(x),\ \lambda \ge 0\} . \]
The tangent cone to the set \(\operatorname {epi}f\) at \((x,f(x))\) is the epigraph of the directional-derivative function \(d\mapsto f'(x,d)\):
\[ T_{\operatorname {epi}f}(x,f(x))=\{ (d,r): r\ge f'(x,d)\} . \]
Apply Definition A.5.2.3 to see that \((s,-1)\in N_{\operatorname {epi}f}(x,f(x))\) means
\[ \langle s,y-x\rangle -(r-f(x))\le 0\quad \text{for all } (y,r)\in \operatorname {epi}f, \]
and the equivalence with (1.2.1) is clear (take \(r=f(y)\)). The formula follows since the set of normals forms a cone containing the origin.
[(ii)] The tangent cone to \(\operatorname {epi}f\) is the polar of the above normal cone, i.e. the set of \((d,r)\in \mathbb {R}^n\times \mathbb {R}\) such that
\[ \lambda \langle s,d\rangle -\lambda r\le 0\quad \text{for all } s\in \partial f(x) \text{ and } \lambda \ge 0. \]
Barring the trivial case \(\lambda =0\), we divide by \(\lambda >0\) to obtain
\[ \langle s,d\rangle \le r\quad \text{for all } s\in \partial f(x),\quad \text{i.e.}\quad f'(x,d)\le r. \]
For the convex function \(f:\mathbb {R}^n\to \mathbb {R}\) and the sublevel-set \(S_{f(x)}:=\{ y\in \mathbb {R}^n : f(y)\le f(x)\} \) of (1.3.1), we have
\[ T_{S_{f(x)}}(x)\subset \{ d\in \mathbb {R}^n : f'(x,d)\le 0\} . \tag{1.3.2} \]
Take arbitrary \(y\in S_{f(x)},\ t>0\), and set \(d:=t(y-x)\). Then, using the second equality in (1.1.2),
\[ f'(x,d)\le \frac {f(x+\tfrac 1t d)-f(x)}{1/t}=t\,[f(y)-f(x)]\le 0. \]
So we have proved
\[ \mathbb {R}^+\bigl(S_{f(x)}-x\bigr)\subset \{ d\in \mathbb {R}^n : f'(x,d)\le 0\} \tag{1.3.3} \]
(note: the case \(d=0\) is covered since \(0\in S_{f(x)}-x\)).
Because \(f'(\cdot ,\cdot )\) is a closed function, the righthand set in (1.3.3) is closed. Knowing that \(T_{S_{f(x)}}(x)\) is the closure of the lefthand side in (1.3.3) (Proposition A.5.2.1), we deduce the result by taking the closure of both sides in (1.3.3).
Let \(g:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose that \(g(x_0)<0\) for some \(x_0\in \mathbb {R}^n\). Then
\[ \operatorname {cl}\{ z\in \mathbb {R}^n : g(z)<0\} =\{ z\in \mathbb {R}^n : g(z)\le 0\} . \tag{1.3.4} \]
It follows that
\[ \operatorname {int}\{ z\in \mathbb {R}^n : g(z)\le 0\} =\{ z\in \mathbb {R}^n : g(z)<0\} . \tag{1.3.5} \]
Because \(g\) is (lower semi-) continuous, the inclusion “\(\subset \)” automatically holds in (1.3.4). Conversely, let \(\bar z\) be arbitrary with \(g(\bar z)\le 0\) and, for \(k>0\), set
\[ z_k:=\tfrac 1k x_0+\bigl(1-\tfrac 1k\bigr)\bar z. \]
By convexity of \(g\), \(g(z_k)\le \tfrac 1k g(x_0)+\bigl(1-\tfrac 1k\bigr)g(\bar z)<0\), so (1.3.4) is established by letting \(k\to +\infty \).
Now, take the interior of both sides in (1.3.4). The “int cl” on the left is actually an “int” (Proposition A.2.1.8), and this “int”-operation is useless because \(g\) is (upper semi-) continuous: (1.3.5) is established.
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose \(0\notin \partial f(x)\). Then, \(S_{f(x)}\) being the sublevel-set (1.3.1),
\[ T_{S_{f(x)}}(x)=\{ d\in \mathbb {R}^n : f'(x,d)\le 0\} , \tag{1.3.7} \]
\[ \operatorname {int}T_{S_{f(x)}}(x)=\{ d\in \mathbb {R}^n : f'(x,d)<0\} . \tag{1.3.8} \]
From the very definition (1.1.6), our assumption means that \(f'(x,d)<0\) for some \(d\), and (1.1.2) then implies that \(f(x+td)<f(x)\) for \(t>0\) small enough: our \(d\) is of the form \((x+td-x)/t\) with \(x+td\in S_{f(x)}\) and we have proved
\[ \{ d\in \mathbb {R}^n : f'(x,d)<0\} \subset \mathbb {R}^+\bigl(S_{f(x)}-x\bigr). \tag{1.3.9} \]
Now, we can apply (1.3.4) with \(g=f'(x,\cdot )\):
\[ \operatorname {cl}\{ d : f'(x,d)<0\} =\{ d : f'(x,d)\le 0\} , \]
so (1.3.7) is proved by closing the sets in (1.3.9) and using (1.3.2). Finally, take the interior of both sides in (1.3.7) and apply (1.3.5) with \(g=f'(x,\cdot )\) to prove (1.3.8).
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and suppose \(0\notin \partial f(x)\). Then a direction \(d\) is normal to \(S_{f(x)}\) at \(x\) if and only if there is some \(t\ge 0\) and some \(s\in \partial f(x)\) such that \(d=ts\):
\[ N_{S_{f(x)}}(x)=\mathbb {R}^+\partial f(x). \]
Write (1.3.7) as
\[ T_{S_{f(x)}}(x)=\{ d : \langle s,d\rangle \le 0\ \text{for all } s\in \partial f(x)\} =\bigl(\mathbb {R}^+\partial f(x)\bigr)^{\circ }. \]
The result follows by taking the polar cone of both sides, and observing that the assumption implies closedness of \(\mathbb {R}^+\partial f(x)\) (Proposition A.1.4.7):
\[ N_{S_{f(x)}}(x)=T_{S_{f(x)}}(x)^{\circ }=\bigl(\mathbb {R}^+\partial f(x)\bigr)^{\circ \circ }=\mathbb {R}^+\partial f(x). \]
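These formulae can be visualized on the Euclidean norm \(f=\| \cdot \| \) at some \(x\neq 0\): there \(\partial f(x)=\{ x/\| x\| \} \), so \(0\notin \partial f(x)\), and \(S_{f(x)}\) is the ball of radius \(\| x\| \). Then (1.3.7) and the above result give
\[ T_{S_{f(x)}}(x)=\{ d : \langle x,d\rangle \le 0\} ,\qquad N_{S_{f(x)}}(x)=\mathbb {R}^+x, \]
the supporting half-space and the outward normal ray of the ball at \(x\).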
4.2 Local Properties of the Subdifferential
4.2.1 First-order developments
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and \(x\in \mathbb {R}^n\). For any \(\varepsilon >0\), there exists \(\delta >0\) such that \(\| h\| \le \delta \) implies
\[ f(x)+f'(x,h)\le f(x+h)\le f(x)+f'(x,h)+\varepsilon \| h\| . \]
Suppose for contradiction that there is \(\varepsilon >0\) and a sequence \((h_k)\) with \(\| h_k\| =:t_k\le 1/k\) such that
\[ f(x+h_k)>f(x)+f'(x,h_k)+\varepsilon t_k. \]
Extracting a subsequence if necessary, assume that \(h_k/t_k\to d\) for some \(d\) of norm \(1\). Then take a local Lipschitz constant \(L\) of \(f\) (see Remark 1.1.3; note that \(f'(x,\cdot )\), being sublinear and bounded by \(L\| \cdot \| \), is also \(L\)-Lipschitz) and expand:
\[ f(x)+t_kf'(x,d)-Lt_k\Bigl\| \frac {h_k}{t_k}-d\Bigr\| +\varepsilon t_k<f(x+h_k)\le f(x+t_kd)+Lt_k\Bigl\| \frac {h_k}{t_k}-d\Bigr\| . \]
Divide by \(t_k>0\) and pass to the limit to obtain the contradiction \(\varepsilon \le 0\).
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. At any \(x\),
whenever one of the following equivalent properties holds:
If the convex \(f\) is (Gâteaux) differentiable at \(x\), its only subgradient at \(x\) is its gradient \(\nabla f(x)\). Conversely, if \(\partial f(x)\) contains only one element \(s\), then \(f\) is (Fréchet) differentiable at \(x\), with \(\nabla f(x)=s\).
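For example, the Euclidean norm \(f=\| \cdot \| \) satisfies \(\partial f(x)=\{ x/\| x\| \} \) for \(x\neq 0\), a singleton, and is indeed differentiable there; at the origin,
\[ \partial f(0)=B(0,1) \]
(the support function of the unit ball being \(\| \cdot \| =f'(0,\cdot )\)), and differentiability indeed fails.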
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. For all \(x\) and \(d\) in \(\mathbb {R}^n\), we have
\[ F_{\partial f(x)}(d)=\partial \,[f'(x,\cdot )](d), \]
where \(F_{\partial f(x)}(d)\) is the face of \(\partial f(x)\) exposed by \(d\), i.e. the set of \(s\in \partial f(x)\) with \(\langle s,d\rangle =f'(x,d)\), and where \(s\in \partial \,[f'(x,\cdot )](d)\) means
\[ f'(x,d')\ge f'(x,d)+\langle s,d'-d\rangle \quad \text{for all } d'\in \mathbb {R}^n. \tag{2.1.4} \]
If \(s\in \partial f(x)\) then \(f'(x,d')\ge \langle s,d'\rangle \) for all \(d'\in \mathbb {R}^n\), simply because \(f'(x,\cdot )\) is the support function of \(\partial f(x)\). If, in addition, \(\langle s,d\rangle =f'(x,d)\), we get
\[ f'(x,d')\ge \langle s,d'\rangle =\langle s,d\rangle +\langle s,d'-d\rangle =f'(x,d)+\langle s,d'-d\rangle , \]
which proves the inclusion \(F_{\partial f(x)}(d)\subset \partial [f'(x,\cdot )](d)\).
Conversely, let \(s\) satisfy (2.1.4). Set \(d'':=d'-d\) and deduce from subadditivity
\[ f'(x,d)+f'(x,d'')\ge f'(x,d+d'')\ge f'(x,d)+\langle s,d''\rangle , \]
which implies \(f'(x,\cdot )\ge \langle s,\cdot \rangle \), hence \(s\in \partial f(x)\). Also, putting \(d'=0\) in (2.1.4) shows that \(\langle s,d\rangle \ge f'(x,d)\). Altogether, we have \(s\in F_{\partial f(x)}(d)\). □
A point \(x\) at which \(\partial f(x)\) has more than one element — i.e. at which \(f\) is not differentiable — is called a kink (or corner-point) of \(f\).
4.2.2 Minimality conditions
For \(f:\mathbb {R}^n\to \mathbb {R}\) convex, the following three properties are equivalent:
\(f\) is minimized at \(x\) over \(\mathbb {R}^n\), i.e., \(f(y)\ge f(x)\) for all \(y\in \mathbb {R}^n\);
\(0\in \partial f(x)\);
\(f'(x,d)\ge 0\) for all \(d\in \mathbb {R}^n\).
The equivalence (i) \(\Leftrightarrow \) (ii) [resp. (ii) \(\Leftrightarrow \) (iii)] is obvious from (1.2.1) [resp. (1.1.6)].
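To illustrate, take \(f(x):=\max \{ x,-x/2\} \) on \(\mathbb {R}\). At \(x=0\),
\[ f'(0,d)=\max \{ d,-d/2\} \ge 0\quad \text{for all } d\in \mathbb {R}, \]
so (iii) holds; equivalently \(0\in \partial f(0)=[-1/2,1]\) (the support function of \([-1/2,1]\) being precisely \(d\mapsto \max \{ d,-d/2\} \)), and \(0\) indeed minimizes \(f\).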
4.2.3 Mean-value theorems
The subdifferential of \(\varphi \) defined by (2.3.1) is
\[ \partial \varphi (t)=\{ \langle s,y-x\rangle : s\in \partial f(x+t(y-x))\} , \]
or, more symbolically:
\[ \partial \varphi (t)=\langle \partial f(x+t(y-x)),\,y-x\rangle . \]
In terms of right- and left-derivatives (see Theorem 0.6.3), we have, with \(x_t:=x+t(y-x)\),
\[ D_{+}\varphi (t)=f'(x_t,y-x),\qquad D_{-}\varphi (t)=-f'(x_t,x-y), \]
so, knowing that
\[ f'(x_t,y-x)=\max _{s\in \partial f(x_t)}\langle s,y-x\rangle \quad \text{and}\quad -f'(x_t,x-y)=\min _{s\in \partial f(x_t)}\langle s,y-x\rangle , \]
we obtain \(\partial \varphi (t)=[D_{-}\varphi (t),D_{+}\varphi (t)]=\{ \langle s,y-x\rangle : s\in \partial f(x_t)\} .\)
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. Given two points \(x\neq y\) in \(\mathbb {R}^n\), there exist \(t\in ]0,1[\) and \(s\in \partial f(x_t)\), with \(x_t:=x+t(y-x)\), such that
\[ f(y)-f(x)=\langle s,y-x\rangle . \]
In other words,
\[ f(y)-f(x)\in \bigcup _{0<t<1}\bigl\langle \partial f(x+t(y-x)),\,y-x\bigr\rangle . \]
Start from the function \(\varphi \) of (2.3.1) and, as usual in this context, consider the auxiliary function
\[ \psi (t):=\varphi (t)-\varphi (0)-t\,[\varphi (1)-\varphi (0)], \]
which is clearly convex. Computing directional derivatives gives easily \(\partial \psi (t)=\partial \varphi (t)-[\varphi (1)-\varphi (0)]\). Now \(\psi \) is continuous on \([0,1]\), it has been constructed so that \(\psi (0)=\psi (1)=0\), so it is minimal at some \(t\in ]0,1[\): at this \(t\), \(0\in \partial \psi (t)\) (Theorem 2.2.1). In view of Lemma 2.3.1, this means that there is \(s\in \partial f(x_t)\) such that
\[ \langle s,y-x\rangle =\varphi (1)-\varphi (0)=f(y)-f(x). \]
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. For \(x,y\in \mathbb {R}^n\),
4.3 First Examples
4.4 Calculus Rules with Subdifferentials
4.4.1 Positive combinations of functions
Let \(f_1,f_2\) be two convex functions from \(\mathbb {R}^n\) to \(\mathbb {R}\) and \(t_1,t_2\) be positive. Then
\[ \partial (t_1f_1+t_2f_2)(x)=t_1\,\partial f_1(x)+t_2\,\partial f_2(x). \tag{4.1.1} \]
Apply Theorem C.3.3.2(i): \(t_1\partial f_1(x)+t_2\partial f_2(x)\) is a compact convex set whose support function is
\[ d\mapsto t_1f_1'(x,d)+t_2f_2'(x,d). \tag{4.1.2} \]
On the other hand, the support function of \(\partial (t_1 f_1 + t_2 f_2)(x)\) is by definition the directional derivative \((t_1 f_1 + t_2 f_2)'(x,\cdot )\) which, from elementary calculus, is just (4.1.2). Therefore the two (compact convex) sets in (4.1.1) coincide, since they have the same support function.
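For illustration, take \(f_1=|\cdot |\), \(f_2=|\cdot -1|\) on \(\mathbb {R}\) and \(t_1=t_2=1\). At \(x=0\), (4.1.1) gives
\[ \partial (f_1+f_2)(0)=[-1,1]+\{ -1\} =[-2,0]\ni 0, \]
confirming via Theorem 2.2.1 that \(0\) minimizes \(|\cdot |+|\cdot -1|\).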
4.4.2 Pre-composition with an affine mapping
Let \(A:\mathbb {R}^n\to \mathbb {R}^m\) be an affine mapping (\(Ax=A_0x+b\), with \(A_0\) linear and \(b\in \mathbb {R}^m\)) and let \(g\) be a finite convex function on \(\mathbb {R}^m\). Then
\[ \partial (g\circ A)(x)=A_0^*\,\partial g(Ax). \]
Form the difference quotient giving rise to \((g\circ A)'(x,d)\) and use the relation \(A(x+td)=Ax+tA_0d\) to obtain
\[ (g\circ A)'(x,d)=g'(Ax,A_0d)\quad \text{for all } d\in \mathbb {R}^n. \]
From Proposition C.3.3.3, the righthand side in the above equality is the support function of the convex compact set \(A_0^*\partial g(Ax)\). □
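A typical application: \(f(x):=|\langle a,x\rangle -\beta |\), with \(a\in \mathbb {R}^n\) and \(\beta \in \mathbb {R}\), is \(g\circ A\) for \(g=|\cdot |\) and \(Ax:=\langle a,x\rangle -\beta \); here \(A_0^*t=ta\) and the theorem gives
\[ \partial f(x)=\begin{cases}\{ \operatorname {sgn}(\langle a,x\rangle -\beta )\,a\} & \text{if } \langle a,x\rangle \neq \beta ,\\ \{ ta : -1\le t\le 1\} & \text{if } \langle a,x\rangle =\beta .\end{cases} \]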
4.4.3 Post-composition with increasing convex function of several variables
Let \(f\), \(F\) and \(g\) be defined as above. For all \(x\in \mathbb {R}^n\),
\[ \partial (g\circ F)(x)=\Bigl\{ \sum _{i=1}^m\rho ^is_i : (\rho ^1,\dots ,\rho ^m)\in \partial g(F(x)),\ s_i\in \partial f_i(x)\Bigr\} . \tag{4.3.1} \]
Our aim is to show the formula via support functions, hence we need to establish the convexity and compactness of the righthand side in (4.3.1) – call it \(S\). Boundedness and closedness are easy, coming from the fact that a subdifferential (be it \(\partial g\) or \(\partial f_i\)) is bounded and closed. As for convexity, pick two points \(\sum _{i=1}^m\rho ^is_i\) and \(\sum _{i=1}^m\rho ^{\prime i}s_i'\) in \(S\) and form their convex combination
\[ s:=\alpha \sum _{i=1}^m\rho ^is_i+(1-\alpha )\sum _{i=1}^m\rho ^{\prime i}s_i', \]
where \(\alpha \in [0,1]\). Remember that each \(\rho ^i\) and \(\rho ^{\prime i}\) is nonnegative and the above sum can be restricted to those terms such that \(\rho ^{\prime \prime i}:=\alpha \rho ^i+(1-\alpha )\rho ^{\prime i}>0\). Then we write each such term as
\[ \alpha \rho ^is_i+(1-\alpha )\rho ^{\prime i}s_i'=\rho ^{\prime \prime i}\Bigl[\frac {\alpha \rho ^i}{\rho ^{\prime \prime i}}\,s_i+\frac {(1-\alpha )\rho ^{\prime i}}{\rho ^{\prime \prime i}}\,s_i'\Bigr]. \]
It suffices to observe that \(\rho ^{\prime \prime }:=(\rho ^{\prime \prime 1},\dots ,\rho ^{\prime \prime m})=\alpha \rho +(1-\alpha )\rho '\in \partial g(F(x))\) by convexity of this subdifferential, so the bracketed expression is in \(\partial f_i(x)\) (a convex combination of \(s_i\) and \(s_i'\)); thus \(s\in S\).
[Step 1] Now let us compute the support function \(\sigma _S\) of \(S\). For \(d\in \mathbb {R}^n\), we denote by \(F'(x,d)\in \mathbb {R}^m\) the vector whose components are \(f_i'(x,d)\) and we proceed to prove
\[ \sigma _S(d)=g'\bigl(F(x),F'(x,d)\bigr). \tag{4.3.2} \]
For any \(s=\sum _{i=1}^m \rho ^i s_i\in S\), we write \(\langle s,d\rangle \) as
\[ \sum _{i=1}^m\rho ^i\langle s_i,d\rangle \le \sum _{i=1}^m\rho ^if_i'(x,d)=\langle \rho ,F'(x,d)\rangle \le g'\bigl(F(x),F'(x,d)\bigr); \tag{4.3.3} \]
the first inequality uses \(\rho ^i\ge 0\) and the definition of \(f_i'(x,\cdot )=\sigma _{\partial f_i(x)}\); the second uses the definition \(g'(F(x),\cdot )=\sigma _{\partial g(F(x))}\).
On the other hand, the compactness of \(\partial g(F(x))\) implies the existence of an \(m\)-tuple \(\bar \rho =(\bar \rho ^1,\dots ,\bar \rho ^m)\in \partial g(F(x))\) such that
\[ \langle \bar \rho ,F'(x,d)\rangle =g'\bigl(F(x),F'(x,d)\bigr), \]
and the compactness of each \(\partial f_i(x)\) yields likewise an \(\bar {s}_i\in \partial f_i(x)\) such that
\[ \langle \bar {s}_i,d\rangle =f_i'(x,d)\quad \text{for } i=1,\dots ,m. \]
Altogether, we have exhibited an \(\bar {s}=\sum _{i=1}^m\bar {\rho }^i\bar {s}_i\in S\) such that equality holds in (4.3.3), so (4.3.2) is established.
[Step 2] It remains to prove that the support function (4.3.2) is really the directional derivative \((g\circ F)'(x,d)\). For \(t>0\), expand \(F(x+td)\), use the fact that \(g\) is locally Lipschitzian, and then expand \(g(F(x+td))\):
\[ F(x+td)=F(x)+tF'(x,d)+o(t),\qquad g\bigl(F(x+td)\bigr)=g\bigl(F(x)+tF'(x,d)\bigr)+o(t). \]
From there, it follows
\[ (g\circ F)'(x,d)=\lim _{t\downarrow 0}\frac {g\bigl(F(x)+tF'(x,d)\bigr)-g(F(x))}{t}=g'\bigl(F(x),F'(x,d)\bigr). \]
Let \(f_1,\dots ,f_m\) be \(m\) convex functions from \(\mathbb {R}^n\) to \(\mathbb {R}\) and define
\[ f(x):=\max \{ f_1(x),\dots ,f_m(x)\} . \]
Denoting by \(I(x):=\{ i : f_i(x)=f(x)\} \) the active index-set, we have
\[ \partial f(x)=\operatorname {co}\,\bigcup _{i\in I(x)}\partial f_i(x). \tag{4.3.4} \]
Take \(g(y)=\max \{ y^{1},\dots ,y^{m}\} \), whose subdifferential was computed in (3.7): \(\{ e_{i}\} \) denoting the canonical basis of \(\mathbb {R}^{m}\),
\[ \partial g(y)=\operatorname {co}\{ e_i : y^i=g(y)\} . \]
Then, using the notation of Theorem 4.3.1, we write \(\partial g(F(x))\) as
\[ \Bigl\{ \rho \in \mathbb {R}^m : \rho ^i\ge 0,\ \sum _{i=1}^m\rho ^i=1,\ \rho ^i=0\ \text{for } i\notin I(x)\Bigr\} , \]
and (4.3.1) gives
\[ \partial f(x)=\Bigl\{ \sum _{i\in I(x)}\rho ^is_i : \rho ^i\ge 0,\ \sum _{i\in I(x)}\rho ^i=1,\ s_i\in \partial f_i(x)\Bigr\} . \]
Remembering Example A.1.3.5, it suffices to recognize in the above expression the convex hull announced in (4.3.4).
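In the piecewise affine case \(f_i(x)=\langle a_i,x\rangle +b_i\), each \(\partial f_i(x)=\{ a_i\} \) and (4.3.4) reduces to
\[ \partial f(x)=\operatorname {co}\{ a_i : i\in I(x)\} ; \]
for instance \(f=\max \{ x,-x\} =|x|\) on \(\mathbb {R}\) gives once more \(\partial f(0)=\operatorname {co}\{ -1,1\} =[-1,1]\).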
4.4.4 Supremum of convex functions
With the notation (4.4.1), (4.4.2),
\[ \partial f(x)\supset \overline {\operatorname {co}}\,\bigcup \{ \partial f_j(x) : j\in J(x)\} . \tag{4.4.3} \]
Take \(j\in J(x)\) and \(s\in \partial f_j(x)\); from the definition (1.2.1) of the subdifferential,
\[ f(y)\ge f_j(y)\ge f_j(x)+\langle s,y-x\rangle =f(x)+\langle s,y-x\rangle \quad \text{for all } y\in \mathbb {R}^n, \]
so \(\partial f(x)\) contains \(\partial f_j(x)\). Being closed and convex, it also contains the closed convex hull appearing in (4.4.3).
With the notation (4.4.1), (4.4.2), assume that \(J\) is a compact set (in some metric space), on which the functions \(j\mapsto f_j(x)\) are upper semi-continuous for each \(x\in \mathbb {R}^n\). Then
\[ \partial f(x)=\operatorname {co}\,\bigl\{ \bigcup \{ \partial f_j(x) : j\in J(x)\} \bigr\} . \tag{4.4.4} \]
Our assumptions make \(J(x)\) nonempty and compact. Denote by \(S\) the curly bracketed set in (4.4.4); because of (4.4.3), \(S\) is bounded, let us check that it is closed. Take a sequence \((s_k)\subset S\) converging to \(s\); to each \(s_k\), we associate some \(j_k\in J(x)\) such that \(s_k\in \partial f_{j_k}(x)\), i.e.
\[ f_{j_k}(y)\ge f_{j_k}(x)+\langle s_k,y-x\rangle \quad \text{for all } y\in \mathbb {R}^n. \]
Let \(k\to \infty \); extract a subsequence so that \(j_k\to j\in J(x)\); we have \(f_{j_k}(x)\equiv f(x)=:f_j(x)\); and by upper semi-continuity of the function \(f_{(\cdot )}(y)\), we obtain
\[ f_j(y)\ge \limsup _{k\to \infty }f_{j_k}(y)\ge f_j(x)+\langle s,y-x\rangle \quad \text{for all } y\in \mathbb {R}^n, \]
which shows \(s\in \partial f_j(x)\subset S\). Altogether, \(S\) is compact and its convex hull is also compact (Theorem A.1.4.3).
In view of Lemma 4.4.1, it suffices to prove the “\(\subset \)”-inclusion in (4.4.4); for this, we will establish the corresponding inequality between support functions which, in view of the calculus rule C.3.3.2(ii), is: for all \(d\in \mathbb {R}^n\),
\[ f'(x,d)\le \max \{ f_j'(x,d) : j\in J(x)\} . \tag{4.4.5} \]
[Step 1] Let \(\varepsilon >0\); from the definition (1.1.2) of \(f'(x,d)\),
\[ f(x+td)\ge f(x)+tf'(x,d)>f(x)+t\,[f'(x,d)-\varepsilon ]\quad \text{for all } t>0. \tag{4.4.6} \]
For \(t>0\), set
\[ J_t:=\bigl\{ j\in J : f_j(x+td)\ge f(x)+t\,[f'(x,d)-\varepsilon ]\bigr\} . \]
The definition of \(f(x+td)\) as a supremum shows with (4.4.6) that \(J_t\) is nonempty. Because \(J\) is compact and \(f_{(\cdot )}(x+td)\) is upper semi-continuous, \(J_t\) is visibly compact. Observe that \(J_t\) is a superlevel-set (at the level \(f'(x,d)-\varepsilon \)) of the function
\[ t\mapsto \frac {f_j(x+td)-f_j(x)}{t}+\frac {f_j(x)-f(x)}{t}\ \Bigl(=\frac {f_j(x+td)-f(x)}{t}\Bigr), \]
which is nondecreasing: the first fraction is the slope of a convex function, and the second fraction has a nonpositive numerator. Thus, \(J_{t_1}\subset J_{t_2}\) for \(0<t_1\le t_2\).
[Step 2] By compactness, we deduce the existence of some \(j^*\in \bigcap _{t>0}J_t\) (for each \(\tau \in ]0,t]\), pick some \(j_\tau \in J_\tau \subset J_t\); take a cluster point for \(\tau \downarrow 0\): it is in \(J_t\)). We therefore have
\[ f_{j^*}(x+td)\ge f(x)+t\,[f'(x,d)-\varepsilon ]\quad \text{for all } t>0, \]
hence \(j^*\in J(x)\) (continuity of the convex function \(f_{j^*}\) for \(t\downarrow 0\)). In this inequality, we can replace \(f(x)\) by \(f_{j^*}(x)\), divide by \(t\) and let \(t\downarrow 0\) to obtain
\[ f_{j^*}'(x,d)\ge f'(x,d)-\varepsilon . \]
Since \(d\in \mathbb {R}^n\) and \(\varepsilon {\gt}0\) were arbitrary, (4.4.5) is established.
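Theorem 4.4.2 can be tested on \(f=\| \cdot \| =\sup \{ \langle y,\cdot \rangle : y\in J\} \), with \(J:=B(0,1)\) compact and each \(f_y:=\langle y,\cdot \rangle \) linear. At \(x=0\), every \(y\in J\) is active and \(\partial f_y(0)=\{ y\} \), so (4.4.4) gives
\[ \partial f(0)=\operatorname {co}\{ y : y\in B(0,1)\} =B(0,1), \]
in agreement with the direct computation via support functions.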
The notation and assumptions are those of Theorem 4.4.2. Assume also that each \(f_j\) is differentiable; then
\[ \partial f(x)=\operatorname {co}\{ \nabla f_j(x) : j\in J(x)\} . \]
For some compact set \(Y\subset \mathbb {R}^p\), let \(g:\mathbb {R}^n\times Y\to \mathbb {R}\) be a function satisfying the following properties:
for each \(x\in \mathbb {R}^n\), \(g(x,\cdot )\) is upper semi-continuous;
for each \(y\in Y\), \(g(\cdot ,y)\) is convex and differentiable;
the function \(f:=\sup _{y\in Y} g(\cdot ,y)\) is finite-valued on \(\mathbb {R}^n\);
at some \(x\in \mathbb {R}^n\), \(g(x,\cdot )\) is maximized at a unique \(y(x)\in Y\).
Then \(f\) is differentiable at this \(x\), and its gradient is
\[ \nabla f(x)=\nabla _xg\bigl(x,y(x)\bigr) \]
(where \(\nabla _x g(x,y)\) denotes the gradient of the function \(g(\cdot ,y)\) at \(x\)).
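For instance, with \(Y:=\{ y : \| y\| =1\} \) and \(g(x,y):=\langle y,x\rangle \), we obtain \(f=\| \cdot \| \). At \(x\neq 0\), the maximizer \(y(x)=x/\| x\| \) is unique, and the corollary gives
\[ \nabla f(x)=\nabla _xg\bigl(x,y(x)\bigr)=\frac {x}{\| x\| }\,; \]
at \(x=0\), every \(y\in Y\) is maximal and differentiability fails, as it must.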
4.4.5 Image of a function under a linear mapping
With the notation (4.5.1), (4.5.2), assume \(A\) is surjective. Let \(x\) be such that \(Y(x)\) is nonempty. Then, for arbitrary \(y\in Y(x)\),
\[ \partial (Ag)(x)=\{ s\in \mathbb {R}^n : A^*s\in \partial g(y)\} \tag{4.5.3} \]
(and this set is thus independent of the particular optimal \(y\)).
By definition, \(s\in \partial (Ag)(x)\) if and only if \((Ag)(x')\ge (Ag)(x)+\langle s,x'-x\rangle \) for all \(x'\in \mathbb {R}^n\), which can be rewritten
\[ (Ag)(x')\ge g(y)+\langle s,x'-x\rangle \quad \text{for all } x'\in \mathbb {R}^n, \]
where \(y\) is arbitrary in \(Y(x)\). Furthermore, because \(A\) is surjective and by definition of \(Ag\), this last relation is equivalent to
\[ g(y')\ge g(y)+\langle s,Ay'-Ay\rangle =g(y)+\langle A^*s,y'-y\rangle \quad \text{for all } y', \]
which means that \(A^*s\in \partial g(y)\).
Make the assumptions of Theorem 4.5.1. If \(g\) is differentiable at some \(y\in Y(x)\), then \(Ag\) is differentiable at \(x\).
Surjectivity of \(A\) is equivalent to injectivity of \(A^*\): in (4.5.3), we have an equation in \(s\): \(A^*s=\nabla g(y)\), whose solution is unique, and is therefore \(\nabla (Ag)(x)\).
Suppose that the subdifferential of \(g\) in (4.5.4) is associated with a scalar product \(\langle \! \langle \cdot ,\cdot \rangle \! \rangle \) preserving the structure of a product space:
\[ \langle \! \langle (x,y),(x',y')\rangle \! \rangle =\langle x,x'\rangle +\langle y,y'\rangle . \]
At a given \(x\in \mathbb {R}^n\), take an arbitrary \(y\) solving (4.5.4). Then
\[ \partial f(x)=\{ s\in \mathbb {R}^n : (s,0)\in \partial _{(x,y)}g(x,y)\} . \]
With our notation, \(A^*s=(s,0)\) for all \(s\in \mathbb {R}^n\). It suffices to apply Theorem 4.5.1 (the symbol \(\partial _{(x,y)}g\) is used as a reminder that we are dealing with the subdifferential of \(g\) with respect to the variable \((\cdot ,\cdot )\in \mathbb {R}^n\times \mathbb {R}^m\)).
Let \(f_1\) and \(f_2:\mathbb {R}^n\to \mathbb {R}\) be two convex functions minorized by a common affine function. For given \(x\), let \((y_1,y_2)\) be such that the inf-convolution is exact at \(x=y_1+y_2\), i.e. \((f_1\mathbin {\square }f_2)(x)=f_1(y_1)+f_2(y_2)\). Then
\[ \partial (f_1\mathbin {\square }f_2)(x)=\partial f_1(y_1)\cap \partial f_2(y_2). \tag{4.5.6} \]
First observe that \(A^*s=(s,s)\). Also, apply Definition 1.2.1 to see that \((s_1,s_2)\in \partial g(y_1,y_2)\) if and only if \(s_1\in \partial f_1(y_1)\) and \(s_2\in \partial f_2(y_2)\). Then (4.5.6) is just the copy of (4.5.3) in the present context.
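A noteworthy special case is \(f_2=\frac 12\| \cdot \| ^2\), giving the Moreau–Yosida regularization of \(f_1\): the infimal convolution is then exact at every \(x\), with a unique \(y_1\) (by strong convexity), and \(\partial f_2(y_2)=\{ y_2\} =\{ x-y_1\} \), so (4.5.6) becomes the singleton
\[ \partial (f_1\mathbin {\square }f_2)(x)=\partial f_1(y_1)\cap \{ x-y_1\} =\{ x-y_1\} : \]
the regularized function is differentiable, with \(\nabla (f_1\mathbin {\square }f_2)(x)=x-y_1\).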
4.5 Further Examples
4.5.1 Largest eigenvalue of a symmetric matrix
4.5.2 Nested optimization
4.5.3 Best approximation of a convex function on a compact interval
With the notations (5.3.1), (5.3.2), suppose \(\varphi _{0}\notin H\). A necessary and sufficient condition for \(\bar{x}=(\bar{\xi }^{1},\dots ,\bar{\xi }^{n})\in \mathbb {R}^{n}\) to minimize \(f\) of (5.3.1) is that, for some positive integer \(p\le n+1\), there exist \(p\) points \(t_{1},\dots ,t_{p}\) in \(T\), \(p\) integers \(\varepsilon _{1},\dots ,\varepsilon _{p}\) in \(\{ -1,+1\} \) and \(p\) positive numbers \(\alpha _{1},\dots ,\alpha _{p}\) such that
(or equivalently: \(\displaystyle \sum _{k=1}^{p}\alpha _k\varepsilon _k\psi (t_k)=0\quad \text{for all }\psi \in H\)). \(\square \)
4.6 Subdifferential as a multifunction
4.6.1 Monotonicity property of the subdifferential
The subdifferential mapping is monotone in the sense that, for all \(x_1\) and \(x_2\) in \(\mathbb {R}^n\),
\[ \langle s_1-s_2,x_1-x_2\rangle \ge 0\quad \text{for all } s_1\in \partial f(x_1) \text{ and } s_2\in \partial f(x_2). \]
The subgradient inequalities
\[ f(x_2)\ge f(x_1)+\langle s_1,x_2-x_1\rangle \quad \text{and}\quad f(x_1)\ge f(x_2)+\langle s_2,x_1-x_2\rangle \]
give the result simply by addition.
A necessary and sufficient condition for a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) to be strongly convex with modulus \(c>0\) on a convex set \(C\) is: for all \(x_1,x_2\) in \(C\),
\[ f(x_2)\ge f(x_1)+\langle s_1,x_2-x_1\rangle +\tfrac {c}{2}\| x_2-x_1\| ^2\quad \text{for all } s_1\in \partial f(x_1), \tag{6.1.3} \]
or equivalently
\[ \langle s_1-s_2,x_1-x_2\rangle \ge c\,\| x_1-x_2\| ^2\quad \text{for all } s_1\in \partial f(x_1),\ s_2\in \partial f(x_2). \tag{6.1.4} \]
For \(x_1,x_2\) given in \(C\) and \(\alpha \in ]0,1[\), we will use the notation
\[ x^\alpha :=x_1+\alpha (x_2-x_1)=(1-\alpha )x_1+\alpha x_2, \]
and we will prove (6.1.3) \(\Rightarrow \) (6.1.2) \(\Rightarrow \) (6.1.4) \(\Rightarrow \) (6.1.3).
[(6.1.3) \(\Rightarrow \) (6.1.2)] Write (6.1.3) with \(x_1\) replaced by \(x^\alpha \in C\): for \(s\in \partial f(x^\alpha )\),
\[ f(x_2)\ge f(x^\alpha )+\langle s,x_2-x^\alpha \rangle +\tfrac {c}{2}\| x_2-x^\alpha \| ^2, \]
or equivalently
\[ f(x_2)\ge f(x^\alpha )+(1-\alpha )\langle s,x_2-x_1\rangle +\tfrac {c}{2}(1-\alpha )^2\| x_2-x_1\| ^2. \]
Likewise,
\[ f(x_1)\ge f(x^\alpha )-\alpha \langle s,x_2-x_1\rangle +\tfrac {c}{2}\alpha ^2\| x_2-x_1\| ^2. \]
Multiply these last two inequalities by \(\alpha \) and \((1-\alpha )\) respectively, and add to obtain
\[ \alpha f(x_2)+(1-\alpha )f(x_1)\ge f(x^\alpha )+\tfrac {c}{2}\alpha (1-\alpha )\| x_2-x_1\| ^2. \]
Then realize after simplification that this is just (6.1.2).
[(6.1.2) \(\Rightarrow \) (6.1.4)] Write (6.1.2) as
\[ \frac {f(x_1+\alpha (x_2-x_1))-f(x_1)}{\alpha }\le f(x_2)-f(x_1)-\tfrac {c}{2}(1-\alpha )\| x_2-x_1\| ^2 \]
and let \(\alpha \downarrow 0\) to obtain \(f'(x_1,x_2-x_1)+\tfrac {c}{2}\| x_2-x_1\| ^2\le f(x_2)-f(x_1)\), which implies (6.1.3). Then, copying (6.1.3) with \(x_1\) and \(x_2\) interchanged and adding yields (6.1.4) directly.
[(6.1.4) \(\Rightarrow \) (6.1.3)] Apply Theorem 2.3.4 to the one-dimensional convex function \(\mathbb {R}\ni \alpha \mapsto \varphi (\alpha ):=f(x^\alpha )\):
\[ f(x_2)-f(x_1)=\int _0^1\langle s^\alpha ,x_2-x_1\rangle \,\mathrm {d}\alpha , \tag{6.1.5} \]
where \(s^\alpha \in \partial f(x^\alpha )\) for \(\alpha \in [0,1]\). Then take \(s_1\) arbitrary in \(\partial f(x_1)\) and apply (6.1.4): \(\langle s^\alpha -s_1,x^\alpha -x_1\rangle \ge c\| x^\alpha -x_1\| ^2\) i.e., using the value of \(x^\alpha \),
\[ \langle s^\alpha ,x_2-x_1\rangle \ge \langle s_1,x_2-x_1\rangle +c\,\alpha \| x_2-x_1\| ^2. \]
The result follows by using this inequality to minorize the integral in (6.1.5).
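The model example is \(f=\frac {c}{2}\| \cdot \| ^2+g\) with \(g:\mathbb {R}^n\to \mathbb {R}\) convex: by (4.1.1), \(\partial f(x)=cx+\partial g(x)\), so the monotonicity of \(\partial g\) established above yields, for \(s_i\in \partial f(x_i)\),
\[ \langle s_1-s_2,x_1-x_2\rangle \ge c\| x_1-x_2\| ^2, \]
which is (6.1.4): \(f\) is strongly convex with modulus \(c\) on the whole of \(\mathbb {R}^n\).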
A necessary and sufficient condition for a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) to be strictly convex on a convex set \(C\) is: for all \(x_1,x_2\in C\) with \(x_2\neq x_1\),
\[ f(x_2)>f(x_1)+\langle s_1,x_2-x_1\rangle \quad \text{for all } s_1\in \partial f(x_1), \]
or equivalently
\[ \langle s_1-s_2,x_1-x_2\rangle >0\quad \text{for all } s_1\in \partial f(x_1),\ s_2\in \partial f(x_2). \]
Copy the proof of Theorem 6.1.2 with \(c=0\) and the relevant “\( \ge \)”-signs replaced by strict inequalities. The only delicate point is in the [(6.1.2) \(\Rightarrow \) (6.1.4)]-stage: use monotonicity of the difference quotient.
4.6.2 Continuity properties of the subdifferential
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. The graph of its subdifferential mapping is closed in \(\mathbb {R}^n\times \mathbb {R}^n\).
Let \((x_k,s_k)\) be a sequence in \(\operatorname {gr}\partial f\) converging to \((x,s)\in \mathbb {R}^n\times \mathbb {R}^n\). We must prove that \((x,s)\in \operatorname {gr}\partial f\), which is easy. We have for all \(k\)
\[ f(y)\ge f(x_k)+\langle s_k,y-x_k\rangle \quad \text{for all } y\in \mathbb {R}^n; \]
pass to the limit on \(k\), using continuity of \(f\) and of the scalar product.
The mapping \(\partial f\) is locally bounded, i.e. the image \(\partial f(B)\) of a bounded set \(B\subset \mathbb {R}^n\) is a bounded set in \(\mathbb {R}^n\).
For arbitrary \(x\) in \(B\) and \(s\neq 0\) in \(\partial f(x)\), the subgradient inequality implies in particular \(f(x+s/\| s\| )\ge f(x)+\| s\| \). On the other hand, \(f\) is Lipschitz-continuous on the bounded set \(B+B(0,1)\) (Theorem B.3.1.2). Hence \(\| s\| \le L\) for some \(L\).
The subdifferential mapping of a convex function \(f:\mathbb {R}^n\to \mathbb {R}\) is outer semi-continuous at any \(x\in \mathbb {R}^n\), i.e.: for every \(\varepsilon >0\) there is \(\delta >0\) such that
\[ \partial f(y)\subset \partial f(x)+B(0,\varepsilon )\quad \text{whenever } \| y-x\| \le \delta . \]
Assume for contradiction that, at some \(x\), there are \(\varepsilon >0\) and a sequence \((x_k,s_k)_k\) with
\[ x_k\to x,\qquad s_k\in \partial f(x_k),\qquad s_k\notin \partial f(x)+B(0,\varepsilon ). \tag{6.2.2} \]
A subsequence of the bounded \((s_k)\) (Proposition 6.2.2) converges to some \(s\in \partial f(x)\) (Proposition 6.2.1). This contradicts (6.2.2), which implies \(s\notin \partial f(x)+B(0,\varepsilon /2)\).
For \(f:\mathbb {R}^n\to \mathbb {R}\) convex, the function \(f'(\cdot ,d)\) is upper semi-continuous: at all \(x\in \mathbb {R}^n\),
\[ \limsup _{y\to x}f'(y,d)\le f'(x,d)\quad \text{for all } d\in \mathbb {R}^n. \]
Use Theorem 6.2.4, in conjunction with Proposition C.3.3.7.
Let \((f_k)\) be a sequence of (finite) convex functions converging pointwise to \(f:\mathbb {R}^n\to \mathbb {R}\) and let \((x_k)\) converge to \(x\in \mathbb {R}^n\). For any \(\varepsilon >0\),
\[ \partial f_k(x_k)\subset \partial f(x)+B(0,\varepsilon )\quad \text{for } k \text{ large enough}. \]
Let \(\varepsilon {\gt}0\) be given. Recall (Theorem B.3.1.4) that the pointwise convergence of \((f_k)\) to \(f\) implies its uniform convergence on every compact set of \(\mathbb {R}^n\).
First, we establish boundedness: for \(s_k\neq 0\) arbitrary in \(\partial f_k(x_k)\), we have
\[ f_k\Bigl(x_k+\frac {s_k}{\| s_k\| }\Bigr)\ge f_k(x_k)+\| s_k\| . \]
The uniform convergence of \((f_k)\) to \(f\) on \(B(x,2)\) implies for \(k\) large enough
\[ \| s_k\| \le f\Bigl(x_k+\frac {s_k}{\| s_k\| }\Bigr)-f(x_k)+1, \]
and the Lipschitz property of \(f\) on \(B(x,2)\) ensures that \((s_k)\) is bounded.
Now suppose for contradiction that, for some infinite subsequence, there is some \(s_k\in \partial f_k(x_k)\) which is not in \(\partial f(x)+B(0,\varepsilon )\). Any cluster point of this \((s_k)\) — and there is at least one — is out of \(\partial f(x)+B(0,\varepsilon /2)\). Yet, with \(y\) arbitrary in \(\mathbb {R}^n\), write
\[ f_k(y)\ge f_k(x_k)+\langle s_k,y-x_k\rangle \]
and pass to the limit (on a further subsequence such that \(s_k\to s\)): pointwise [resp. uniform] convergence of \((f_k)\) to \(f\) at \(y\) [resp. around \(x\)], and continuity of the scalar product give \(f(y)\ge f(x)+\langle s,y-x\rangle \). Because \(y\) was arbitrary, we obtain the contradiction \(s\in \partial f(x)\).
Let \((f_k)\) be a sequence of (finite) differentiable convex functions converging pointwise to the differentiable \(f:\mathbb {R}^n\to \mathbb {R}\). Then \(\nabla f_k\) converges to \(\nabla f\) uniformly on every compact set of \(\mathbb {R}^n\).
Take \(S\) compact; suppose for contradiction that there exists \(\varepsilon >0\) and a sequence \((x_k)\subset S\) such that
\[ \| \nabla f_k(x_k)-\nabla f(x_k)\| \ge \varepsilon \quad \text{for all } k. \]
Extracting a subsequence if necessary, we may suppose \(x_k\to x\in S\); Theorem 6.2.7 assures that the sequences \((\nabla f_k(x_k))\) and \((\nabla f(x_k))\) both converge to \(\nabla f(x)\), implying \(0\ge \varepsilon \).
4.6.3 Subdifferentials and limits of subgradients
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex. With the notation (6.3.1), \(\partial f(x)=\operatorname {co}\gamma f(x)\) for all \(x\in \mathbb {R}^n\).
Let \(x\) and \(d\neq 0\) be given in \(\mathbb {R}^n\). For any sequence \((t_k,s_k,d_k)\subset \mathbb {R}_+^*\times \mathbb {R}^n\times \mathbb {R}^n\) satisfying
\[ t_k\downarrow 0,\qquad d_k\to d,\qquad s_k\in \partial f(x+t_kd_k)\quad \text{for all } k, \]
and any cluster point \(s\) of \((s_k)\), there holds
\[ s\in \partial f(x)\qquad \text{and}\qquad \langle s,d\rangle =f'(x,d). \]
The first property comes from the results in §6.2. For the second, use the monotonicity of \(\partial f\): for arbitrary \(s'\in \partial f(x)\),
\[ \langle s_k-s',\,(x+t_kd_k)-x\rangle =t_k\,\langle s_k-s',d_k\rangle \ge 0. \]
Divide by \(t_k>0\) and pass to the limit to get \(\langle s-s',d\rangle \ge 0\); taking the supremum over \(s'\in \partial f(x)\) yields \(f'(x,d)\le \langle s,d\rangle \). The converse inequality being trivial, the proof is complete.
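On the example \(f=|\cdot |\) at \(x=0\): the gradients available near \(0\) are \(\pm 1\), so \(\gamma f(0)=\{ -1,1\} \) and the theorem recovers
\[ \partial f(0)=\operatorname {co}\{ -1,1\} =[-1,1]. \]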