2 Convex functions
2.1 Basic Definitions and Examples
2.1.1 The definitions of a Convex Function
Let \(C\) be a nonempty convex set in \(\mathbb {R}^n\). A function \(f : C \to \mathbb {R}\) is said to be convex on \(C\) when, for all pairs \((x,x')\in C\times C\) and all \(\alpha \in [0,1]\), there holds
The function \(f\) is strongly convex on \(C\) with modulus \(c\) if and only if the function \(f-\tfrac {1}{2}c\| \cdot \| ^{2}\) is convex on \(C\).
Use direct calculations in the definition (1.1.1) of convexity applied to the function \(f-\tfrac {1}{2}c\| \cdot \| ^{2}\), namely:
(The Set \(\operatorname {Conv}\mathbb {R}^n\)) A function \(f:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \), not identically \(+\infty \), is said to be convex when, for all \((x,x')\in \mathbb {R}^n\times \mathbb {R}^n\) and all \(\alpha \in [0,1]\), there holds
considered as an inequality in \(\mathbb {R}\cup \{ +\infty \} \).
The class of such functions is denoted by \(\operatorname {Conv}\mathbb {R}^n\).
(Domain of a Function) The domain (or also effective domain) of \(f\in \text{Conv }\mathbb {R}^n\) is the nonempty set
(Epigraph of a Function) Given \(f:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \), not identically equal to \(+\infty \), the epigraph of \(f\) is the nonempty set
Its strict epigraph \(\operatorname {epi}_s f\) is defined likewise, with “\(\ge \)” replaced by “\({\gt}\)” (beware that the word “strict” here has nothing to do with strict convexity).
Let \(f:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \) be not identically equal to \(+\infty \). The three properties below are equivalent:
\(f\) is convex in the sense of Definition 1.1.3;
its epigraph is a convex set in \(\mathbb {R}^n\times \mathbb {R}\);
its strict epigraph is a convex set in \(\mathbb {R}^n\times \mathbb {R}\).
Left as an exercise.
(Inequality of Jensen) Let \(f\in \operatorname {Conv}\mathbb {R}^n\). Then, for all collections \(\{ x_1,\dots ,x_k\} \) of points in \(\text{dom }f\) and all \(\alpha =(\alpha _1,\dots ,\alpha _k)\) in the unit simplex of \(\mathbb {R}^k\), there holds (inequality of Jensen in summation form)
The \(k\) points \((x_i,f(x_i))\in \mathbb {R}^n\times \mathbb {R}\) are clearly in \(\text{epi }f\), a convex set. Their convex combination
is also in \(\text{epi }f\) (Proposition A.1.3.3). This is just the claimed inequality.
Let \(f\in \text{Conv }\mathbb {R}^n\). The relative interior of \(\operatorname {epi}f\) is the union over \(x\in \operatorname {ri}\, \operatorname {dom}f\) of the open half-lines with bottom endpoints at \(f(x)\):
Since \(\operatorname {dom}f\) is the image of \(\operatorname {epi}f\) under the linear mapping “projection onto \(\mathbb {R}^n\)”, Proposition A.2.1.12 tells us that
Now take \(x\) arbitrary in \(\operatorname {ri}\operatorname {dom}f\). The subset of \(\operatorname {ri}\operatorname {epi}f\) that is projected onto \(x\) is just \(((\{ x\} \times \mathbb {R})\cap \operatorname {ri}\operatorname {epi}f)\), which in turn is \(\operatorname {ri}((\{ x\} \times \mathbb {R})\cap \operatorname {epi}f)\) (use Proposition A.2.1.10). This latter set is clearly \(\{ x\} \times (f(x),+\infty )\).
In summary, we have proved that, for \(x\in \operatorname {ri}\operatorname {dom}f\), \((x,r)\in \operatorname {ri}\operatorname {epi}f\) if and only if \(r{\gt}f(x)\). Together with (1.1.5), this proves our claim.
2.1.2 Special convex functions: Affinity and Closedness
Any \(f\in \operatorname {Conv}\mathbb {R}^n\) is minorized by some affine function. More precisely: for any \(x_0\in \operatorname {ri}\operatorname {dom}f\), there is \(s\) in the subspace parallel to \(\operatorname {aff}\operatorname {dom}f\) such that
In other words, the affine function can be forced to coincide with \(f\) at \(x_0\).
We know that \(\operatorname {dom}f\) is the image of \(\operatorname {epi}f\) under the linear mapping "projection onto \(\mathbb {R}^n\)". Look again at the definition of an affine hull (§A.1.3) to realize that
Denote by \(V\) the linear subspace parallel to \(\operatorname {aff}\operatorname {dom}f\), so that \(\operatorname {aff}\operatorname {dom}f=\{ x_0\} +V\) with \(x_0\) arbitrary in \(\operatorname {dom}f\); then we have
We equip \(V\times \mathbb {R}\) and \(\mathbb {R}^n\times \mathbb {R}\) with the scalar product of product-spaces. With \(x_0\in \operatorname {ri}\operatorname {dom}f\), Proposition 1.1.9 tells us that \((x_0,f(x_0))\in \operatorname {rbd}\operatorname {epi}f\) and we can take a nontrivial hyperplane supporting \(\operatorname {epi}f\) at \((x_0,f(x_0))\): using Remark A.4.2.2 and (1), there are \(s=s_V\in V\) and \(\alpha \in \mathbb {R}\), not both zero, such that
for all \((x,r)\) with \(f(x)\le r\). Note: this implies \(\alpha \le 0\) (let \(r\to +\infty \)!).
Because of our choice of \(s\) (in \(V\)) and \(x_0\) (in \(\operatorname {ri}\operatorname {dom}f\)), we can take \(\delta {\gt}0\) so small that \(x_0+\delta s\in \operatorname {dom}f\), for which (2) gives
this shows \(\alpha \neq 0\) (otherwise, both \(s\) and \(\alpha \) would be zero). Without loss of generality, we can assume \(\alpha =-1\); then (2) gives our affine function.
For \(f:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \), the following three properties are equivalent:
\(f\) is lower semi-continuous on \(\mathbb {R}^n\);
\(\operatorname {epi} f\) is a closed set in \(\mathbb {R}^n\times \mathbb {R}\);
the sublevel-sets \(S_r(f)\) are closed (possibly empty) for all \(r\in \mathbb {R}\).
[(i) \(\Rightarrow \) (ii)] Let \((y_k,r_k)_k\) be a sequence of \(\operatorname {epi} f\) converging to \((x,r)\) for \(k\to +\infty \). Since \(f(y_k)\le r_k\) for all \(k\), the l.s.c. relation (1.2.3) readily gives
i.e. \((x,r)\in \operatorname {epi} f\).
[(ii) \(\Rightarrow \) (iii)] Construct the sublevel-sets \(S_r(f)\) as in Remark 1.1.7: the closed sets \(\operatorname {epi} f\) and \(\mathbb {R}^n\times \{ r\} \) have a closed intersection.
[(iii) \(\Rightarrow \) (i)] Suppose \(f\) is not lower semi-continuous at some \(x\): there is a (sub)sequence \((y_k)\) converging to \(x\) such that \(f(y_k)\) converges to \(\rho {\lt}f(x)\le +\infty \). Pick \(r\in [\rho ,f(x))\): for \(k\) large enough, \(f(y_k)\le r{\lt} f(x)\); hence \(S_r(f)\) contains the tail of \((y_k)\) but not its limit \(x\). Consequently, this \(S_r(f)\) is not closed.
(Closed Functions) The function \(f:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \) is said to be closed if it is lower semi-continuous everywhere, or if its epigraph is closed, or if its sublevel-sets are closed.
The closure (or lower semi-continuous hull) of a function \(f\) is the function \(\operatorname {cl}f:\mathbb {R}^n\to \mathbb {R}\cup \{ \pm \infty \} \) defined by:
or equivalently by
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) and \(x'\in \operatorname {ri}\operatorname {dom}f\). There holds (in \(\mathbb {R}\cup \{ +\infty \} \))
Since \(x_t:=x+t(x'-x)\to x\) when \(t\downarrow 0\), we certainly have
We will prove the converse inequality by showing that
(Non-existence of such an \(r\) means that \(\operatorname {cl}f(x)=+\infty \), the proof is finished.)
Thus let \((x,r)\in \operatorname {epi}(\operatorname {cl}f)=\operatorname {cl}(\operatorname {epi}f)\). Pick \(r'{\gt}f(x')\), hence \((x',r')\in \operatorname {epi}f\) (Proposition 1.1.9). Applying Lemma A.2.1.6 to the convex set \(\operatorname {epi}f\), we see that
This just means
and our required inequality follows by letting \(t\downarrow 0\).
For \(f\in \text{Conv }\mathbb {R}^n\), there holds
We already know from Proposition A.1.2.6 that \(\text{epi }\operatorname {cl} f=\operatorname {cl}\text{epi }f\) is a convex set; also \(\operatorname {cl} f\le f\not\equiv +\infty \); finally, Proposition 1.2.1 guarantees in the relation of definition (1.2.4) that \(\operatorname {cl} f(x){\gt}-\infty \) for all \(x\): (1.2.7) does hold.
On the other hand, suppose \(x\in \text{ri }\text{dom }f\). Then the one-dimensional function \(\varphi (t)=f(x+td)\) is continuous at \(t=0\) (Theorem 0.6.2); it follows from Proposition 1.2.5 that \(\operatorname {cl} f\) coincides with \(f\) on \(\text{ri }\text{dom }f\); besides, \(\operatorname {cl} f(x)\) is obviously equal to \(f(x)=+\infty \) for all \(x\notin \text{cl }\text{dom }f\). Altogether, (1.2.8) is true.
Notation 1.2.7 (The Set \(\overline{\operatorname {Conv}}\mathbb {R}^n\)) The set of closed convex functions on \(\mathbb {R}^n\) is denoted by \(\overline{\operatorname {Conv}}\mathbb {R}^n\). □
The closure of \(f\in \text{Conv }\mathbb {R}^n\) is the supremum of all affine functions minorizing \(f\):
A closed half-space containing \(\text{epi }f\) is characterized by a nonzero vector \((s,\alpha )\in \mathbb {R}^n\times \mathbb {R}\) and a real number \(b\) such that
(we equip the graph-space \(\mathbb {R}^n\times \mathbb {R}\) with the scalar product of a product-space). Let us denote by \(\Sigma \subset \mathbb {R}^n\times \mathbb {R}\times \mathbb {R}\) the index-set of such triples \(\sigma =(s,\alpha ,b)\), with corresponding half-space
In other words, \(\operatorname {epi}(\overline{cl}\, f)=\overline{\operatorname {epi}f}=\bigcap _{\sigma \in \Sigma }H^-_{\sigma }\).
Because of the particular nature of an epigraph, (1.2.10) implies \(\alpha \le 0\) (let \(r\to +\infty \)) and, by positive homogeneity, the values \(\alpha =0\) and \(\alpha =-1\) suffice: \(\Sigma \) can be partitioned in
and
Indeed, \(\Sigma _1\) corresponds to affine functions minorizing \(f\) (Proposition 1.2.1 tells us that \(\Sigma _1\neq \varnothing \)) and \(\Sigma _0\) to closed half-spaces of \(\mathbb {R}^{n}\) containing \(\operatorname {dom}f\) (note that \(\Sigma _0=\varnothing \) if \(\operatorname {dom}f=\mathbb {R}^n\)).
We have to prove that, even when \(\Sigma _0\neq \varnothing \), intersecting the half-spaces \(H^-_{\sigma }\) over \(\Sigma \) or over \(\Sigma _1\) produces the same set, namely \(\operatorname {epi}f\). For this we take arbitrary \(\sigma _0=(s_0,0,b_0)\in \Sigma _0\) and \(\sigma _1=(s_1,-1,b_1)\in \Sigma _1\), we set
and we prove (see Fig. 1.2.1)
Fig. 1.2.1. Closing a convex epigraph
It results directly from the definition (1.2.11) that an \((x,r)\) in \(H^-_{\sigma _0}\cap H^-_{\sigma _1}\) satisfies
i.e. \((x,r)\in H^-\). Conversely, take \((x,r)\in H^-\). Set \(t=0\) in (1.2.12) to see that \((x,r)\in H^-_{\sigma _1}\). Also, divide by \(t{\gt}0\) and let \(t\to +\infty \) to see that \((x,r)\in H^-_{\sigma _0}\). The proof is complete.
2.1.3 First examples
Let \(C\) be a nonempty subset of \(\mathbb {R}^n\times \mathbb {R}\) satisfying ??, and let its lower-bound function \(\ell _C\) be defined by ??.
If \(C\) is convex, then \(\ell _C\in \operatorname {Conv}\mathbb {R}^n\);
If \(C\) is closed convex, then \(\ell _C\in \overline{\operatorname {Conv}}\mathbb {R}^n\).
We use the analytical definition (1.1.1). Take arbitrary \(\varepsilon {\gt}0\), \(\alpha \in ]0,1[\) and \((x_i,r_i)\in C\) such that \(r_i\le \ell _C(x_i)+\varepsilon \) for \(i=1,2\).
When \(C\) is convex, \((\alpha x_1+(1-\alpha )x_2,\alpha r_1+(1-\alpha )r_2)\in C\), hence
The convexity of \(\ell _C\) follows, since \(\varepsilon {\gt}0\) was arbitrary; (i) is proved.
Now take a sequence \((x_k,\rho _k)_k\subset \operatorname {epi}\ell _C\) converging to \((x,\rho )\); we have to prove \(\ell _C(x)\le \rho \) (cf. Proposition 1.2.2). By definition of \(\ell _C(x_k)\), we can select, for each positive integer \(k\), a real number \(r_k\) such that \((x_k,r_k)\in C\) and
We deduce first that \((r_k)\) is bounded from above. Also, when \(\ell _C\) is convex, Proposition 1.2.1 implies the existence of an affine function minorizing \(\ell _C\): \((r_k)\) is bounded from below.
Extracting a subsequence if necessary, we may assume \(r_k\to r\). When \(C\) is closed, \((x,r)\in C\), hence \(\ell _C(x)\le r\); but pass to the limit in 3 to see that \(r\le \rho \); the proof is complete.
2.2 Functional Operations Preserving Convexity
2.2.1 Operations preserving closedness
Let \(f_1,\dots ,f_m\) be in \(\text{Conv }\mathbb {R}^n\) [resp. in \(\overline{\text{Conv }}\mathbb {R}^n\)], let \(t_1,\dots ,t_m\) be positive numbers, and assume that there is a point where all the \(f_j\)’s are finite. Then the function \(f:=\sum _{j=1}^m t_j f_j\) is in \(\text{Conv }\mathbb {R}^n\) [resp. in \(\overline{\text{Conv }}\mathbb {R}^n\)].
The convexity of \(f\) is readily proved from the relation of definition (1.1.1). As for its closedness, start from
(valid for \(t_j{\gt}0\) and \(f_j\) closed); then note that the lim inf of a sum is not smaller than the sum of lim inf’s.
Let \(\{ f_j\} _{j\in J}\) be an arbitrary family of convex [resp. closed convex] functions. If there exists \(x_0\) such that \(\sup _{j} f_j(x_0) {\lt} +\infty \), then their pointwise supremum \(f := \sup \{ f_j : j\in J\} \) is in \(\mathrm{Conv}\, \mathbb {R}^n\) [resp. in \(\mathrm{Conv}_\mathrm {cl}\, \mathbb {R}^n\)].
The key property is that a supremum of functions corresponds to an intersection of epigraphs: \(\operatorname {epi} f = \bigcap _{j\in J}\operatorname {epi} f_j\), which conserves convexity and closedness. The only needed restriction is nonemptiness of this intersection.
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) [resp. \(\overline{\operatorname {Conv}}\mathbb {R}^n\)] and let \(A\) be an affine mapping from \(\mathbb {R}^m\) to \(\mathbb {R}^n\) such that \(\operatorname {Im}A\cap \operatorname {dom}f\neq \varnothing \). Then the function
is in \(\operatorname {Conv}\mathbb {R}^m\) [resp. \(\overline{\operatorname {Conv}}\mathbb {R}^m\)].
Clearly \((f\circ A)(x){\gt}-\infty \) for all \(x\); besides, there exists by assumption \(y=A(x)\in \mathbb {R}^n\) such that \(f(y){\lt}+\infty \). To check convexity, it suffices to plug the relation
into the analytical definition (1.1.1) of convexity. As for closedness, it comes readily from the continuity of \(A\) when \(f\) is itself closed.
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) [resp. \(\overline{\operatorname {Conv}}\mathbb {R}^n\)] and let \(g\in \operatorname {Conv}\mathbb {R}\) [resp. \(\overline{\operatorname {Conv}}\mathbb {R}\)] be increasing. Assume that there is \(x_0\in \mathbb {R}^n\) such that \(f(x_0)\in \operatorname {dom}g\), and set \(g(+\infty ):=+\infty \). Then the composite function \(g\circ f:\; x\mapsto g(f(x))\) is in \(\operatorname {Conv}\mathbb {R}^n\) [resp. in \(\overline{\operatorname {Conv}}\mathbb {R}^n\)].
It suffices to check the inequalities of definition: (1.1.1) for convexity, (1.2.3) for closedness.
2.2.2 Dilations and perspectives of a functions
If \(f\in \operatorname {Conv}\mathbb {R}^n\), its perspective \(\tilde f\) is in \(\operatorname {Conv}\mathbb {R}^{n+1}\).
Here also, it is better to look at \(\tilde f\) with “geometric glasses”:
and \(\operatorname {epi}\tilde f\) is therefore a convex cone.
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) and let \(x'\in \operatorname {ri}\text{dom }f\). Then the closure \(\overline{f}\) of its perspective is given as follows:
Suppose first \(u{\lt}0\). For any \(x\), it is clear that \((u,x)\) is outside \(\text{cl }\text{dom }\tilde f\) and, in view of (1.2.8), \(\text{cl }\tilde f(u,x)=+\infty \).
Now let \(u\ge 0\). Using (2.2.1), the assumption on \(x'\) and the results of §A.2.1, we see that \((1,x')\in \text{ri }\text{dom }\tilde f\), so Proposition 1.2.5 allows us to write
If \(u=1\), this reads \(\text{cl }\tilde f(1,x)=\text{cl }f(x)=f(x)\) (because \(f\) is closed); if \(u=0\), we just obtain our claimed relation.
2.2.3 Infimal convolution
Let \(f_1\) and \(f_2\) be two functions from \(\mathbb {R}^n\) to \(\mathbb {R}\cup \{ +\infty \} \). Their infimal convolution is the function from \(\mathbb {R}^n\) to \(\mathbb {R}\cup \{ \pm \infty \} \) defined by
We will also call “infimal convolution” the operation expressed by 7. It is called exact at \(x=\bar x_1+\bar x_2\) when the infimum is attained at \((\bar x_1,\bar x_2)\), not necessarily unique. □
Let the functions \(f_1\) and \(f_2\) be in \(\operatorname {Conv}\mathbb {R}^n\). Suppose that they have a common affine minorant: for some \((s,b)\in \mathbb {R}^n\times \mathbb {R}\),
Then their infimal convolution is also in \(\operatorname {Conv}\mathbb {R}^n\).
For arbitrary \(x\in \mathbb {R}^n\) and \(x_1,x_2\) such that \(x_1+x_2=x\), we have by assumption
and this inequality extends to the infimal value \((f_1\mathbin {\square }f_2)(x)\).
On the other hand, it suffices to choose particular values \(x_j\in \text{dom }f_j\), \(j=1,2\), to obtain the point \(x_1+x_2\in \text{dom }(f_1\mathbin {\square }f_2)\). Finally, the convexity of \(f_1\mathbin {\square }f_2\) results from the convexity of a lower-bound function, as seen in §1.3(g).
2.2.4 Image of a function under a linear mapping
(Image Function) Let \(A:\mathbb {R}^m\to \mathbb {R}^n\) be a linear operator and let \(g:\mathbb {R}^m\to \mathbb {R}\cup \{ +\infty \} \). The image of \(g\) under \(A\) is the function \(Ag:\mathbb {R}^n\to \mathbb {R}\cup \{ \pm \infty \} \) defined by
(here as always, \(\inf \varnothing =+\infty \)).
Let \(g\) of Definition 2.4.1 be in \(\mathrm{Conv}\, \mathbb {R}^m\). Assume also that, for all \(x\in \mathbb {R}^n\), \(g\) is bounded below on the inverse image \(A^{-1}(x)=\{ y\in \mathbb {R}^m:Ay=x\} \). Then \(Ag\in \mathrm{Conv}\, \mathbb {R}^n\).
By assumption, \(Ag\) is nowhere \(-\infty \); also, \((Ag)(x){\lt}+\infty \) whenever \(x=Ay\), with \(y\in \text{dom }g\). Now consider the extended operator
The set \(A'(\text{epi }g)=:C\) is convex in \(\mathbb {R}^n\times \mathbb {R}\); let us compute its lower-bound function (1.3.5): for given \(x\in \mathbb {R}^n\),
and this proves the convexity of \(Ag=\ell _C\).
Let (2.4.1) have the following form: \(U=\mathbb {R}^p\); \(\varphi \in \text{Conv }\mathbb {R}^p\); \(X=\mathbb {R}^n\) is equipped with the canonical basis; the mapping \(c\) has its components \(c_j\in \text{Conv }\mathbb {R}^p\) for \(j=1,\dots ,n\). Suppose also that the optimal value is \({\gt}-\infty \) for all \(x\in \mathbb {R}^n\), and that
Then the value function
lies in \(\text{Conv }\mathbb {R}^n\).
Note first that we have assumed \(v_{\varphi ,c}(x){\gt}-\infty \) for all \(x\). Take \(u_0\) in the set (2.4.4) and set \(M:=\max _j c_j(u_0)\); then take \(x_0:=(M,\dots ,M)\in \mathbb {R}^n\), so that \(v_{\varphi ,c}(x_0)\le \varphi (u_0){\lt}+\infty \). Knowing that \(v_{\varphi ,c}\) is an image-function, we just have to prove the convexity of the set (2.4.3); but this in turn comes immediately from the convexity of each \(c_j\).
(Marginal Function) Let \(g\in \operatorname {Conv}(\mathbb {R}^n\times \mathbb {R}^m)\). Then
is the marginal function of \(g\).
With the above notation, suppose that \(g\) is bounded below on the set \(\{ x\} \times \mathbb {R}^m\), for all \(x\in \mathbb {R}^n\). Then the marginal function \(\gamma \) lies in \(\operatorname {Conv}\mathbb {R}^n\).
The marginal function \(\gamma \) is the image of \(g\) under the the linear operator \(A\) projecting each \((x,y)\in \mathbb {R}^n\times \mathbb {R}^m\) onto \(x\in \mathbb {R}^n\): \(A(x,y)=x\). □
2.2.5 Convex hull and closed convex hull of a function
Let \(g:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \), not identically \(+\infty \), be minorized by some affine function: for some \((s,b)\in \mathbb {R}^n\times \mathbb {R}\),
Then, the following three functions \(f_1, f_2\) and \(f_3\) are convex and coincide on \(\mathbb {R}^n\):
We denote by \(\Gamma \) the family of convex functions minorizing \(g\). By assumption, \(\Gamma \neq \varnothing \); then the convexity of \(f_1\) results from §1.3(g).
[\(f_2\le f_1\)] Consider the epigraph of any \(h\in \Gamma \): its lower-bound function \(\ell _{\mathrm{epi}\, h}\) is \(h\) itself; besides, it contains \(\mathrm{epi}\, g\), and \(\operatorname {co}(\mathrm{epi}\, g)\) as well (see Proposition A.1.3.4). In a word, there holds \(h=\ell _{\mathrm{epi}\, h}\le \ell _{\operatorname {co}\, \mathrm{epi}\, g}=f_1\) and we conclude \(f_2\le f_1\) since \(h\) was arbitrary in \(\Gamma \).
[\(f_3\le f_2\)] We have to prove \(f_3\in \Gamma \), and the result will follow by definition of \(f_2\); clearly \(f_3\le g\) (take \(\alpha \in \Delta _1\)!), so it suffices to establish \(f_3\le \operatorname {Conv}\mathbb {R}^n\). First, with \((s,b)\) of (2.5.1) and all \(x,\ \{ x_j\} \) and \(\{ \alpha _j\} \) as described by (2.5.2),
hence \(f_3\) is minorized by the affine function \(\langle s,\cdot \rangle - b\). Now, take two points \((x,r)\) and \((x',r')\) in the strict epigraph of \(f_3\). By definition of \(f_3\), there are \(k,\ \{ \alpha _j\} ,\ \{ x_j\} \) as described in (2.5.2), and likewise \(k',\ \{ \alpha '_j\} ,\ \{ x'_j\} \), such that \(\sum _{j=1}^k \alpha _j g(x_j){\lt}r\) and likewise \(\sum _{j=1}^{k'} \alpha '_j g(x'_j){\lt}r'\).
For arbitrary \(t\in ]0,1[\), we obtain by convex combination
Observe that
i.e. we have in the lefthand side a convex decomposition of \(tx+(1-t)x'\) in \(k+k'\) elements; therefore, by definition of \(f_3\):
and we have proved that \(\mathrm{epi}_s f_3\) is a convex set: \(f_3\) is convex.
Let \(x\in \mathbb {R}^n\) and take an arbitrary convex decomposition \(x=\sum _{j=1}^k \alpha _j x_j\), with \(\alpha _j\) and \(x_j\) as described in (2.5.2). Since \((x_j,g(x_j))\in \operatorname {epi} g\) for \(j=1,\ldots ,k\),
and this implies \(f_1(x)\le \sum _{j=1}^k \alpha _j g(x_j)\) by definition of \(f_1\). Because the decomposition of \(x\) was arbitrary within (2.5.2), this implies \(f_1(x)\le f_3(x)\).
Let \(g\) satisfy the hypotheses of Proposition 2.5.1. Then the three functions below
are closed, convex, and coincide on \(\mathbb {R}^n\) with the closure of the function constructed in Proposition 2.5.1.
Let \(g:\mathbb {R}^n\to \mathbb {R}\cup \{ +\infty \} \), not identically \(+\infty \), be minorized by an affine function. The common function \(f_1=f_2=f_3\) of Proposition 2.5.1 is called the convex hull of \(g\), denoted by \(\operatorname {co}g\). The closed convex hull of \(g\) is any of the functions described by Proposition 2.5.2; it is denoted by \(\overline{\operatorname {co}}\, g\) or \(\operatorname {cl}\, \operatorname {co}g\).
Let \(g_1,\dots ,g_m\) be in \(\operatorname {Conv}\mathbb {R}^n\), all minorized by the same affine function. Then the convex hull of their infimum is the function
Apply Example A.1.3.5 to the convex sets \(C_j=\operatorname {epi} g_j\).
2.3 Local and Global Behavior of a Convex Function
2.3.1 Continuity properties
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) and suppose there are \(x_0,\ \delta ,\ m\) and \(M\) such that
Then \(f\) is Lipschitzian on \(B(x_0,\delta )\); more precisely: for all \(y\) and \(y'\) in \(B(x_0,\delta )\),
Look at Fig. 3.1.1: with two different \(y\) and \(y'\) in \(B(x_0,\delta )\), take
by construction, \(y'\) lies on the segment \([y,y'']\), namely
Applying the convexity of \(f\) and using the postulated bounds, we obtain
Then, it suffices to exchange \(y\) and \(y'\) to prove (3.1.1).
With \(f\in \operatorname {Conv}\mathbb {R}^n\), let \(S\) be a convex compact subset of \(\operatorname {ri}\text{dom }f\). Then there exists \(L=L(S)\ge 0\) such that
First of all, our statement ignores \(x\)-values outside the affine hull of the convex set \(\text{dom }f\). Instead of \(\mathbb {R}^n\), it can be formulated in \(\mathbb {R}^d\), where \(d\) is the dimension of \(\text{dom }f\); alternatively, we may assume \(\text{ri }\text{dom }f=\int \text{dom }f\), which will simplify the writing.
Make this assumption and let \(x_0\in S\). We will prove that there are \(\delta =\delta (x_0){\gt}0\) and \(L=L(x_0,\delta )\) such that the ball \(B(x_0,\delta )\) is included in \(\int \text{dom }f\) and
If this holds for all \(x_0\in S\), the corresponding balls \(B(x_0,\delta )\) will provide a covering of the compact set \(S\), from which we will extract a finite covering \((x_1,\delta _1,L_1),\ldots ,(x_k,\delta _k,L_k)\). With these balls, we will divide an arbitrary segment \([x,x']\) of the convex set \(S\) into finitely many subsegments, of endpoints \(y_0:=x,\ldots ,y_i,\ldots ,y_\ell :=x'\). Ordering properly the \(y_i\)’s, we will have \(\| x-x'\| =\sum _{i=1}^\ell \| y_i-y_{i-1}\| \); furthermore, \(f\) will be Lipschitzian on each \([y_{i-1},y_i]\) with the common constant \(L:=\max \{ L_1,\ldots ,L_k\} \). The required Lipschitz property (3.1.2) will follow.
[Main Step] To establish (3.1.3), we use Lemma 3.1.1, which requires boundedness of \(f\) in the neighborhood of \(x_0\). For this, we construct as in the proof of Theorem A.2.1.3 (see Fig. A.2.1.1) a simplex \(\Delta =\operatorname {co}\{ v_0,\dots ,v_n\} \cap \operatorname {dom}f\) having \(x_0\) in its interior: we can take \(\delta {\gt}0\) such that \(B(x_0,2\delta )\subset \Delta \).
Then any \(y\in B(x_0,2\delta )\) can be written: \(y=\sum _{i=0}^n\alpha _i v_i\) with \(\alpha \in \Delta _{n+1}\), so that the convexity of \(f\) gives
On the other hand, Proposition 1.2.1 tells us that \(f\) is bounded from below, say by \(m\), on this very same \(B(x_0,2\delta )\). Our claim is proved: we have singled out \(\delta {\gt}0\) such that \(m\le f(y)\le M\) for all \(y\in B(x_0,2\delta )\).
Let the convex functions \(f_k:\mathbb {R}^n\to \mathbb {R}\) converge pointwise for \(k\to +\infty \) to \(f:\mathbb {R}^n\to \mathbb {R}\). Then \(f\) is convex and, for each compact set \(S\), the convergence of \(f_k\) to \(f\) is uniform on \(S\).
Convexity of \(f\) is trivial: pass to the limit in the definition (1.1.1) itself. For uniformity, we want to use Lemma 3.1.1, so we need to bound \(f_k\) on \(S\) independently of \(k\); thus, let \(r{\gt}0\) be such that \(S\subset B(0,r)\).
[Step 1] First the function \(g:=\sup _k f_k\) is convex, and \(g(x){\lt}+\infty \) for all \(x\) because the convergent sequence \((f_k(x))_k\) is certainly bounded. Hence, \(g\) is continuous and therefore bounded, say by \(M\), on the compact set \(B(0,2r)\):
Second, the convergent sequence \((f_k(0))_k\) is bounded from below:
Then, for \(x\in B(0,2r)\) and all \(k\), use convexity on \([-x,x]\subset B(0,2r)\):
i.e. the \(f_k\)’s are bounded from below, independently of \(k\). Thus, we are within the conditions of Lemma 3.1.1: there is some \(L\) (independent of \(k\)) such that
Naturally, the same Lipschitz property is transmitted to the limiting function \(f\).
[Step 2] Now fix \(\varepsilon {\gt}0\). Cover \(S\) by the balls \(B(x,\varepsilon )\) for \(x\) describing \(S\), and extract a finite covering \(S\subset B(x_1,\varepsilon )\cup \cdots \cup B(x_m,\varepsilon )\). With \(x\) arbitrary in \(S\), take an \(x_i\) such that \(x\in B(x_i,\varepsilon )\). There is \(k_{i,\varepsilon }\) such that, for all \(k\ge k_{i,\varepsilon }\),
where we have also used (3.1.5), knowing that \(x\) and \(x_i\) are in \(S\subset B(0,r)\). The above inequality is then valid uniformly in \(x\), providing that
2.3.2 Behavior at infinity
For \(f\in \overline{\operatorname {Conv}}\mathbb {R}^n\), the asymptotic cone of \(\operatorname {epi} f\) is the epigraph of the function \(f'_{\infty }\in \overline{\operatorname {Conv}}\mathbb {R}^n\) defined by
where \(x_0\) is arbitrary in \(\operatorname {dom} f\).
Since \((x_0,f(x_0))\in \operatorname {epi} f\), (3.2.1) tells us that \((d,\rho )\in (\operatorname {epi} f)_{\infty }\) if and only if \(f(x_0+td)\le f(x_0)+t\rho \) for all \(t{\gt}0\), which means
In other words, \((\operatorname {epi} f)_\infty \) is the epigraph of the function whose value at \(d\) is the left hand side of (3.2.3); and this is true no matter how \(x_0\) has been chosen in \(\operatorname {dom} f\). The rest follows from the fact that the difference quotient in (3.2.3) is closed convex in \(d\), and increasing in \(t\) (the function \(t\mapsto f(x_0+td)\) is convex and enjoys the property of increasing slopes, namely Proposition 0.6.1). \(\square \)
(Asymptotic function) The function \(f'_{\infty }\) of Proposition 3.2.1 is called the asymptotic function, or recession function, of \(f\).
Let \(f\in \operatorname {Conv}\mathbb {R}^n\). All the nonempty sublevel-sets of \(f\) have the same asymptotic cone, which is the sublevel-set of \(f^\infty \) at the level \(0\):
In particular, the following statements are equivalent:
There is \(r\) for which \(S_r(f)\) is nonempty and compact;
all the sublevel-sets of \(f\) are compact;
\(f^\infty _\circ (d){\gt}0\) for all nonzero \(d\in \mathbb {R}^n\).
By definition (A.2.2.1), a direction \(d\) is in the asymptotic cone of the nonempty sublevel-set \(S_r(f)\) if and only if
which can also be written — see (1.1.4):
and this in turn just means that \((d,0)\in (\operatorname {epi} f)_\infty =\operatorname {epi} f^\infty _\circ \). We have proved the first part of the theorem.
A particular case is when the sublevel-set \(S_0(f^\infty _\circ )\) is reduced to the singleton \(\{ 0\} \), which exactly means (iii). This is therefore equivalent to \([S_r(f)]_\infty =\{ 0\} \) for all \(r\in \mathbb {R}\) with \(S_r(f)\neq \emptyset \), which means that \(S_r(f)\) is compact (Proposition A.2.2.3). The equivalence between (i), (ii) and (iii) is proved.
(Coercivity) The functions \(f\in \operatorname {Conv}\mathbb {R}^n\) satisfying (i), (ii) or (iii) are called \(0\)-coercive. Equivalently, the \(0\)-coercive functions are those that “increase at infinity”:
and closed convex \(0\)-coercive functions achieve their minimum over \(\mathbb {R}^n\).
An important particular case is when \(f'_\infty (d)=+\infty \) for all \(d\neq 0\), i.e. when \(f'_\infty = \iota _{\{ 0\} }\). It can be seen that this means precisely
(to establish this equivalence, extract a cluster point of \((x_k/\| x_k\| )\), and use the lower semi-continuity of \(f'_\infty \)). In words: at infinity, \(f\) increases to infinity faster than any affine function; such functions are called \(1\)-coercive, or sometimes just coercive.
A function \(f\in \operatorname {Conv}\mathbb {R}^n\) is Lipschitzian on the whole of \(\mathbb {R}^n\) if and only if \(f'_\infty \) is finite on the whole of \(\mathbb {R}^n\). The best Lipschitz constant for \(f\) is then
When the (convex) function \(f'_\infty \) is finite-valued, it is continuous (§3.1) and therefore bounded on the compact unit sphere:
which implies by positive homogeneity
Now use the definition (3.2.2) of \(f'_\infty \):
thus, \(\operatorname {dom}f\) is the whole space (\(f(x+d){\lt}+\infty \) for all \(d\)) and we do obtain that \(L\) is a global Lipschitz constant for \(f\).
Conversely, let \(f\) have a global Lipschitz constant \(L\). Pick \(x_0\in \operatorname {dom}f\) and plug the inequality
into the definition (3.2.2) of \(f'_\infty \) to obtain \(f'_\infty (d)\le L\| d\| \) for all \(d\in \mathbb {R}^n\).
It follows that \(f'_\infty \) is finite everywhere, and the value (3.2.4) does not exceed \(L\).
Let \(f_1,\dots ,f_m\) be \(m\) functions of \(\text{Conv }\mathbb {R}^n\), and \(t_1,\dots ,t_m\) be positive numbers. Assume that there is \(x_0\) at which each \(f_j\) is finite. Then,
\[ \text{for } f:=\sum _{j=1}^m t_j f_j,\qquad \text{we have } f'_\infty =\sum _{j=1}^m t_j(f_j)'_\infty . \]Let \(\{ f_j\} _{j\in J}\) be a family of functions in \(\text{Conv }\mathbb {R}^n\). Assume that there is \(x_0\) at which \(\sup _{j\in J} f_j(x_0){\lt}+\infty \). Then,
\[ \text{for } f:=\sup _{j\in J} f_j,\qquad \text{we have } f'_\infty =\sup _{j\in J}(f_j)'_\infty . \]Let \(A:\mathbb {R}^n\to \mathbb {R}^m\) be affine with linear part \(A_0\), and let \(f\in \text{Conv }\mathbb {R}^m\). Assume that \(A(\mathbb {R}^n)\cap \text{dom }f\neq \emptyset \). Then \((f\circ A)'_\infty =f'_\infty \circ A_0\).
2.4 First- and Second-order Differentiation
2.4.1 Differentiable convex functions
Let \(f\) be a function differentiable on an open set \(\Omega \subset \mathbb {R}^n\), and let \(C\) be a convex subset of \(\Omega \). Then
\(f\) is convex on \(C\) if and only if
\[ f(x)\ge f(x_0)+\langle \nabla f(x_0),\, x-x_0\rangle \quad \text{for all }(x_0,x)\in C\times C; \tag {4.1.1} \]\(f\) is strictly convex on \(C\) if and only if strict inequality holds in (4.1.1) whenever \(x\neq x_0\);
\(f\) is strongly convex with modulus \(c\) on \(C\) if and only if, for all \((x_0,x)\in C\times C\),
\[ f(x)\ge f(x_0)+\langle \nabla f(x_0),x-x_0\rangle +\tfrac {1}{2}c\| x-x_0\| ^2. \tag {4.1.2} \]
Let \(f\) be convex on \(C\): for arbitrary \((x_0,x)\in C\times C\) and \(\alpha \in ]0,1[\), we have from the definition (1.1.1) of convexity
Divide by \(\alpha \) and let \(\alpha \downarrow 0\): observing that \(\alpha x+(1-\alpha )x_0=x_0+\alpha (x-x_0)\), the lefthand side tends to \(\langle \nabla f(x_0),x-x_0\rangle \) and (4.1.1) is established.
Conversely, take \(x_1\) and \(x_2\) in \(C\), \(\alpha \in ]0,1[\) and set \(x_0:=\alpha x_1+(1-\alpha )x_2\in C\). By assumption,
and we obtain by convex combination
which, after simplification, is just the relation of definition (1.1.1).
[(ii)] If \(f\) is strictly convex, we have for \(x_0\ne x\) in \(C\) and \(\alpha \in ]0,1[\),
but \(f\) is in particular convex and we can use (i):
so the required strict inequality follows.
For the converse, proceed as for (i), starting from strict inequalities in (4.1.3).
[(iii)] Using Proposition 1.1.2, just apply (i) to the function \(f-\tfrac {1}{2}c\| \cdot \| ^2\), which is of course differentiable.
Let \(C\subset \mathbb {R}^n\) be convex. The mapping \(F: C\to \mathbb {R}^n\) is said to be monotone [resp. strictly monotone, resp. strongly monotone with modulus \(c{\gt}0\)] on \(C\) when, for all \(x\) and \(x'\) in \(C\),
[resp. \(\langle F(x)-F(x'),\, x-x'\rangle {\gt} 0\) whenever \(x\neq x'\), resp. \(\langle F(x)-F(x'),\, x-x'\rangle \ge c\| x-x'\| ^2\)].
Let \(f\) be a function differentiable on an open set \(\Omega \subset \mathbb {R}^n\), and let \(C\) be a convex subset of \(\Omega \). Then, \(f\) is convex [resp. strictly convex, resp. strongly convex with modulus \(c\)] on \(C\) if and only if its gradient \(\nabla f\) is monotone [resp. strictly monotone, resp. strongly monotone with modulus \(c\)] on \(C\).
We combine the “convex \(\Leftrightarrow \) monotone” and “strongly convex \(\Leftrightarrow \) strongly monotone” cases by accepting the value \(c=0\) in the relevant relations such as (4.1.2).
Thus, let \(f\) be [strongly] convex on \(C\): in view of Theorem 4.1.1, we can write for arbitrary \(x_0\) and \(x\) in \(C\):
and mere addition shows that \(\nabla f\) is [strongly] monotone.
Conversely, let \((x_0,x_1)\) be a pair of elements in \(C\). Consider the univariate function \(t\mapsto \varphi (t):=f(x_t)\), where \(x_t:=x_0+t(x_1-x_0)\); for \(t\) in an open interval containing \([0,1]\), \(x_t\in \Omega \) and \(\varphi \) is well-defined and differentiable; its derivative at \(t\) is \(\varphi '(t)=\langle \nabla f(x_t),x_1-x_0\rangle \). Thus, we have for all \(0\le t'{\lt}t\le 1\)
and the monotonicity relation for \(\nabla f\) shows that \(\varphi '\) is increasing, \(\varphi \) is therefore convex (Corollary 0.6.5).
For strong convexity, set \(t'=0\) in (4.1.4) and use the strong monotonicity relation for \(\nabla f\):
Because the differentiable convex function \(\varphi \) is the integral of its derivative, we can write
which, by definition of \(\varphi \), is just (4.1.2) (the coefficient \(1/2\) is \(\int _0^1 t\, dt!\)).
The same technique proves the “strictly monotone \(\Leftrightarrow \) strictly convex” case; then, (4.1.5) becomes a strict inequality — with \(c=0\) — and remains so after integration.
2.4.2 Nondifferentiable convex functions
For \(f\in \operatorname {Conv}\mathbb {R}^n\) and \(x\in \operatorname {int}\text{dom }f\), the three statements below are equivalent:
The function
\[ \mathbb {R}^n\ni d\mapsto \lim _{t\downarrow 0}\frac{f(x+td)-f(x)}{t} \]is linear in \(d\);
for some basis of \(\mathbb {R}^n\) in which \(x=(\xi ^1,\dots ,\xi ^n)\), the partial derivatives \(\dfrac {\partial f}{\partial \xi ^i}(x)\) exist at \(x\), for \(i=1,\dots ,n\);
\(f\) is differentiable at \(x\).
First of all, remember from Theorem 0.6.3 that the one-dimensional function \(t\mapsto f(x+td)\) has half-derivatives at \(0\): the limits considered in (i) exist for all \(d\). We will denote by \(\{ b_1,\dots ,b_n\} \) the basis postulated in (ii), so that \(x=\sum _{i=1}^n \xi ^i b_i\).
Denote by \(d\mapsto \ell (d)\) the function defined in (i); taking \(d=\pm b_i\), realize that, when (i) holds,
This means that the two half-derivatives at \(t=0\) of the function \(t\mapsto f(x+t b_i)\) coincide: the partial derivative of \(f\) at \(x\) along \(b_i\) exists, (ii) holds. That (iii) implies (i) is clear: when \(f\) is differentiable at \(x\),
We do not really complete the proof here, because everything follows in a straightforward way from subsequent chapters. More precisely, [(ii) \(\Rightarrow \) (i)] is Proposition C.1.1.6, which states that the function \(\ell \) is linear on the space generated by the \(b_i\)’s, whenever it is linear along each \(b_i\). Finally [(i) \(\Rightarrow \) (iii)] results from Lemma D.2.1.1 and the proof goes as follows. One of the possible definitions of (iii) is:
Because \(f\) is locally Lipschitzian, the above limit exists whenever it exists for fixed \(d'=d\)—i.e. the expression in (i).
Let \(f\in \operatorname {Conv}\mathbb {R}^n\). The subset of \(\operatorname {int}\operatorname {dom}f\) where \(f\) fails to be differentiable is of zero (Lebesgue) measure.
2.4.3 Second-order differentiation
Let \(f\) be twice differentiable on an open convex set \(\Omega \subset \mathbb {R}^n\). Then
\(f\) is convex on \(\Omega \) if and only if \(\nabla ^2 f(x_0)\) is positive semi-definite for all \(x_0\in \Omega \);
if \(\nabla ^2 f(x_0)\) is positive definite for all \(x_0\in \Omega \), then \(f\) is strictly convex on \(\Omega \);
\(f\) is strongly convex with modulus \(c\) on \(\Omega \) if and only if the smallest eigenvalue of \(\nabla ^2 f(\cdot )\) is minorized by \(c\) on \(\Omega \): for all \(x_0\in \Omega \) and all \(d\in \mathbb {R}^n\),
\[ \langle \nabla ^2 f(x_0)d,d\rangle \ge c\| d\| ^2. \]
For given \(x_0\in \Omega \), \(d\in \mathbb {R}^n\) and \(t\in \mathbb {R}\) such that \(x_0+td\in \Omega \), we will set
so that \(\varphi ''(t)=\langle \nabla ^2 f(x_t)d,d\rangle \).
[(i)] Assume \(f\) is convex on \(\Omega \); let \((x_0,d)\) be arbitrary in \(\Omega \times \mathbb {R}^n\), with \(d\neq 0\): \(\varphi \) is then convex on the open interval \(I:=\{ t\in \mathbb {R}:x_0+td\in \Omega \} \). It follows
and \(\nabla ^2 f(x_0)\) is positive semi-definite.
Conversely, take an arbitrary \([x_0,x_1]\subset \Omega \), set \(d:=x_1-x_0\) and assume \(\nabla ^2 f(x_t)\) positive semi-definite, i.e. \(\varphi ''(t)\ge 0\), for \(t\in [0,1]\). Then Theorem 0.6.6 tells us that \(\varphi \) is convex on \([0,1]\), i.e. \(f\) is convex on \([x_0,x_1]\). The result follows since \(x_0\) and \(x_1\) were arbitrary in \(\Omega \).
[(ii)] To establish the strict convexity of \(f\) on \(\Omega \), we prove that \(\nabla f\) is strictly monotone on \(\Omega \): Theorem 4.1.4 will apply. As above, take an arbitrary \([x_0,x_1]\subset \Omega \), \(x_1\neq x_0\), \(d:=x_1-x_0\), and apply the mean-value theorem to the function \(\varphi '\), differentiable on \([0,1]\), for some \(\tau \in ]0,1[\),
and the result follows since
[(iii)] Using Proposition 1.1.2, apply (i) to the function \(f-{\tfrac 12}c\| \, \cdot \, \| ^2\), whose Hessian operator is \(\nabla ^2 f-cI_n\) and has the eigenvalues \(\lambda -c\), with \(\lambda \) describing the eigenvalues of \(\nabla ^2 f\).