5 Conjugacy in Convex Analysis
5.1 The Convex Conjugate of a Function
5.1.1 Definition and first examples
The conjugate of a function \(f\) satisfying (1.1.1) is the function \(f^{*}\) defined by
For simplicity, we may also let \(x\) run over the whole space instead of \(\operatorname {dom}f\).
The mapping \(f\mapsto f^{*}\) will often be called the conjugacy operation, or the Legendre–Fenchel transform.
For \(f\) satisfying (1.1.1), the conjugate \(f^*\) is a closed convex function: \(f^*\in \overline{\operatorname {Conv}}\mathbb {R}^n\).
See Example B.2.1.3.
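As a numerical aside (our own sketch, not part of the text: the grid and the sample function \(f(x)=x^2/2\) are illustrative choices), the supremum defining \(f^*\) can be approximated by maximizing over a fine grid; for \(f(x)=x^2/2\) this recovers the classical fact that \(f\) is its own conjugate:

```python
import numpy as np

# Approximate f*(s) = sup_x { <s, x> - f(x) } by maximizing over a grid.
# Illustrative sketch only: grid bounds and the test function are ours.
def conjugate_on_grid(f, xs):
    """Return s -> max over the grid xs of (s * x - f(x))."""
    fx = f(xs)
    return lambda s: np.max(s * xs - fx)

xs = np.linspace(-10.0, 10.0, 200001)        # step 1e-4
f = lambda x: 0.5 * x**2
f_star = conjugate_on_grid(f, xs)

# f(x) = x^2/2 is self-conjugate: f*(s) = s^2/2.
for s in [-2.0, 0.0, 1.5]:
    assert abs(f_star(s) - 0.5 * s**2) < 1e-6
```

The same routine can be reused to test the calculus rules of the following subsections on simple one-dimensional examples.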
5.1.2 Interpretations
There holds for all \(x\in \mathbb {R}^n\)
It follows that the support function of \(\mathrm{epi}\, f\) has the expression
In (1.2.1), the right-most term can be written
and the first equality is established. As for (1.2.2), the case \(u{\lt}0\) is trivial; when \(u{\gt}0\), use the positive homogeneity of support functions to get
Finally, for \(u=0\), we have by definition
and we recognize \(\sigma _{\operatorname {dom} f}(s)\).
For \(f\in \overline{\operatorname {Conv}}\mathbb {R}^n\),
Use direct calculations; or see Proposition B.2.2.2 and the calculations in Example B.3.2.3.
5.1.3 First properties
The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x)+r\) is \(g^*(s)=f^*(s)-r\).
The function \(f\) is assumed to satisfy (1.1.1). With \(t{\gt}0\), the conjugate of the function \(g(x):=tf(x)\) is \(g^*(s)=t f^*(s/t)\).
The function \(f\) is assumed to satisfy (1.1.1). With \(t\neq 0\), the conjugate of the function \(g(x):=f(tx)\) is \(g^*(s)=f^*(s/t)\).
The function \(f\) is assumed to satisfy (1.1.1). More generally: if \(A\) is an invertible linear operator, \((f\circ A)^*=f^*\circ (A^{-1})^*\).
The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x-x_{0})\) is \(g^{*}(s)=f^{*}(s)+\langle s,x_{0}\rangle \).
The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x)+\langle s_{0},x\rangle \) is \(g^{*}(s)=f^{*}(s-s_{0})\).
The functions \(f_j\) appearing below are assumed to satisfy (1.1.1). If \(f_{1}\le f_{2}\), then \(f_{1}^{*}\ge f_{2}^{*}\).
The functions \(f_j\) appearing below are assumed to satisfy (1.1.1). “Convexity” of the conjugation: if \(\mathrm{dom}\, f_{1}\cap \mathrm{dom}\, f_{2}\neq \varnothing \) and \(\alpha \in ]0,1[\),
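Two of these rules, the shift \(g(x)=f(x-x_0)\) and the tilt \(g(x)=f(x)+\langle s_0,x\rangle \), can be checked numerically on \(f(x)=x^2/2\), whose conjugate is \(s^2/2\) (a sketch; the parameters and the grid maximization standing in for the exact supremum are our choices):

```python
import numpy as np

# Check the shift rule g*(s) = f*(s) + <s, x0> and the tilt rule
# h*(s) = f*(s - s0) on f(x) = x^2/2, for which f*(s) = s^2/2.
def conj(f, xs):
    fx = f(xs)
    return lambda s: np.max(s * xs - fx)

xs = np.linspace(-20.0, 20.0, 400001)        # step 1e-4
x0, s0 = 1.0, -2.0

g = conj(lambda x: 0.5 * (x - x0)**2, xs)    # shifted function
h = conj(lambda x: 0.5 * x**2 + s0 * x, xs)  # tilted function

for s in [-1.0, 0.5, 2.0]:
    assert abs(g(s) - (0.5 * s**2 + s * x0)) < 1e-6   # g*(s) = f*(s) + <s, x0>
    assert abs(h(s) - 0.5 * (s - s0)**2) < 1e-6       # h*(s) = f*(s - s0)
```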
Let \(f\) satisfy (1.1.1), let \(H\) be a subspace of \(\mathbb {R}^n\), and call \(p_H\) the operator of orthogonal projection onto \(H\). Suppose that there is a point in \(H\) where \(f\) is finite. Then \(f+\iota _{H}\) satisfies (1.1.1) and its conjugate is
As \(y\) ranges over \(\mathbb {R}^n\), \(p_H y\) ranges over \(H\), so we can write, \(p_H\) being symmetric:
For \(f\) satisfying (1.1.1), let a subspace \(V\) contain the subspace parallel to \(\operatorname {aff\, dom}f\) and set \(U:=V^{\perp }\). For any \(z\in \operatorname {aff\, dom}f\) and any \(s\in \mathbb {R}^{n}\) decomposed as \(s=s_{U}+s_{V}\), there holds
In (1.1.2), the variable \(x\) can range through \(z+V\supset \operatorname {aff\, dom}f\):
For \(f\) satisfying (1.1.1), the function \(f^{**}\) of (1.3.2) is the pointwise supremum of all the affine functions on \(\mathbb {R}^n\) majorized by \(f\). In other words
Call \(\Sigma \subset \mathbb {R}^n\times \mathbb {R}\) the set of pairs \((s,r)\) defining affine functions \(\langle s,\cdot \rangle - r\) majorized by \(f\):
Then we obtain, for \(x\in \mathbb {R}^n\),
Geometrically, the epigraphs of the affine functions associated with \((s,r)\in \Sigma \) are the (non-vertical) closed half-spaces containing \(\operatorname {epi} f\). From §B.2.5, the epigraph of their supremum is the closed convex hull of \(\operatorname {epi} f\), and this proves (1.3.3).
If \(g\) is a function satisfying \(\overline{\operatorname {co}}\, f \le g \le f\), then \(g^* = f^*\). The function \(f\) is equal to its biconjugate \(f^{**}\) if and only if \(f\in \overline{\operatorname {Conv}}\mathbb {R}^n\).
Immediate.
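The biconjugate result can be visualized numerically (a sketch; the grids and the test function are our choices): for the nonconvex double well \(f(x)=\min \{ (x-1)^2,(x+1)^2\} \), two grid conjugations should reproduce the closed convex hull of \(f\), namely \(0\) on \([-1,1]\) and \((|x|-1)^2\) outside:

```python
import numpy as np

# f(x) = min((x-1)^2, (x+1)^2) = (|x|-1)^2 is not convex; its biconjugate
# f** must coincide with the closed convex hull of f.
xs = np.linspace(-5.0, 5.0, 1001)            # step 1e-2
fx = np.minimum((xs - 1.0)**2, (xs + 1.0)**2)

ss = np.linspace(-8.0, 8.0, 1601)
f_star = np.max(ss[:, None] * xs[None, :] - fx[None, :], axis=1)   # f* on a grid
f_bi = np.max(xs[:, None] * ss[None, :] - f_star[None, :], axis=1) # f** on a grid

hull = np.where(np.abs(xs) <= 1.0, 0.0, (np.abs(xs) - 1.0)**2)     # cl co f
assert np.max(np.abs(f_bi - hull)) < 1e-2    # f** equals the convex hull of f
assert np.all(f_bi <= fx + 1e-4)             # in particular f** minorizes f
```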
A function \(f\) satisfying (1.1.1) is said to be \(0\)-coercive [resp. \(1\)-coercive] when
If \(f\) satisfying (1.1.1) is 1-coercive, then \(f^*(s) {\lt} +\infty \) for all \(s \in \mathbb {R}^n\).
Let \(s\) be given. The 1-coercivity of \(f\) implies the existence of a number \(R\) such that \(f(x) \ge \| s\| \, \| x\| \) (hence \(\langle s,x\rangle - f(x) \le 0\)) whenever \(\| x\| \ge R\). As a result, we have in (1.1.2)
On the other hand, (1.1.1) implies an upper bound
Altogether, \(f^*(s) \le \max \{ 0,M\} \). □
For \(f\) satisfying (1.1.1), the following holds:
If \(x_0\in \operatorname {int}\text{dom }f\) then \(f^*-\langle x_0,\cdot \rangle \) is \(0\)-coercive;
in particular, if \(f\) is finite over \(\mathbb R^n\), then \(f^*\) is \(1\)-coercive.
We know from (1.2.3) that \(\sigma _{\text{dom }f}=(f^*)'_{\infty }\) so, using Theorem C.2.2.3(iii), \(x_0\in \operatorname {int}\text{dom }f\subset \operatorname {int}(\operatorname {co}\, \text{dom }f)\) implies \((f^*)'_{\infty }(s)-\langle x_0,s\rangle {\gt}0\) for all \(s\neq 0\). By virtue of Proposition B.3.2.4, this means exactly that \(f^*-\langle x_0,\cdot \rangle \) has compact sublevel-sets; (i) is proved.
Then, by Definition B.3.2.5, \(0\)-coercivity of \(f^*-\langle x_0,\cdot \rangle \) for all \(x_0\) means \(1\)-coercivity of \(f^*\).
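For a one-dimensional illustration (our own sketch; the function and grid are illustrative choices): \(f(x)=x^4\) is \(1\)-coercive, so its conjugate must be finite everywhere; here \(f^*\) even has the closed form \(f^*(s)=3(|s|/4)^{4/3}\), obtained by maximizing \(sx-x^4\) at \(x=(s/4)^{1/3}\). And since \(f\) is finite on all of \(\mathbb {R}\), this \(f^*\) is in turn \(1\)-coercive:

```python
import numpy as np

# f(x) = x^4 satisfies f(x)/|x| -> +infinity (1-coercivity), so
# f*(s) < +infinity for every s; maximizing s x - x^4 at x = (s/4)^(1/3)
# gives the closed form f*(s) = 3 (|s|/4)^(4/3).
xs = np.linspace(-6.0, 6.0, 1200001)         # step 1e-5
fx = xs**4

def f_star(s):
    return np.max(s * xs - fx)

for s in [-10.0, -1.0, 0.0, 2.0, 10.0]:
    assert abs(f_star(s) - 3.0 * (abs(s) / 4.0)**(4.0 / 3.0)) < 1e-6

# f finite everywhere forces f* to be 1-coercive: f*(s)/|s| grows with |s|.
assert f_star(10.0) / 10.0 > f_star(1.0) / 1.0
```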
5.1.4 Subdifferentials and extended-valued functions
For \(f\) satisfying (1.1.1) and \(\partial f\) defined by (1.4.1), \(s\in \partial f(x)\) if and only if
To say that \(s\) lies in the set (1.4.1) is to say that
i.e. \(f^*(s)\le \langle s,x\rangle -f(x)\); but this is indeed an equality, in view of Fenchel’s inequality (1.1.3).
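A concrete instance of this characterization (a sketch; the function and the points tested are our choices): for \(f(x)=|x|\), whose conjugate is the indicator function of \([-1,1]\), Fenchel's equality recovers \(\partial f(0)=[-1,1]\) and \(\partial f(x)=\{ \operatorname {sgn} x\} \) for \(x\neq 0\):

```python
import numpy as np

# f(x) = |x| has conjugate f*(s) = 0 if |s| <= 1, +infinity otherwise
# (the indicator of [-1, 1]); membership in the subdifferential is then
# tested via Fenchel's equality f(x) + f*(s) = <s, x>.
def f(x):
    return abs(x)

def f_star(s):
    return 0.0 if abs(s) <= 1.0 else np.inf

def in_subdiff(s, x):
    """s in subdiff f(x)  iff  f(x) + f*(s) = <s, x>."""
    return bool(np.isclose(f(x) + f_star(s), s * x))

assert in_subdiff(0.5, 0.0) and in_subdiff(-1.0, 0.0)   # [-1, 1] = subdiff f(0)
assert in_subdiff(1.0, 2.0) and not in_subdiff(0.5, 2.0)
```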
Let \(f\in \operatorname {Conv}\mathbb {R}^n\). Then \(\partial f(x)\neq \varnothing \) whenever \(x\in \operatorname {ri}\text{dom }f\).
This is Proposition B.1.2.1.
For \(f\) satisfying (1.1.1), the following properties hold:
Let \(s\) be a subgradient of \(f\) at \(x\). From the definition (1.4.1) itself, the function \(y\mapsto \ell _s(y):=f(x)+\langle s,y-x\rangle \) is affine and minorizes \(f\), hence \(\ell _s\le \overline{\operatorname {co}}\, f\le f\); because \(\ell _s(x)=f(x)\), this implies (1.4.3).
Now, \(s\in \partial f(x)\) if and only if (1.4.2) holds. From our assumption in (1.4.4), \(f^*=g^*=(\overline{\operatorname {co}}\, f)^*\) (Corollary 1.3.6) and \(g(x)=f(x)\). Therefore
which expresses exactly that \(s\in \partial g(x)\); (1.4.4) is proved.
Finally, we know that \(f^{**}=\overline{\operatorname {co}}\, f\le f\); so, when \(s\) satisfies (1.4.2), we have
which means \(x\in \partial f^*(s)\): we have just proved (1.4.5).
If \(f\in \operatorname {Conv}\mathbb {R}^n\), the following equivalences hold:
This is a rewriting of Theorem 1.4.1, taking into account (1.4.5) and the symmetric role played by \(f\) and \(f^*\) when \(f\in \operatorname {Conv}\mathbb {R}^n\).
5.2 Calculus Rules on the Conjugacy Operation
5.2.1 Image of a function under a linear mapping
With the above notation, assume that \(\operatorname {Im}A^* \cap \text{dom }g^* \neq \varnothing \). Then \(Ag\) satisfies (1.1.1) and its conjugate is
First, it is clear that \(Ag \not\equiv +\infty \) (take \(x=Ay\), with \(y\in \text{dom }g\)). On the other hand, our assumption implies the existence of some \(p_0 = A^*s_0\) such that \(g^*(p_0){\lt}+\infty \); with Fenchel’s inequality (1.1.3), we have for all \(y\in \mathbb R^m\):
For each \(x\in \mathbb R^n\), take the infimum over those \(y\) satisfying \(Ay=x\): the affine function \(\langle s_0,\cdot \rangle - g^*(p_0)\) minorizes \(Ag\). Altogether, \(Ag\) satisfies (1.1.1).
Then we have for \(s\in \mathbb R^n\)
With \(g:\mathbb {R}^n\times \mathbb {R}^p=\mathbb {R}^m\to \mathbb {R}\cup \{ +\infty \} \) not identically \(+\infty \), let \(g^*\) be associated with a scalar product preserving the structure of \(\mathbb {R}^m\) as a product space: \(\langle \cdot ,\cdot \rangle _m=\langle \cdot ,\cdot \rangle _n+\langle \cdot ,\cdot \rangle _p\). If there exists \(s_0\in \mathbb {R}^n\) such that \((s_0,0)\in \operatorname {dom} g^*\), then the conjugate of \(f\) defined by (2.1.2) is
It suffices to observe that, \(A\) being the projection defined above, there holds for all \(y_1=(x_1,z_1)\in \mathbb {R}^{m}\) and \(x_2\in \mathbb {R}^n\),
which defines the adjoint \(A^*x=(x,0)\) for all \(x\in \mathbb {R}^n\). Then apply Theorem 2.1.1.
Let \(f_1\) and \(f_2\) be two functions from \(\mathbb {R}^n\) to \(\mathbb {R}\cup \{ +\infty \} \), not identically \(+\infty \), and satisfying \(\operatorname {dom} f_1^*\cap \operatorname {dom} f_2^*\neq \varnothing \). Then their inf-convolution satisfies (1.1.1), and \((f_1\mathbin {\square }f_2)^*=f_1^*+f_2^*\).
Equip \(\mathbb {R}^n\times \mathbb {R}^n\) with the scalar product \(\langle \cdot ,\cdot \rangle +\langle \cdot ,\cdot \rangle \). Using the above notation for \(g\) and \(A\), we have \(g^*(s_1,s_2)=f_1^*(s_1)+f_2^*(s_2)\) (Proposition 1.3.1(ix)) and \(A^*(s)=(s,s)\). Then apply the definitions.
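The rule \((f_1\mathbin {\square }f_2)^*=f_1^*+f_2^*\) can be verified numerically on two quadratics \(f_i(x)=a_ix^2/2\), whose conjugates are \(f_i^*(s)=s^2/(2a_i)\) (a sketch; the coefficients and grids are our choices):

```python
import numpy as np

# Check (f1 inf-conv f2)* = f1* + f2* for f_i(x) = a_i x^2 / 2,
# whose conjugates are f_i*(s) = s^2 / (2 a_i).
a1, a2 = 1.0, 3.0
xs = np.linspace(-15.0, 15.0, 3001)          # step 1e-2

# inf-convolution on the grid: (f1 inf-conv f2)(x) = inf_y { f1(y) + f2(x-y) }
f1 = 0.5 * a1 * xs**2
ic = np.array([np.min(f1 + 0.5 * a2 * (x - xs)**2) for x in xs])

for s in [-2.0, 0.0, 1.0]:
    lhs = np.max(s * xs - ic)                # grid value of the conjugate
    rhs = s**2 / (2 * a1) + s**2 / (2 * a2)  # f1*(s) + f2*(s)
    assert abs(lhs - rhs) < 1e-3
```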
5.2.2 Pre-composition with an affine mapping
Take \(g\in \text{Conv }\mathbb {R}^m\), \(A_0\) linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\) and consider the affine operator \(A(x):=A_0x+y_0\in \mathbb {R}^m\). Suppose that \(A(\mathbb {R}^n)\cap \text{dom }g\neq \emptyset \). Then \(g\circ A\in \text{Conv }\mathbb {R}^n\) and its conjugate is the closure of the convex function
We start with the linear case (\(y_0=0\)): suppose that \(h\in \text{Conv }\mathbb {R}^n\) satisfies \(\Im A_0\cap \text{dom }h\neq \emptyset \). Then Theorem 2.1.1 applied to \(g:=h^*\) and \(A:=A_0^*\) gives \((A_0^*h^*)^*=h\circ A_0\); conjugating both sides, we see that the conjugate of \(h\circ A_0\) is the closure of the image-function \(A_0^*h^*\).
In the affine case, consider the function \(h:=g(\cdot +y_0)\in \text{Conv }\mathbb {R}^m\); its conjugate is given by Proposition 1.3.1(v): \(h^*=g^*-\langle y_0,\cdot \rangle _m\). Furthermore, it is clear that
so (2.2.1) follows from the linear case.
Let \(g\in \operatorname {Conv}\mathbb {R}^m\) be such that \(0\in \text{dom }g\) and let \(A_0\) be linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\). Make the following assumption:
Then \((g\circ A_0)^* = A_0^* g^*\); for every \(s\in \text{dom }(g\circ A_0)^*\), the problem
has at least one optimal solution \(\bar p\) and there holds \((g\circ A_0)^*(s)=A_0^*g^*(s)=g^*(\bar p)\).
To prove \((g\circ A_0)^* = A_0^* g^*\), we have to prove that \(A_0^* g^*\) is a closed function, i.e. that its sublevel-sets are closed (Definition B.1.2.3).
Thus, for given \(r\in \mathbb {R}\), take a sequence \((s_k)\) converging to some \(s\) and such that
Take also \(\delta _k\downarrow 0\); from the definition of the image-function, we can find \(p_k\in \mathbb {R}^m\) such that
Let \(q_k\) be the orthogonal projection of \(p_k\) onto the subspace \(V := \operatorname {lin}\text{dom }g - \Im A_0\). Since \(V\) contains \(\operatorname {lin}\text{dom }g\), Proposition 1.3.4 (with \(z=0\)) gives \(g^*(p_k) = g^*(q_k)\). Furthermore, \(V^\perp = (\text{lin }\text{dom }g)^\perp \cap \text{Ker }A_0^*\); in particular, \(q_k - p_k \in \text{Ker }A_0^*\). In summary, we have singled out \(q_k\in V\) such that
Suppose we can bound \(q_k\). Extracting a subsequence if necessary, we will have \(q_k \to \bar q\) and, passing to the limit, we will obtain (since \(g^*\) is l.s.c.)
The required closedness property \(A_0^* g^*(\bar q) \le r\) will follow by definition. Furthermore, this \(\bar q\) will be a solution of (2.2.2) in the particular case \(s_k \equiv s\) and \(r = (A_0^* g^*)(s)\). In this case, \((q_k)\) will actually be a minimizing sequence of (2.2.2).
To prove boundedness of \(q_k\), use the assumption: for some \(\varepsilon {\gt}0\), \(B_m(0,\varepsilon )\cap V\) is included in \(\text{dom }g - \Im A_0\). Thus, for arbitrary \(z\in B_m(0,\varepsilon )\cap V\), we can find \(y\in \text{dom }g\) and \(x\in \mathbb {R}^n\) such that \(z = y - A_0 x\). Then
We conclude that \(\sup \{ \langle q_k, z\rangle : k=1,2,\dots \} \) is bounded for any \(z\in B_m(0,\varepsilon )\cap V\), which implies that \(q_k\) is bounded; this is Proposition V.2.1.3 in the vector space \(V\).
Take \(g\in \operatorname {Conv}\mathbb {R}^m\), \(A_0\) linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\) and consider the affine operator \(A(x):=A_0x+y_0\in \mathbb {R}^m\). Make the following assumption:
Then, for every \(s\in \operatorname {dom}(g\circ A_0)^*\), the problem
has at least one optimal solution \(\bar p\) and there holds \((g\circ A)^*(s)=g^*(\bar p)-\langle \bar p,y_0\rangle \).
By assumption, we can choose \(\bar x \in \mathbb {R}^n\) such that \(\bar y := A(\bar x)\in \operatorname {ri}\operatorname {dom} g\). Consider the function \(\bar g:=g(\bar y+\cdot )\in \operatorname {Conv}\mathbb {R}^m\). Observing that
we obtain from the calculus rule 1.3.1(v): \((g\circ A)^*=(\bar g\circ A_0)^*+\langle \cdot ,\bar x\rangle \).
Then Lemma 2.2.2 allows the computation of this conjugate. We have \(0\) in the domain of \(\bar g\), and even in its relative interior:
We can therefore write: for all \(s\in \operatorname {dom}(\bar g\circ A_0)^* [=\operatorname {dom}(g\circ A)^*]\),
where the minimum is attained at some \(\bar p\). Using again the calculus rule 1.3.1(v) and various relations from above, we have established
5.2.3 Sum of two functions
Let \(g_1,g_2\) be in \(\operatorname {Conv}\mathbb {R}^n\) and assume that \(\operatorname {dom}g_1\cap \operatorname {dom}g_2\neq \varnothing \). The conjugate \((g_1+g_2)^*\) of their sum is the closure of the convex function \(g_1^*\mathbin {\square }g_2^*\).
Call \(f_i^*:=g_i\), for \(i=1,2\); apply Corollary 2.1.3: \((g_1^*\mathbin {\square }g_2^*)^*=g_1+g_2\); then take the conjugate again.
Let \(g_1,g_2\) be in \(\text{Conv }\mathbb {R}^n\) and assume that
Then \((g_1+g_2)^* = g_1^*\mathbin {\square }g_2^*\) and, for every \(s\in \text{dom }(g_1+g_2)^*\), the problem
has at least one optimal solution \((\bar p,\bar q)\), which therefore satisfies
Define \(g\in \text{Conv }(\mathbb {R}^n\times \mathbb {R}^n)\) by \(g(x_1,x_2):=g_1(x_1)+g_2(x_2)\) and the linear operator \(A:\mathbb {R}^n\to \mathbb {R}^n\times \mathbb {R}^n\) by \(Ax:=(x,x)\). Then \(g\circ A=g_1+g_2\), and we proceed to use Theorem 2.2.3. As seen in Proposition 1.3.1(ix), \(g^*(p,q)=g_1^*(p)+g_2^*(q)\) and straightforward calculation shows that \(A^*(p,q)=p+q\). Thus, if we can apply Theorem 2.2.3, we can write
and the above minimization problem does have an optimal solution.
To check (2.2.4), note that \(\text{dom }g=\text{dom }g_1\times \text{dom }g_2\), and \(\Im A\) is the diagonal set \(\Delta :=\{ (s,s):s\in \mathbb {R}^n\} \). We have
(Proposition A.2.1.11), and this just means that \(\Im A=\Delta \) has a nonempty intersection with \(\text{ri }\text{dom }g\).
Corollary 2.3.3 Take \(f_1\) and \(f_2\) in \(\operatorname {Conv}\mathbb {R}^n\), with \(f_1\) \(0\)-coercive and \(f_2\) bounded from below. Then the inf-convolution problem (2.1.3) has a nonempty compact set of solutions; furthermore \(f_1\mathbin {\square }f_2\in \operatorname {Conv}\mathbb {R}^n\).
Letting \(\mu \) denote a lower bound for \(f_2\), we have \(f_1(x_1)+f_2(x-x_1)\ge f_1(x_1)+\mu \) for all \(x_1\in \mathbb {R}^n\), and the first part of the claim follows.
For closedness of the infimal convolution, we set \(g_i:=f_i^*\), \(i=1,2\); because of \(0\)-coercivity of \(f_1\), \(0\in \operatorname {int}\text{dom }g_1\) (Remark 1.3.10), and \(g_2(0)\le -\mu \). Thus, we can apply Theorem 2.3.2 with the qualification assumption (2.3.Q.ij’).
5.2.4 Infima and suprema
Let \(\{ f_j\} _{j\in J}\) be a collection of functions satisfying (1.1.1) and having a common affine minorant: \(\sup _{j\in J} f_j^*(s) {\lt} +\infty \) for some \(s\in \mathbb {R}^n\). Then their infimum \(f := \inf _{j\in J} f_j\) satisfies (1.1.1), and its conjugate is the supremum of the \(f_j^*\)’s:
By definition, for all \(s\in \mathbb {R}^n\)
Let \(\{ g_j\} _{j\in J}\) be a collection of functions in \(\text{Conv }\mathbb {R}^n\). If their supremum \(g:=\sup _{j\in J} g_j\) is not identically \(+\infty \), it is in \(\text{Conv }\mathbb {R}^n\), and its conjugate is the closed convex hull of the \(g_j^*\)’s:
Call \(f_j:=g_j^*\), hence \(f_j^*=g_j\), and \(g\) is nothing but the \(f^*\) of (2.4.1). Taking the conjugate of both sides, the result follows from (1.3.4).
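Rule (2.4.1) lends itself to a direct numerical check (a sketch with functions of our choosing): with \(f_1(x)=(x-1)^2/2\) and \(f_2(x)=(x+1)^2/2\), the shift rule gives \(f_j^*(s)=s^2/2\pm s\), so the conjugate of \(\inf _j f_j\) should be \(s^2/2+|s|\):

```python
import numpy as np

# (inf_j f_j)* = sup_j f_j* for the two shifted quadratics
# f_1(x) = (x-1)^2/2 and f_2(x) = (x+1)^2/2, whose conjugates are
# s^2/2 + s and s^2/2 - s; their supremum is s^2/2 + |s|.
xs = np.linspace(-12.0, 12.0, 240001)        # step 1e-4
f = np.minimum(0.5 * (xs - 1.0)**2, 0.5 * (xs + 1.0)**2)

for s in [-3.0, -0.5, 0.0, 2.0]:
    lhs = np.max(s * xs - f)                 # grid value of (inf_j f_j)*(s)
    rhs = 0.5 * s**2 + abs(s)                # sup_j f_j*(s)
    assert abs(lhs - rhs) < 1e-6
```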
5.2.5 Post-composition with an increasing convex function
With \(f\) and \(g\) defined as above, assume that \(f(\mathbb R^n)\cap \operatorname {int}\operatorname {dom}g\neq \varnothing \). For all \(s\in \operatorname {dom}(g\circ f)^*\), define the function \(\psi _s\in \operatorname {Conv}\mathbb R\) by
Then \((g\circ f)^*(s)=\min _{\alpha \in \mathbb R}\psi _s(\alpha )\).
By definition,
[ \(g\) is increasing]
We must compute the conjugate of the sum of the two functions \(f_1(x,r):=g(r)-\langle s,x\rangle \) and \(f_2:=\iota _{\operatorname {epi}f}\), at the obvious argument \((x,r)\in \mathbb R^n\times \mathbb R\). We have \(\operatorname {dom}f_1=\mathbb R^n\times \operatorname {dom}g\), so that \(\operatorname {int}\operatorname {dom}f_1=\mathbb R^n\times \operatorname {int}\operatorname {dom}g\); hence, by assumption:
Theorem 2.3.2, more precisely Fenchel’s duality theorem (2.3.2), can be applied with the qualification condition (2.3.Q.jj’):
The computation of the above two conjugates is straightforward and gives
where the second equality comes from (1.2.2).
5.3 Various Examples
5.3.1 The Cramér transform
5.3.2 The conjugate of a convex partially quadratic function
The function \(g\) of (3.2.1) has the conjugate
where \(p_{H}\) is the operator of orthogonal projection onto \(H\) and \((\cdot )^{-}\) is the Moore–Penrose pseudo-inverse.
Set \(f:=\tfrac {1}{2}\langle B\cdot ,\cdot \rangle \), so that \(g=f+i_{H}\) and \(g^{*}=(f\circ p_{H})^{*}\circ p_{H}\) (Proposition 1.3.2). Knowing that \((f\circ p_{H})(x)=\tfrac {1}{2}\langle (p_{H}\circ B\circ p_{H})x,x\rangle \), we obtain from Example 1.1.4 the conjugate \(g^{*}(s)\) under the form
It could be checked directly that \(\operatorname {Im}(p_{H}\circ B\circ p_{H})+H^{\perp }=\operatorname {Im}B+H^{\perp }\). A simpler argument, however, is obtained via Theorem 2.3.2, which can be applied since \(\operatorname {dom}f=\mathbb {R}^{n}\). Thus,
which shows that \(\operatorname {dom}g^{*}=\operatorname {Im}B+H^{\perp }\).
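In coordinates this conjugate is easy to test (a numerical sketch; the matrix \(B\) and the subspace are our choices): take \(H=\operatorname {span}(e_1)\subset \mathbb {R}^2\), so that \(g(x)=\tfrac 12\langle Bx,x\rangle \) on \(H\) and \(+\infty \) elsewhere; then \(g^*(s)=\tfrac 12\langle (p_H\circ B\circ p_H)^- s,s\rangle \) reduces to \(s_1^2/(2B_{11})\):

```python
import numpy as np

# g(x) = <Bx, x>/2 on the subspace H = span(e1) of R^2 (+infinity off H).
# The formula above gives g*(s) = <(p_H B p_H)^- s, s>/2, with (.)^-
# the Moore-Penrose pseudo-inverse.
B = np.array([[2.0, 1.0], [1.0, 3.0]])
p_H = np.diag([1.0, 0.0])                    # orthogonal projection onto H
Q_pinv = np.linalg.pinv(p_H @ B @ p_H)

def g_star(s):
    return 0.5 * s @ Q_pinv @ s

# Direct computation of the supremum over H:
# sup_t { s_1 t - B_11 t^2 / 2 } = s_1^2 / (2 B_11).
for s in [np.array([3.0, -1.0]), np.array([-2.0, 5.0])]:
    assert np.isclose(g_star(s), s[0]**2 / (2 * B[0, 0]))
```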
5.3.3 Polyhedral functions
At each \(s\in \operatorname {co}\{ s_1,\ldots ,s_k\} =\operatorname {dom}f^*\), the conjugate of \(f\) has the value ( \(\Delta _k\) is the unit simplex)
Set \(g_i(s):= \iota _{\{ s_i\} }+b_i\) and
Apply Proposition B.2.5.4 to see that \(\overline{\operatorname {co}}\, g=f^*\) of (3.3.2). The rest follows from Theorem 2.4.1 or 2.4.4, with a notational flip of \(f\) and \(g\). □
5.4 Differentiability of a Conjugate Function
5.4.1 First-order differentiability
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) be strictly convex. Then \(\operatorname {int}\text{dom }f^*\neq \varnothing \) and \(f^*\) is continuously differentiable on \(\operatorname {int}\text{dom }f^*\).
For arbitrary \(x_0\in \text{dom }f\) and nonzero \(d\in \mathbb {R}^n\), consider Example 2.4.6. Strict convexity of \(f\) implies that
and this inequality extends to the suprema: \(0{\lt}f'_\infty (-d)+f'_\infty (d)\). Remembering that \(f'_\infty =\sigma _{\text{dom }f^*}\) (Proposition 1.2.2), this means
i.e. \(\text{dom }f^*\) has a positive breadth in every nonzero direction \(d\): its interior is nonempty (Theorem C.2.2.3(iii)).
Now suppose that there is some \(s\in \operatorname {int}\text{dom }f^*\) such that \(\partial f^*(s)\) contains two distinct points \(x_1\) and \(x_2\). Then \(s\in \partial f(x_1)\cap \partial f(x_2)\); by convex combination of the relations
we deduce, using Fenchel’s inequality (1.1.3):
which implies that \(f\) is affine on \([x_1,x_2]\), a contradiction. In other words, \(\partial f^*\) is single-valued on \(\operatorname {int}\text{dom }f^*\), and this means that \(f^*\) is continuously differentiable there (Remark 6.2.6).
Let \(f\in \operatorname {Conv}\mathbb {R}^n\) be differentiable on the set \(\Omega :=\operatorname {int}\operatorname {dom}f\). Then \(f^*\) is strictly convex on each convex subset \(C\subset \nabla f(\Omega )\).
Let \(C\) be a convex set as stated. Suppose that there are two distinct points \(s_1\) and \(s_2\) in \(C\) such that \(f^*\) is affine on the line-segment \([s_1,s_2]\). Then, setting \(s:=\tfrac 12(s_1+s_2)\in C\subset \nabla f(\Omega )\), there is \(x\in \Omega \) such that \(\nabla f(x)=s\), i.e. \(x\in \partial f^*(s)\). Using the affine character of \(f^*\), we have
and, in view of Fenchel’s inequality (1.1.3), this implies that each term in the bracket is \(0\): \(x\in \partial f^*(s_1)\cap \partial f^*(s_2)\), i.e. \(\partial f(x)\) contains the two points \(s_1\) and \(s_2\), a contradiction to the existence of \(\nabla f(x)\).
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be strictly convex, differentiable and 1-coercive. Then
\(f^{*}\) is also finite-valued on \(\mathbb {R}^n\), strictly convex, differentiable and 1-coercive;
the continuous mapping \(\nabla f\) is one-to-one from \(\mathbb {R}^n\) onto \(\mathbb {R}^n\), and its inverse is continuous;
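The one-to-one correspondence \(\nabla f^*=(\nabla f)^{-1}\) can be observed numerically (a sketch; \(f=\cosh \) is our illustrative choice, being finite, strictly convex, differentiable and \(1\)-coercive): here \(\nabla f=\sinh \), so the maximizer in the supremum defining \(f^*(s)\), which is \(\nabla f^*(s)\), should be \(\operatorname {arcsinh}(s)\):

```python
import numpy as np

# f(x) = cosh(x): finite, strictly convex, differentiable, 1-coercive.
# grad f = sinh, so grad f* should be its inverse arcsinh; moreover
# grad f*(s) is the maximizer of x -> s x - f(x).
xs = np.linspace(-6.0, 6.0, 1200001)         # step 1e-5
fx = np.cosh(xs)

for s in [-20.0, -1.0, 0.0, 0.5, 5.0]:
    x_star = xs[np.argmax(s * xs - fx)]      # argmax = grad f*(s)
    assert abs(x_star - np.arcsinh(s)) < 1e-4
```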
5.4.2 Lipschitz continuity of the gradient mapping
Assume that \(f:\mathbb {R}^n\to \mathbb {R}\) is strongly convex with modulus \(c{\gt}0\) on \(\mathbb {R}^n\): for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\) and \(\alpha \in ]0,1[\),
Then \(\operatorname {dom} f^*=\mathbb {R}^n\) and \(\nabla f^*\) is Lipschitzian with constant \(1/c\) on \(\mathbb {R}^n\):
We use the various equivalent definitions of strong convexity (see Theorem D.6.1.2). Fix \(x_0\) and \(s_0\in \partial f(x_0)\): for all \(0\neq d\in \mathbb {R}^n\) and \(t\ge 0\)
hence \(f'_{\infty }(d)=\sigma _{\operatorname {dom} f^*}(d)=+\infty \), i.e. \(\operatorname {dom} f^*=\mathbb {R}^n\). Also, \(f\) is in particular strictly convex, so we know from Theorem 4.1.1 that \(f^*\) is differentiable (on \(\mathbb {R}^n\)). Finally, strong convexity of \(f\) can also be written \(\langle s_1-s_2,x_1-x_2\rangle \ge c\| x_1-x_2\| ^2\), in which we have \(s_i\in \partial f(x_i)\), i.e. \(x_i=\nabla f^*(s_i)\), for \(i=1,2\). The rest follows from the Cauchy–Schwarz inequality.
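A quadratic makes the Lipschitz constant \(1/c\) visible (a sketch; the matrix is our choice): for \(f(x)=\tfrac 12\langle Ax,x\rangle \) with \(A\) symmetric positive definite, \(f\) is strongly convex with modulus \(c=\lambda _{\min }(A)\) and \(\nabla f^*=A^{-1}\), whose Lipschitz constant is \(\| A^{-1}\| =1/c\):

```python
import numpy as np

# f(x) = <Ax, x>/2 with A symmetric positive definite: f is strongly
# convex with modulus c = lambda_min(A), f*(s) = <A^{-1} s, s>/2, and
# grad f* = A^{-1} is Lipschitzian with constant 1/c = ||A^{-1}||.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.linalg.eigvalsh(A)[0]                 # smallest eigenvalue of A
A_inv = np.linalg.inv(A)

rng = np.random.default_rng(0)
for _ in range(100):
    s1, s2 = rng.normal(size=2), rng.normal(size=2)
    lhs = np.linalg.norm(A_inv @ s1 - A_inv @ s2)
    assert lhs <= np.linalg.norm(s1 - s2) / c + 1e-12
```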
Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and have a gradient-mapping Lipschitzian with constant \(L{\gt}0\) on \(\mathbb {R}^n\): for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\),
Then \(f^*\) is strongly convex with modulus \(1/L\) on each convex subset \(C\subset \operatorname {dom}\partial f^*\). In particular, there holds for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\)
Let \(s_1\) and \(s_2\) be arbitrary in \(\text{dom }\partial f^* \subset \text{dom }f^*\); take \(s\) and \(s'\) on the segment \([s_1,s_2]\). To establish the strong convexity of \(f^*\), we need to minorize the remainder term \(f^*(s')-f^*(s)-\langle x,s'-s\rangle \), with \(x\in \partial f^*(s)\). For this, we minorize \(f^*(s')=\sup _y\{ \langle s',y\rangle -f(y)\} \), i.e. we majorize \(f(y)\):
(we have used the property \(\int _0^1 t\, dt=1/2\), as well as \(x\in \partial f^*(s)\), i.e. \(\nabla f(x)=s\)). In summary, we have
Observe that the last supremum is nothing but the value at \(s'-s\) of the conjugate of \(\tfrac 12 L\| \, \cdot \, \| ^2\). Using the calculus rule 1.3.1, we have therefore proved
for all \(s,s'\) in \([s_1,s_2]\) and all \(x\in \partial f^*(s)\). Replacing \(s'\) in (4.2.3) by \(s_1\) and by \(s_2\), and setting \(s=\alpha s_1+(1-\alpha )s_2\), the strong convexity (4.2.1) for \(f^*\) is established by convex combination.
On the other hand, replacing \((s,s')\) by \((s_1,s_2)\) in (4.2.3):
Then, replacing \((s,s')\) by \((s_2,s_1)\) and summing: \(\langle x_1-x_2,s_1-s_2\rangle \ge \tfrac 1L\| s_1-s_2\| ^2\). In view of the differentiability of \(f\), this is just (4.2.2), which has to hold for all \((x_1,x_2)\) simply because \(\Im \partial f^*=\text{dom }\nabla f=\mathbb R^n\).
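Inequality (4.2.2) can be tested numerically (a sketch; the softplus function is our illustrative choice): \(f(x)=\log (1+e^x)\) is convex with \(f'=\sigma \), the logistic function, which is Lipschitzian with constant \(L=1/4\) since \(0{\lt}\sigma '\le 1/4\):

```python
import numpy as np

# f(x) = log(1 + e^x) has derivative sigma(x) = 1/(1 + e^{-x}), which is
# Lipschitzian with constant L = 1/4; inequality (4.2.2) then reads
# <grad f(x1) - grad f(x2), x1 - x2> >= ||grad f(x1) - grad f(x2)||^2 / L.
L = 0.25
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2 = rng.uniform(-10.0, 10.0, size=2)
    g1, g2 = sigma(x1), sigma(x2)
    assert (g1 - g2) * (x1 - x2) >= (g1 - g2)**2 / L - 1e-12
```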