5 Conjugacy in Convex Analysis

5.1 The Convex Conjugate of a Function

5.1.1 Definition and first examples

Definition 5.1.1

The conjugate of a function \(f\) satisfying (1.1.1) is the function \(f^{*}\) defined by

\[ \mathbb {R}^{n}\ni s\mapsto f^{*}(s):=\sup \{ \langle s,x\rangle -f(x):x\in \operatorname {dom}f\} . \tag {1.1.2} \]

For simplicity, we may also let \(x\) run over the whole space instead of \(\operatorname {dom}f\).

The mapping \(f\mapsto f^{*}\) will often be called the conjugacy operation, or the Legendre–Fenchel transform.
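Definition (1.1.2) lends itself to a direct numerical sanity check. The following sketch (our own illustration, with grid and test points chosen for the purpose, not part of the text) approximates the supremum on a grid and recovers the classical identity \(\bigl(\tfrac12 x^2\bigr)^*(s)=\tfrac12 s^2\):

```python
def conjugate(f, s, xs):
    # grid approximation of f*(s) = sup_x [s*x - f(x)], cf. (1.1.2)
    return max(s * x - f(x) for x in xs)

xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]
f = lambda x: 0.5 * x * x                       # known conjugate: (1/2)s^2

for s in (-2.0, -0.5, 0.0, 1.5):
    assert abs(conjugate(f, s, xs) - 0.5 * s * s) < 1e-3, s
print("f* matches (1/2)s^2 on the test points")
```

The supremum is attained at \(x=s\), which lies on the grid for the chosen test points, so the approximation is exact there.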

Theorem 5.1.1

For \(f\) satisfying (1.1.1), the conjugate \(f^*\) is a closed convex function: \(f^*\in \overline{\operatorname {Conv}}\mathbb {R}^n\).

Proof

See Example B.2.1.3.

5.1.2 Interpretations

Proposition 5.1.1

There holds for all \(s\in \mathbb {R}^n\)

\[ f^*(s)=\sigma _{\mathrm{epi}\, f}(s,-1)=\sup \{ \langle s,x\rangle - r : (x,r)\in \mathrm{epi}\, f\} . \tag {1.2.1} \]

It follows that the support function of \(\mathrm{epi}\, f\) has the expression

\[ \sigma _{\mathrm{epi}\, f}(s,-u)= \begin{cases} u\, f^*\! \bigl(\tfrac {1}{u}s\bigr) & \text{if } u{\gt}0,\\[4pt] \sigma _{\mathrm{epi}\, f}(s,0)=\sigma _{\mathrm{dom}\, f}(s) & \text{if } u=0,\\[4pt] +\infty & \text{if } u{\lt}0. \end{cases} \tag {1.2.2} \]
Proof

In (1.2.1), the right-most term can be written

\[ \sup _x\; \sup _{r\ge f(x)}\bigl[\langle s,x\rangle - r\bigr] = \sup _x\bigl[\langle s,x\rangle - f(x)\bigr] \]

and the first equality is established. As for (1.2.2), the case \(u{\lt}0\) is trivial; when \(u{\gt}0\), use the positive homogeneity of support functions to get

\[ \sigma _{\operatorname {epi} f}(s,-u)=u\sigma _{\operatorname {epi} f}\bigl(\tfrac {1}{u}s,-1\bigr)=u f^*\bigl(\tfrac {1}{u}s\bigr). \]

Finally, for \(u=0\), we have by definition

\[ \sigma _{\operatorname {epi} f}(s,0)=\sup \{ \langle s,x\rangle : (x,r)\in \operatorname {epi} f\text{ for some }r\in \mathbb {R}\} , \]

and we recognize \(\sigma _{\operatorname {dom} f}(s)\).

Proposition 5.1.2

For \(f\in \overline{\operatorname {Conv}}\mathbb {R}^n\),

\[ \sigma _{\operatorname {dom} f}(s)=\sigma _{\operatorname {epi} f}(s,0)=(f^*)^\infty (s)\quad \text{for all }s\in \mathbb {R}^n. \tag {1.2.3} \]
Proof

Use direct calculations; or see Proposition B.2.2.2 and the calculations in Example B.3.2.3.

5.1.3 First properties

Proposition 5.1.3

The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x)+r\) is \(g^*(s)=f^*(s)-r\).

Proposition 5.1.4

The function \(f\) is assumed to satisfy (1.1.1). With \(t{\gt}0\), the conjugate of the function \(g(x):=tf(x)\) is \(g^*(s)=t f^*(s/t)\).

Proposition 5.1.5

The function \(f\) is assumed to satisfy (1.1.1). With \(t\neq 0\), the conjugate of the function \(g(x):=f(tx)\) is \(g^*(s)=f^*(s/t)\).

Proposition 5.1.6

The function \(f\) is assumed to satisfy (1.1.1). More generally: if \(A\) is an invertible linear operator, then \((f\circ A)^*=f^*\circ (A^{-1})^*\).

Proposition 5.1.7

The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x-x_{0})\) is \(g^{*}(s)=f^{*}(s)+\langle s,x_{0}\rangle \).

Proposition 5.1.8

The function \(f\) is assumed to satisfy (1.1.1). The conjugate of the function \(g(x):=f(x)+\langle s_{0},x\rangle \) is \(g^{*}(s)=f^{*}(s-s_{0})\).
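The translation rule (Proposition 5.1.7) and the tilt rule (Proposition 5.1.8) can be verified numerically. In this sketch (an illustration with data of our choosing, not from the text), \(f(x)=\tfrac12 x^2\), \(x_0=1\) and \(s_0=2\):

```python
def conj(f, s, xs):
    # grid approximation of f*(s) = sup_x [s*x - f(x)]
    return max(s * x - f(x) for x in xs)

xs = [i / 1000.0 for i in range(-8000, 8001)]   # grid on [-8, 8]
f = lambda x: 0.5 * x * x                       # f*(s) = (1/2)s^2
x0, s0 = 1.0, 2.0

g = lambda x: f(x - x0)                         # Proposition 5.1.7: g*(s) = f*(s) + <s, x0>
h = lambda x: f(x) + s0 * x                     # Proposition 5.1.8: h*(s) = f*(s - s0)

for s in (-1.0, 0.5, 2.0):
    assert abs(conj(g, s, xs) - (0.5 * s * s + s * x0)) < 1e-3, s
    assert abs(conj(h, s, xs) - 0.5 * (s - s0) ** 2) < 1e-3, s
print("translation and tilt rules verified numerically")
```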

Proposition 5.1.9

The functions \(f_j\) appearing below are assumed to satisfy (1.1.1). If \(f_{1}\le f_{2}\), then \(f_{1}^{*}\ge f_{2}^{*}\).

Proposition 5.1.10

The functions \(f_j\) appearing below are assumed to satisfy (1.1.1). “Convexity” of the conjugation: if \(\mathrm{dom}\, f_{1}\cap \mathrm{dom}\, f_{2}\neq \varnothing \) and \(\alpha \in ]0,1[\),

\[ [\alpha f_{1}+(1-\alpha )f_{2}]^{*}\le \alpha f_{1}^{*}+(1-\alpha )f_{2}^{*}; \]
Proposition 5.1.11

Let \(f\) satisfy (1.1.1), let \(H\) be a subspace of \(\mathbb {R}^n\), and call \(p_H\) the operator of orthogonal projection onto \(H\). Suppose that there is a point in \(H\) where \(f\) is finite. Then \(f+i_{H}\) satisfies (1.1.1) and its conjugate is

\[ (f+i_{H})^*=(f\circ p_H)^*\circ p_H . \tag {1.3.1} \]
Proof

As \(y\) ranges over \(\mathbb {R}^n\), \(p_H y\) ranges over \(H\); since \(p_H\) is symmetric, we can write:

\[ (f+i_{H})^*(s)=\sup \{ \langle s,x\rangle -f(x):x\in H\} =\sup \{ \langle s,p_H y\rangle -f(p_H y):y\in \mathbb {R}^n\} =\sup \{ \langle p_H s,y\rangle -(f\circ p_H)(y):y\in \mathbb {R}^n\} =(f\circ p_H)^*(p_H s). \]
Proposition 5.1.12

For \(f\) satisfying (1.1.1), let a subspace \(V\) contain the subspace parallel to \(\operatorname {aff\, dom}f\) and set \(U:=V^{\perp }\). For any \(z\in \operatorname {aff\, dom}f\) and any \(s\in \mathbb {R}^{n}\) decomposed as \(s=s_{U}+s_{V}\), there holds

\[ f^{*}(s)=\langle s_{U},z\rangle +f^{*}(s_{V}). \]
Proof

In (1.1.2), the variable \(x\) can range through \(z+V\supset \operatorname {aff\, dom}f\):

\[ \begin{aligned} f^{*}(s)& =\sup _{v\in V}[\langle s_{U}+s_{V},z+v\rangle -f(z+v)]\\ & =\langle s_{U},z\rangle +\sup _{v\in V}[\langle s_{V},z+v\rangle -f(z+v)]\\ & =\langle s_{U},z\rangle +f^{*}(s_{V}). \end{aligned} \]
Theorem 5.1.2

For \(f\) satisfying (1.1.1), the function \(f^{**}\) of (1.3.2) is the pointwise supremum of all the affine functions on \(\mathbb {R}^n\) majorized by \(f\). In other words

\[ \operatorname {epi} f^{**}=\overline{\operatorname {co}}\bigl(\operatorname {epi} f\bigr). \tag {1.3.3} \]
Proof

Call \(\Sigma \subset \mathbb {R}^n\times \mathbb {R}\) the set of pairs \((s,r)\) defining affine functions \(\langle s,\cdot \rangle - r\) majorized by \(f\):

\[ (s,r)\in \Sigma \iff f(x)\ge \langle s,x\rangle - r\quad \text{for all }x\in \mathbb {R}^n \]
\[ \iff r\ge \sup \{ \, \langle s,x\rangle - f(x):x\in \mathbb {R}^n\, \} \]
\[ \iff r\ge f^*(s)\qquad (\text{and }s\in \operatorname {dom}f^*!). \]

Then we obtain, for \(x\in \mathbb {R}^n\),

\[ \sup _{(s,r)\in \Sigma }\{ \langle s,x\rangle - r\} = \sup \{ \langle s,x\rangle - r : s\in \operatorname {dom}f^*,\ -r\le -f^*(s)\} \]
\[ = \sup \{ \langle s,x\rangle - f^*(s): s\in \operatorname {dom}f^*\} =f^{**}(x). \]

Geometrically, the epigraphs of the affine functions associated with \((s,r)\in \Sigma \) are the (non-vertical) closed half-spaces containing \(\operatorname {epi} f\). From §B.2.5, the epigraph of their supremum is the closed convex hull of \(\operatorname {epi} f\), and this proves (1.3.3).

Corollary 5.1.1

If \(g\) is a function satisfying \(\overline{\operatorname {co}}\, f \le g \le f\), then \(g^* = f^*\). The function \(f\) is equal to its biconjugate \(f^{**}\) if and only if \(f\in \overline{\operatorname {Conv}}\mathbb {R}^n\).

Proof

Immediate.

Definition 5.1.2

A function \(f\) satisfying (1.1.1) is said to be \(0\)-coercive [resp. 1-coercive] when

\[ \lim _{\| x\| \to +\infty } f(x)=+\infty \qquad \bigl[\text{resp. }\lim _{\| x\| \to +\infty }\frac{f(x)}{\| x\| }=+\infty \bigr]. \]
Proposition 5.1.13

If \(f\) satisfying (1.1.1) is 1-coercive, then \(f^*(s) {\lt} +\infty \) for all \(s \in \mathbb {R}^n\).

Proof

Let \(s\) be given. The 1-coercivity of \(f\) implies the existence of a number \(R\) such that \(f(x) \ge \| s\| \, \| x\| \) (hence \(\langle s,x\rangle - f(x) \le 0\)) whenever \(\| x\| \ge R\). As a result, we have in (1.1.2)

\[ \sup \{ \langle s,x\rangle - f(x) : \| x\| \ge R\} \le 0. \]

On the other hand, the affine minorant of \(f\) postulated in (1.1.1) implies an upper bound \(M\) such that

\[ \sup \{ \langle s,x\rangle - f(x) : \| x\| \le R\} \le M . \]

Altogether, \(f^*(s) \le \max \{ 0,M\} \).

Proposition 5.1.14

For \(f\) satisfying (1.1.1), the following holds:

  1. If \(x_0\in \operatorname {int}\text{dom }f\) then \(f^*-\langle x_0,\cdot \rangle \) is \(0\)-coercive;

  2. in particular, if \(f\) is finite over \(\mathbb R^n\), then \(f^*\) is \(1\)-coercive.

Proof

We know from (1.2.3) that \(\sigma _{\text{dom }f}=(f^*)^{\infty }\) so, using Theorem C.2.2.3(iii), \(x_0\in \operatorname {int}\text{dom }f\subset \operatorname {int}(\operatorname {co}\,\text{dom }f)\) implies \((f^*)^{\infty }(s)-\langle x_0,s\rangle {\gt}0\) for all \(s\neq 0\). By virtue of Proposition B.3.2.4, this means exactly that \(f^*-\langle x_0,\cdot \rangle \) has compact sublevel-sets; the first statement is proved.

Then, as noted in Definition B.3.2.5, \(0\)-coercivity of \(f^*-\langle x_0,\cdot \rangle \) for all \(x_0\in \mathbb {R}^n\) means exactly \(1\)-coercivity of \(f^*\).

5.1.4 Subdifferentials and extended-valued functions

Theorem 5.1.3

For \(f\) satisfying (1.1.1) and \(\partial f\) defined by (1.4.1), \(s\in \partial f(x)\) if and only if

\[ f^*(s)+f(x)-\langle s,x\rangle =0\quad (\text{or }\le 0). \tag {1.4.2} \]
Proof

To say that \(s\) lies in the set (1.4.1) is to say that

\[ \langle s,y\rangle -f(y)\le \langle s,x\rangle -f(x)\quad \text{for all }y\in \text{dom }f, \]

i.e. \(f^*(s)\le \langle s,x\rangle -f(x)\); but this is indeed an equality, in view of Fenchel’s inequality (1.1.3).
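For a smooth illustration of (1.4.2) (an example of our choosing): with \(f(x)=\tfrac12 x^2\) one has \(\partial f(x)=\{x\}\) and the Fenchel gap \(f^*(s)+f(x)-\langle s,x\rangle =\tfrac12 (s-x)^2\), which vanishes exactly at \(s=\nabla f(x)\):

```python
def fenchel_gap(x, s):
    # f*(s) + f(x) - s*x  for f(x) = (1/2)x^2, f*(s) = (1/2)s^2;
    # this equals (1/2)(s - x)^2, the gap in Fenchel's inequality
    return 0.5 * s * s + 0.5 * x * x - s * x

x = 1.7
assert fenchel_gap(x, x) == 0.0        # s = grad f(x): equality in (1.4.2)
assert fenchel_gap(x, x + 0.3) > 0.0   # s not a subgradient: strictly positive gap
print("equality in Fenchel's inequality holds exactly at s = grad f(x)")
```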

Theorem 5.1.4

Let \(f\in \operatorname {Conv}\mathbb {R}^n\). Then \(\partial f(x)\neq \varnothing \) whenever \(x\in \operatorname {ri}\text{dom }f\).

Proof

This is Proposition B.1.2.1.

Proposition 5.1.15

For \(f\) satisfying (1.1.1), the following properties hold:

\[ \partial f(x)\neq \varnothing \quad \Longrightarrow \quad (\overline{\operatorname {co}}\, f)(x)=f(x);\tag {1.4.3} \]
\[ \overline{\operatorname {co}}\, f\le g\le f\ \text{ and }\ g(x)=f(x) \quad \Longrightarrow \quad \partial g(x)=\partial f(x);\tag {1.4.4} \]
\[ s\in \partial f(x)\quad \Longrightarrow \quad x\in \partial f^*(s).\tag {1.4.5} \]
Proof

Let \(s\) be a subgradient of \(f\) at \(x\). From the definition (1.4.1) itself, the function \(y\mapsto \ell _s(y):=f(x)+\langle s,y-x\rangle \) is affine and minorizes \(f\), hence \(\ell _s\le \overline{\operatorname {co}}\, f\le f\); because \(\ell _s(x)=f(x)\), this implies (1.4.3).

Now, \(s\in \partial f(x)\) if and only if (1.4.2) holds. From our assumption in (1.4.4), \(f^*=g^*=(\overline{\operatorname {co}}\, f)^*\) (Corollary 1.3.6) and \(g(x)=f(x)\). Therefore

\[ s\in \partial f(x)\iff g^*(s)+g(x)-\langle s,x\rangle =0, \]

which expresses exactly that \(s\in \partial g(x)\); (1.4.4) is proved.

Finally, we know that \(f^{**}=\overline{\operatorname {co}}\, f\le f\); so, when \(s\) satisfies (1.4.2), we have

\[ f^*(s)+f^{**}(x)-\langle s,x\rangle = f^*(s)+(\overline{\operatorname {co}}\, f)(x)-\langle s,x\rangle \le 0, \]

which means \(x\in \partial f^*(s)\): we have just proved (1.4.5).

Corollary 5.1.2

If \(f\in \operatorname {Conv}\mathbb {R}^n\), the following equivalences hold:

\[ f(x)+f^*(s)-\langle s,x\rangle =0\ (\text{or }\le 0) \quad \Longleftrightarrow \quad s\in \partial f(x) \quad \Longleftrightarrow \quad x\in \partial f^*(s). \]
Proof

This is a rewriting of Theorem 1.4.1, taking into account (1.4.5) and the symmetric role played by \(f\) and \(f^*\) when \(f\in \operatorname {Conv}\mathbb {R}^n\).

5.2 Conjugacy Rules on the Conjugacy Operation

5.2.1 Image of a function under a linear mapping

Theorem 5.2.1

With the above notation, assume that \(\operatorname {Im}A^* \cap \text{dom }g^* \neq \varnothing \). Then \(Ag\) satisfies (1.1.1) and its conjugate is

\[ (Ag)^* = g^* \circ A^* . \]
Proof

First, it is clear that \(Ag \not\equiv +\infty \) (take \(x=Ay\), with \(y\in \text{dom }g\)). On the other hand, our assumption implies the existence of some \(p_0 = A^*s_0\) such that \(g^*(p_0){\lt}+\infty \); with Fenchel’s inequality (1.1.3), we have for all \(y\in \mathbb R^m\):

\[ g(y) \ge \langle A^*s_0,y\rangle _m - g^*(p_0) = \langle s_0,Ay\rangle _n - g^*(p_0). \]

For each \(x\in \mathbb R^n\), take the infimum over those \(y\) satisfying \(Ay=x\): the affine function \(\langle s_0,\cdot \rangle - g^*(p_0)\) minorizes \(Ag\). Altogether, \(Ag\) satisfies (1.1.1).

Then we have for \(s\in \mathbb R^n\)

\[ (Ag)^*(s) = \sup _{x\in \mathbb R^n}[\langle s,x\rangle - \inf _{Ay=x} g(y)] = \sup _{x\in \mathbb R^n,\, Ay=x}[\langle s,x\rangle - g(y)] = \sup _{y\in \mathbb R^m}[\langle s,Ay\rangle - g(y)] = g^*(A^*s). \]
Corollary 5.2.1

With \(g:\mathbb {R}^n\times \mathbb {R}^p=\mathbb {R}^m\to \mathbb {R}\cup \{ +\infty \} \) not identically \(+\infty \), let \(g^*\) be associated with a scalar product preserving the structure of \(\mathbb {R}^m\) as a product space: \(\langle \cdot ,\cdot \rangle _m=\langle \cdot ,\cdot \rangle _n+\langle \cdot ,\cdot \rangle _p\). If there exists \(s_0\in \mathbb {R}^n\) such that \((s_0,0)\in \operatorname {dom} g^*\), then the conjugate of \(f\) defined by (2.1.2) is

\[ f^*(s)=g^*(s,0)\qquad \text{for all }s\in \mathbb {R}^n. \]
Proof

It suffices to observe that, \(A\) being the projection defined above, there holds for all \(y_1=(x_1,z_1)\in \mathbb {R}^{m}\) and \(x_2\in \mathbb {R}^n\),

\[ \langle Ay_1,x_2\rangle _n=\langle x_1,x_2\rangle _n=\langle x_1,x_2\rangle _n+\langle z_1,0\rangle _p=\langle y_1,(x_2,0)\rangle _m, \]

which defines the adjoint \(A^*x=(x,0)\) for all \(x\in \mathbb {R}^n\). Then apply Theorem 2.1.1.

Corollary 5.2.2

Let \(f_1\) and \(f_2\) be two functions from \(\mathbb {R}^n\) to \(\mathbb {R}\cup \{ +\infty \} \), not identically \(+\infty \), and satisfying \(\operatorname {dom} f_1^*\cap \operatorname {dom} f_2^*\neq \varnothing \). Then their inf-convolution satisfies (1.1.1), and \((f_1\mathbin {\square }f_2)^*=f_1^*+f_2^*\).

Proof

Equip \(\mathbb {R}^n\times \mathbb {R}^n\) with the scalar product \(\langle \cdot ,\cdot \rangle +\langle \cdot ,\cdot \rangle \). Using the above notation for \(g\) and \(A\), we have \(g^*(s_1,s_2)=f_1^*(s_1)+f_2^*(s_2)\) (Proposition 1.3.1(ix)) and \(A^*(s)=(s,s)\). Then apply the definitions.
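Corollary 5.2.2 can be illustrated numerically. In the sketch below (data of our choosing, not from the text), \(f_1(x)=|x|\) and \(f_2(x)=\tfrac12 x^2\), so \(f_1\mathbin {\square }f_2\) is the Huber function, \(f_1^*=\iota _{[-1,1]}\), \(f_2^*(s)=\tfrac12 s^2\), and \((f_1\mathbin {\square }f_2)^*(s)\) should equal \(\tfrac12 s^2\) on \([-1,1]\):

```python
xs = [i / 100.0 for i in range(-500, 501)]      # grid on [-5, 5]
f1 = lambda x: abs(x)
f2 = lambda x: 0.5 * x * x

def infconv(x):
    # (f1 [] f2)(x) = inf_y [f1(y) + f2(x - y)], approximated on the grid
    return min(f1(y) + f2(x - y) for y in xs)

vals = [infconv(x) for x in xs]                 # tabulate f1 [] f2 once

def conj_infconv(s):
    # conjugate of the tabulated inf-convolution
    return max(s * x - v for x, v in zip(xs, vals))

for s in (-1.0, -0.4, 0.0, 0.7, 1.0):
    # predicted value: f1*(s) + f2*(s) = (1/2)s^2 for |s| <= 1
    assert abs(conj_infconv(s) - 0.5 * s * s) < 1e-2, s
print("(f1 [] f2)* = f1* + f2* on [-1, 1], as predicted")
```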

5.2.2 Pre-composition with an affine mapping

Theorem 5.2.2

Take \(g\in \text{Conv }\mathbb {R}^m\), \(A_0\) linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\) and consider the affine operator \(A(x):=A_0x+y_0\in \mathbb {R}^m\). Suppose that \(A(\mathbb {R}^n)\cap \text{dom }g\neq \emptyset \). Then \(g\circ A\in \text{Conv }\mathbb {R}^n\) and its conjugate is the closure of the convex function

\[ \mathbb {R}^n\ni s\mapsto \inf _p\{ \, g^*(p)-\langle y_0,p\rangle _m : A_0^*p=s\, \} . \tag {2.2.1} \]
Proof

We start with the linear case (\(y_0=0\)): suppose that \(h\in \text{Conv }\mathbb {R}^n\) satisfies \(\operatorname {Im}A_0\cap \text{dom }h\neq \emptyset \). Then Theorem 2.1.1 applied to \(g:=h^*\) and \(A:=A_0^*\) gives \((A_0^*h^*)^*=h\circ A_0\); conjugating both sides, we see that the conjugate of \(h\circ A_0\) is the closure of the image-function \(A_0^*h^*\).

In the affine case, consider the function \(h:=g(\cdot +y_0)\in \text{Conv }\mathbb {R}^m\); its conjugate is given by Proposition 1.3.1(v): \(h^*=g^*-\langle y_0,\cdot \rangle _m\). Furthermore, it is clear that

\[ (g\circ A)(x)=g(A_0x+y_0)=h(A_0x)=(h\circ A_0)(x), \]

so (2.2.1) follows from the linear case.

Lemma 5.2.1

Let \(g\in \operatorname {Conv}\mathbb {R}^m\) be such that \(0\in \text{dom }g\) and let \(A_0\) be linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\). Make the following assumption:

\[ \operatorname {Im}A_0\cap \text{ri }\text{dom }g\neq \emptyset \quad \text{i.e.}\quad 0\in \text{ri }\text{dom }g-\operatorname {Im}A_0\ [=\text{ri }(\text{dom }g-\operatorname {Im}A_0)]. \]

Then \((g\circ A_0)^* = A_0^* g^*\); for every \(s\in \text{dom }(g\circ A_0)^*\), the problem

\[ \inf _{p}\{ \, g^*(p):\ A_0^*p=s\, \} \tag {2.2.2} \]

has at least one optimal solution \(\bar p\) and there holds \((g\circ A_0)^*(s)=A_0^*g^*(s)=g^*(\bar p)\).

Proof

To prove \((g\circ A_0)^* = A_0^* g^*\), we have to prove that \(A_0^* g^*\) is a closed function, i.e. that its sublevel-sets are closed (Definition B.1.2.3).

Thus, for given \(r\in \mathbb {R}\), take a sequence \((s_k)\) converging to some \(s\) and such that

\[ (A_0^* g^*)(s_k) \le r. \]

Take also \(\delta _k\downarrow 0\); from the definition of the image-function, we can find \(p_k\in \mathbb {R}^m\) such that

\[ g^*(p_k) \le r + \delta _k\quad \text{and}\quad A_0^* p_k = s_k. \]

Let \(q_k\) be the orthogonal projection of \(p_k\) onto the subspace \(V := \operatorname {lin}\text{dom }g + \operatorname {Im}A_0\). Since \(V\) contains \(\operatorname {lin}\text{dom }g\), Proposition 1.3.4 (with \(z=0\)) gives \(g^*(p_k) = g^*(q_k)\). Furthermore, \(V^\perp = (\operatorname {lin}\text{dom }g)^\perp \cap \text{Ker }A_0^*\); in particular, \(q_k - p_k \in \text{Ker }A_0^*\). In summary, we have singled out \(q_k\in V\) such that

\[ g^*(q_k) \le r + \delta _k\quad \text{and}\quad A_0^* q_k = s_k\qquad \text{for all }k. \tag {2.2.3} \]

Suppose we can bound \(q_k\). Extracting a subsequence if necessary, we will have \(q_k \to \bar q\) and, passing to the limit, we will obtain (since \(g^*\) is l.s.c)

\[ g^*(\bar q) \le \liminf g^*(q_k) \le r\quad \text{and}\quad A_0^* \bar q = s. \]

The required closedness property \((A_0^* g^*)(s) \le r\) will follow by definition of the image-function. Furthermore, this \(\bar q\) will be a solution of (2.2.2) in the particular case \(s_k \equiv s\) and \(r = (A_0^* g^*)(s)\); in this case, \((q_k)\) will actually be a minimizing sequence of (2.2.2).

To prove boundedness of \(q_k\), use the assumption: for some \(\varepsilon {\gt}0\), \(B_m(0,\varepsilon )\cap V\) is included in \(\text{dom }g - \operatorname {Im}A_0\). Thus, for arbitrary \(z\in B_m(0,\varepsilon )\cap V\), we can find \(y\in \text{dom }g\) and \(x\in \mathbb {R}^n\) such that \(z = y - A_0 x\). Then

\begin{align*} \langle q_k, z\rangle _m & = \langle q_k, y\rangle _m - \langle A_0^* q_k, x\rangle _n\\ & \le g(y) + g^*(q_k) - \langle A_0^* q_k, x\rangle _n & & [\text{Fenchel (1.1.3)}]\\ & \le g(y) + r + \delta _k - \langle s_k, x\rangle _n. & & [(2.2.3)] \end{align*}

We conclude that \(\sup \{ \langle q_k, z\rangle : k=1,2,\dots \} \) is bounded for any \(z\in B_m(0,\varepsilon )\cap V\), which implies that \(q_k\) is bounded; this is Proposition V.2.1.3 in the vector space \(V\).

Theorem 5.2.3

Take \(g\in \operatorname {Conv}\mathbb {R}^m\), \(A_0\) linear from \(\mathbb {R}^n\) to \(\mathbb {R}^m\) and consider the affine operator \(A(x):=A_0x+y_0\in \mathbb {R}^m\). Make the following assumption:

\[ A(\mathbb {R}^n)\cap \operatorname {ri}\operatorname {dom}g\neq \varnothing . \tag {2.2.4} \]

Then, for every \(s\in \operatorname {dom}(g\circ A)^*\), the problem

\[ \min _p\{ \, g^*(p)-\langle p,y_0\rangle :\ A_0^*p=s\, \} \tag {2.2.5} \]

has at least one optimal solution \(\bar p\) and there holds \((g\circ A)^*(s)=g^*(\bar p)-\langle \bar p,y_0\rangle \).

Proof

By assumption, we can choose \(\bar x \in \mathbb {R}^n\) such that \(\bar y := A(\bar x)\in \operatorname {ri}\operatorname {dom} g\). Consider the function \(\bar g:=g(\bar y+\cdot )\in \operatorname {Conv}\mathbb {R}^m\). Observing that

\[ (g\circ A)(x)=\bar g(A(x)-\bar y)=(\bar g\circ A_0)(x-\bar x), \]

we obtain from the calculus rule 1.3.1(v): \((g\circ A)^*=(\bar g\circ A_0)^*+\langle \cdot ,\bar x\rangle \).

Then Lemma 2.2.2 allows the computation of this conjugate. We have \(0\) in the domain of \(\bar g\), and even in its relative interior:

\[ \operatorname {ri}\operatorname {dom}\bar g=\operatorname {ri}\operatorname {dom} g-\{ \bar y\} \ni 0\in \operatorname {Im}A_0. \]

We can therefore write: for all \(s\in \operatorname {dom}(\bar g\circ A_0)^* [=\operatorname {dom}(g\circ A)^*]\),

\[ (\bar g\circ A_0)^*(s)=\min _{p}\{ \bar g^*(p)\; :\; A_0^*p=s\} , \]

where the minimum is attained at some \(\bar p\). Using again the calculus rule 1.3.1(v) and various relations from above, we have established

\[ (g\circ A)^*(s)-\langle s,\bar x\rangle =\min \{ g^*(p)-\langle p,A_0\bar x+y_0\rangle :\; A_0^*p=s\} \]
\[ =\min \{ g^*(p)-\langle p,y_0\rangle :\; A_0^*p=s\} -\langle s,\bar x\rangle . \]

5.2.3 Sum of two functions

Theorem 5.2.4

Let \(g_1,g_2\) be in \(\operatorname {Conv}\mathbb {R}^n\) and assume that \(\operatorname {dom}g_1\cap \operatorname {dom}g_2\neq \varnothing \). The conjugate \((g_1+g_2)^*\) of their sum is the closure of the convex function \(g_1^*\mathbin {\square }g_2^*\).

Proof

Call \(f_i^*:=g_i\), for \(i=1,2\); apply Corollary 2.1.3: \((g_1^*\mathbin {\square }g_2^*)^*=g_1+g_2\); then take the conjugate again.

Theorem 5.2.5

Let \(g_1,g_2\) be in \(\text{Conv }\mathbb {R}^n\) and assume that

\[ \begin{aligned} & \text{the relative interiors of }\text{dom }g_1\text{ and }\text{dom }g_2\text{ intersect,}\\ & \text{or equivalently: }0\in \text{ri }(\text{dom }g_1-\text{dom }g_2). \end{aligned} \tag {2.3.1} \]

Then \((g_1+g_2)^* = g_1^*\mathbin {\square }g_2^*\) and, for every \(s\in \text{dom }(g_1+g_2)^*\), the problem

\[ \inf \{ g_1^*(p)+g_2^*(q)\; :\; p+q=s\} \]

has at least one optimal solution \((\bar p,\bar q)\), which therefore satisfies

\[ g_1^*(\bar p)+g_2^*(\bar q)=(g_1^*\mathbin {\square }g_2^*)(s)=(g_1+g_2)^*(s). \]
Proof

Define \(g\in \text{Conv }(\mathbb {R}^n\times \mathbb {R}^n)\) by \(g(x_1,x_2):=g_1(x_1)+g_2(x_2)\) and the linear operator \(A:\mathbb {R}^n\to \mathbb {R}^n\times \mathbb {R}^n\) by \(Ax:=(x,x)\). Then \(g\circ A=g_1+g_2\), and we proceed to use Theorem 2.2.3. As seen in Proposition 1.3.1(ix), \(g^*(p,q)=g_1^*(p)+g_2^*(q)\) and straightforward calculation shows that \(A^*(p,q)=p+q\). Thus, if we can apply Theorem 2.2.3, we can write

\[ (g_1+g_2)^*(s)=(g\circ A)^*(s)=(A^*g^*)(s) =\inf _{p,q}\{ g_1^*(p)+g_2^*(q)\; :\; p+q=s\} =(g_1^*\mathbin {\square }g_2^*)(s) \]

and the above minimization problem does have an optimal solution.

To check (2.2.4), note that \(\text{dom }g=\text{dom }g_1\times \text{dom }g_2\), and \(\operatorname {Im}A\) is the diagonal set \(\Delta :=\{ (x,x):x\in \mathbb {R}^n\} \). Taking \(x\) in \(\text{ri }\text{dom }g_1\cap \text{ri }\text{dom }g_2\), which is nonempty by (2.3.1), we have

\[ (x,x)\in \text{ri }\text{dom }g_1\times \text{ri }\text{dom }g_2=\text{ri }(\text{dom }g_1\times \text{dom }g_2) \]

(Proposition A.2.1.11), and this just means that \(\operatorname {Im}A=\Delta \) has a nonempty intersection with \(\text{ri }\text{dom }g\).
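Theorem 5.2.5 can be checked on a quadratic example of our choosing (not from the text): for \(g_1(x)=\tfrac12 ax^2\) and \(g_2(x)=\tfrac12 bx^2\) one has \(g_1^*(p)=p^2/(2a)\), \(g_2^*(q)=q^2/(2b)\), and both sides of \((g_1+g_2)^*=g_1^*\mathbin {\square }g_2^*\) equal \(s^2/\bigl(2(a+b)\bigr)\):

```python
a, b = 1.0, 3.0
xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]

def conj_sum(s):
    # left-hand side: (g1 + g2)*(s), by a grid supremum
    return max(s * x - (0.5 * a * x * x + 0.5 * b * x * x) for x in xs)

def infconv_conj(s):
    # right-hand side: (g1* [] g2*)(s) = inf_p [p^2/(2a) + (s - p)^2/(2b)]
    return min(p * p / (2 * a) + (s - p) ** 2 / (2 * b) for p in xs)

for s in (-2.0, 0.0, 1.0, 3.0):
    exact = s * s / (2 * (a + b))
    assert abs(conj_sum(s) - exact) < 1e-3, s
    assert abs(infconv_conj(s) - exact) < 1e-3, s
print("(g1 + g2)* = g1* [] g2* on the test points")
```

The optimal \(\bar p\) in the inf-convolution is \(\bar p=as/(a+b)\), in accordance with the attainment claimed by the theorem.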

Corollary 5.2.3

Take \(f_1\) and \(f_2\) in \(\operatorname {Conv}\mathbb {R}^n\), with \(f_1\) \(0\)-coercive and \(f_2\) bounded from below. Then the inf-convolution problem (2.1.3) has a nonempty compact set of solutions; furthermore \(f_1\mathbin {\square }f_2\in \operatorname {Conv}\mathbb {R}^n\).

Proof

Letting \(\mu \) denote a lower bound for \(f_2\), we have \(f_1(x_1)+f_2(x-x_1)\ge f_1(x_1)+\mu \) for all \(x_1\in \mathbb {R}^n\); the objective of (2.1.3) is thus \(0\)-coercive in \(x_1\), and the first part of the claim follows.

For closedness of the infimal convolution, we set \(g_i:=f_i^*\), \(i=1,2\); because of \(0\)-coercivity of \(f_1\), \(0\in \operatorname {int}\text{dom }g_1\) (Remark 1.3.10), and \(g_2(0)\le -\mu \). Thus, we can apply Theorem 2.3.2 with the qualification assumption (2.3.Q.ij’).

5.2.4 Infima and suprema

Theorem 5.2.6

Let \(\{ f_j\} _{j\in J}\) be a collection of functions satisfying (1.1.1) and having a common affine minorant: \(\sup _{j\in J} f_j^*(s) {\lt} +\infty \) for some \(s\in \mathbb {R}^n\). Then their infimum \(f := \inf _{j\in J} f_j\) satisfies (1.1.1), and its conjugate is the supremum of the \(f_j^*\)’s:

\[ (\inf _{j\in J} f_j)^* = \sup _{j\in J} f_j^*. \tag {2.4.1} \]
Proof

By definition, for all \(s\in \mathbb {R}^n\)

\[ \begin{aligned} f^*(s) & = \sup _x\big[\langle s,x\rangle - \inf _{j\in J} f_j(x)\big] \\ & = \sup _x \sup _j\big[\langle s,x\rangle - f_j(x)\big] \\ & = \sup _j \sup _x\big[\langle s,x\rangle - f_j(x)\big] = \sup _{j\in J} f_j^*(s). \end{aligned} \]
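A numerical sketch of (2.4.1), with data of our choosing (not from the text): for \(f_j(x)=\tfrac12 (x-j)^2\), \(j\in \{-1,0,1\}\), each \(f_j^*(s)=\tfrac12 s^2+js\), so \((\inf _j f_j)^*(s)\) should equal \(\max _j\bigl[\tfrac12 s^2+js\bigr]=\tfrac12 s^2+|s|\):

```python
xs = [i / 1000.0 for i in range(-6000, 6001)]   # grid on [-6, 6]
J = (-1, 0, 1)

def f_inf(x):
    # f = inf_j f_j with f_j(x) = (1/2)(x - j)^2
    return min(0.5 * (x - j) ** 2 for j in J)

def conj(s):
    # grid approximation of f*(s)
    return max(s * x - f_inf(x) for x in xs)

for s in (-2.0, -0.5, 0.0, 1.0, 2.5):
    # predicted: sup_j f_j*(s) = (1/2)s^2 + |s|
    assert abs(conj(s) - (0.5 * s * s + abs(s))) < 1e-3, s
print("(inf_j f_j)* = sup_j f_j* on the test points")
```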
Theorem 5.2.7

Let \(\{ g_j\} _{j\in J}\) be a collection of functions in \(\text{Conv }\mathbb {R}^n\). If their supremum \(g:=\sup _{j\in J} g_j\) is not identically \(+\infty \), it is in \(\text{Conv }\mathbb {R}^n\), and its conjugate is the closed convex hull of the \(g_j^*\)’s:

\[ \bigl(\sup _{j\in J} g_j\bigr)^* = \overline{\text{co }}\bigl(\inf _{j\in J} g_j^*\bigr). \tag {2.4.5} \]
Proof

Call \(f_j:=g_j^*\), hence \(f_j^*=g_j\), and \(g\) is nothing but the \(f^*\) of (2.4.1). Taking the conjugate of both sides, the result follows from (1.3.4).

5.2.5 Post-composition with an increasing convex function

Theorem 5.2.8

With \(f\) and \(g\) defined as above, assume that \(f(\mathbb R^n)\cap \operatorname {int}\operatorname {dom}g\neq \varnothing \). For all \(s\in \operatorname {dom}(g\circ f)^*\), define the function \(\psi _s\in \operatorname {Conv}\mathbb R\) by

\[ \mathbb R\ni \alpha \mapsto \psi _s(\alpha ):= \begin{cases} \alpha f^*\! \big(\tfrac {1}{\alpha }s\big)+g^*(\alpha )& \text{if }\alpha {\gt}0,\\[4pt] \sigma _{\operatorname {dom}f}(s)+g^*(0)& \text{if }\alpha =0,\\[4pt] +\infty & \text{if }\alpha {\lt}0. \end{cases} \]

Then \((g\circ f)^*(s)=\min _{\alpha \in \mathbb R}\psi _s(\alpha )\).

Proof

By definition,

\[ -(g\circ f)^*(s)=\inf _x[g(f(x))-\langle s,x\rangle ] =\inf _{x,r}\{ g(r)-\langle s,x\rangle :\; f(x)\le r\} =\inf _{x,r}[g(r)-\langle s,x\rangle +\iota _{\operatorname {epi}f}(x,r)]. \]

[the second equality holds because \(g\) is increasing]

We must compute the conjugate of the sum of the two functions \(f_1(x,r):=g(r)-\langle s,x\rangle \) and \(f_2:=\iota _{\operatorname {epi}f}\), at the obvious argument \((x,r)\in \mathbb R^n\times \mathbb R\). We have \(\operatorname {dom}f_1=\mathbb R^n\times \operatorname {dom}g\), so that \(\operatorname {int}\operatorname {dom}f_1=\mathbb R^n\times \operatorname {int}\operatorname {dom}g\); hence, by assumption:

\[ \operatorname {int}\operatorname {dom}f_1\cap \operatorname {dom}f_2=(\mathbb R^n\times \operatorname {int}\operatorname {dom}g)\cap \operatorname {epi}f\neq \varnothing . \]

Theorem 2.3.2, more precisely Fenchel’s duality theorem (2.3.2), can be applied with the qualification condition (2.3.Q.jj’):

\[ (g\circ f)^*(s)=\min \{ f_1^*(-p,\alpha )+f_2^*(p,-\alpha ):\; (p,\alpha )\in \mathbb R^n\times \mathbb R\} . \]

The computation of the above two conjugates is straightforward and gives

\[ (g\circ f)^*(s)=\min _{p,\alpha }\big[g^*(\alpha )+\iota _{\{ -s\} }(-p)+\sigma _{\operatorname {epi}f}(p,-\alpha )\big] =\min _{\alpha }\psi _s(\alpha ), \]

where the second equality comes from (1.2.2).

5.3 Various Examples

5.3.1 The Cramer transform

5.3.2 The conjugate of a convex partially quadratic function

Proposition 5.3.1

The function \(g\) of (3.2.1) has the conjugate

\[ g^{*}(s)=\begin{cases} \tfrac {1}{2}\langle s, (p_{H}\circ B\circ p_{H})^{-}s\rangle & \text{if } s\in \operatorname {Im}B+H^{\perp },\\[4pt] +\infty & \text{otherwise}, \end{cases}\tag {3.2.2} \]

where \(p_{H}\) is the operator of orthogonal projection onto \(H\) and \((\cdot )^{-}\) is the Moore–Penrose pseudo-inverse.

Proof

Set \(f:=\tfrac {1}{2}\langle B\cdot ,\cdot \rangle \), so that \(g=f+i_{H}\) and \(g^{*}=(f\circ p_{H})^{*}\circ p_{H}\) (Proposition 1.3.2). Knowing that \((f\circ p_{H})(x)=\tfrac {1}{2}\langle (p_{H}\circ B\circ p_{H})x,x\rangle \), we obtain from Example 1.1.4 the conjugate \(g^{*}(s)\) under the form

\[ (f\circ p_{H})^{*}(p_{H}s)=\begin{cases} \tfrac {1}{2}\langle s,(p_{H}\circ B\circ p_{H})^{-}s\rangle & \text{if } p_{H}s\in \operatorname {Im}(p_{H}\circ B\circ p_{H}),\\[4pt] +\infty & \text{otherwise}. \end{cases} \]

It could be checked directly that \(\operatorname {Im}(p_{H}\circ B\circ p_{H})+H^{\perp }=\operatorname {Im}B+H^{\perp }\). A simpler argument, however, is obtained via Theorem 2.3.2, which can be applied since \(\operatorname {dom}f=\mathbb {R}^{n}\). Thus,

\[ g^{*}(s)=(f^{*}\, \square \, i_{H^{\perp }})(s)=\min \{ \tfrac {1}{2}\langle p,B^{-}p\rangle :\; p\in \operatorname {Im}B,\; s-p\in H^{\perp }\} , \]

which shows that \(\operatorname {dom}g^{*}=\operatorname {Im}B+H^{\perp }\).

5.3.3 Polyhedral functions

Proposition 5.3.2

At each \(s\in \operatorname {co}\{ s_1,\ldots ,s_k\} =\operatorname {dom}f^*\), the conjugate of \(f\) has the value ( \(\Delta _k\) is the unit simplex)

\[ f^*(s)=\min \Big\{ \sum _{i=1}^k \alpha _i b_i:\ \alpha \in \Delta _k,\ \sum _{i=1}^k \alpha _i s_i=s\Big\} . \tag {3.3.2} \]
Proof

Set \(g_i(s):= \iota _{\{ s_i\} }+b_i\) and

\[ g(s):=(\inf _i g_i)(s)= \begin{cases} b_i & \text{if } s=s_i\text{ for some } i=1,\ldots ,k,\\[4pt] +\infty & \text{otherwise}. \end{cases} \]

Apply Proposition B.2.5.4 to see that \(\overline{\operatorname {co}}\, g\) is the function \(f^*\) of (3.3.2). The rest follows from Theorem 2.4.1 or 2.4.4, with the roles of \(f\) and \(g\) exchanged.
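Formula (3.3.2) can be tested in dimension 1 with data of our choosing (a hypothetical example, not from the text): for \(f(x)=\max \{ \langle s_i,x\rangle -b_i\} \) with \((s_1,b_1)=(-1,0)\) and \((s_2,b_2)=(1,1)\), the constraint \(\alpha s_1+(1-\alpha )s_2=s\) gives \(\alpha =(1-s)/2\), and (3.3.2) reduces to \(f^*(s)=(1+s)/2\) on \(\operatorname {dom}f^*=[-1,1]\):

```python
xs = [i / 1000.0 for i in range(-10000, 10001)]   # grid on [-10, 10]
f = lambda x: max(-x, x - 1.0)                    # s_1 = -1, b_1 = 0; s_2 = 1, b_2 = 1

def conj(s):
    # grid approximation of f*(s) = sup_x [s*x - f(x)]
    return max(s * x - f(x) for x in xs)

for s in (-1.0, -0.5, 0.0, 0.5, 1.0):
    alpha = (1.0 - s) / 2.0                       # simplex weight on s_1
    lp_value = alpha * 0.0 + (1.0 - alpha) * 1.0  # objective of (3.3.2)
    assert abs(conj(s) - lp_value) < 1e-3, s
print("f*(s) agrees with the linear program (3.3.2)")
```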

5.4 Differentiability of a Conjugate Function

5.4.1 First-order differentiability

Theorem 5.4.1

Let \(f\in \operatorname {Conv}\mathbb {R}^n\) be strictly convex. Then \(\operatorname {int}\text{dom }f^*\neq \varnothing \) and \(f^*\) is continuously differentiable on \(\operatorname {int}\text{dom }f^*\).

Proof

For arbitrary \(x_0\in \text{dom }f\) and nonzero \(d\in \mathbb {R}^n\), consider Example 2.4.6. Strict convexity of \(f\) implies that

\[ 0{\lt}\frac{f(x_0 - td)-f(x_0)}{t}+\frac{f(x_0+td)-f(x_0)}{t}\qquad \text{for all }t{\gt}0, \]

and this inequality extends to the suprema: \(0{\lt}f'_\infty (-d)+f'_\infty (d)\). Remembering that \(f'_\infty =\sigma _{\text{dom }f^*}\) (Proposition 1.2.2), this means

\[ \sigma _{\text{dom }f^*}(d)+\sigma _{\text{dom }f^*}(-d){\gt}0, \]

i.e. \(\text{dom }f^*\) has a positive breadth in every nonzero direction \(d\): its interior is nonempty (Theorem C.2.2.3(iii)).

Now suppose that there is some \(s\in \operatorname {int}\text{dom }f^*\) such that \(\partial f^*(s)\) contains two distinct points \(x_1\) and \(x_2\). Then \(s\in \partial f(x_1)\cap \partial f(x_2)\); by convex combination of the relations

\[ f^*(s)+f(x_i)=\langle s,x_i\rangle \qquad \text{for }i=1,2 \]

we deduce, for \(\alpha _1,\alpha _2\ge 0\) with \(\alpha _1+\alpha _2=1\), using Fenchel’s inequality (1.1.3):

\[ f^*(s)+\sum _{i=1}^2\alpha _i f(x_i)=\langle s,\sum _{i=1}^2\alpha _i x_i\rangle \le f^*(s)+f\Big(\sum _{i=1}^2\alpha _i x_i\Big), \]

which implies that \(f\) is affine on \([x_1,x_2]\), contradicting strict convexity. In other words, \(\partial f^*\) is single-valued on \(\operatorname {int}\text{dom }f^*\), and this means that \(f^*\) is continuously differentiable there (Remark 6.2.6).

Theorem 5.4.2

Let \(f\in \operatorname {Conv}\mathbb {R}^n\) be differentiable on the set \(\Omega :=\operatorname {int}\operatorname {dom}f\). Then \(f^*\) is strictly convex on each convex subset \(C\subset \nabla f(\Omega )\).

Proof

Let \(C\) be a convex set as stated. Suppose that there are two distinct points \(s_1\) and \(s_2\) in \(C\) such that \(f^*\) is affine on the line-segment \([s_1,s_2]\). Then, setting \(s:=\tfrac 12(s_1+s_2)\in C\subset \nabla f(\Omega )\), there is \(x\in \Omega \) such that \(\nabla f(x)=s\), i.e. \(x\in \partial f^*(s)\). Using the affine character of \(f^*\), we have

\[ 0=f(x)+f^*(s)-\langle s,x\rangle =\tfrac 12\sum _{i=1}^2\bigl[f(x)+f^*(s_i)-\langle s_i,x\rangle \bigr] \]

and, in view of Fenchel’s inequality (1.1.3), this implies that each term in the bracket is \(0\): \(x\in \partial f^*(s_1)\cap \partial f^*(s_2)\), i.e. \(\partial f(x)\) contains the two points \(s_1\) and \(s_2\), a contradiction to the existence of \(\nabla f(x)\).

Corollary 5.4.1

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be strictly convex, differentiable and 1-coercive. Then

  1. \(f^{*}\) is also finite-valued on \(\mathbb {R}^n\), strictly convex, differentiable and 1-coercive;

  2. the continuous mapping \(\nabla f\) is one-to-one from \(\mathbb {R}^n\) onto \(\mathbb {R}^n\), and its inverse is continuous;

  3. \(f^*(s)=\langle s,(\nabla f)^{-1}(s)\rangle - f\bigl((\nabla f)^{-1}(s)\bigr)\) for all \(s\in \mathbb {R}^n\).
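Item (iii) can be checked on an example of our choosing (an assumed illustration, not from the text): \(f(x)=x^4\) is strictly convex, differentiable and 1-coercive, \(\nabla f(x)=4x^3\) is invertible, and the closed form \(s\,(\nabla f)^{-1}(s)-f\bigl((\nabla f)^{-1}(s)\bigr)\) should match a grid supremum:

```python
import math

f = lambda x: x ** 4
# (grad f)^{-1}(s): sign-aware cube root of s/4
grad_inv = lambda s: math.copysign((abs(s) / 4.0) ** (1.0 / 3.0), s)

xs = [i / 1000.0 for i in range(-3000, 3001)]   # grid on [-3, 3]

def conj(s):
    # grid approximation of f*(s) = sup_x [s*x - f(x)]
    return max(s * x - f(x) for x in xs)

for s in (-8.0, -1.0, 0.5, 4.0):
    x_bar = grad_inv(s)                          # the unique maximizer
    assert abs(conj(s) - (s * x_bar - f(x_bar))) < 1e-2, s
print("(iii) verified: f*(s) = s*(grad f)^{-1}(s) - f((grad f)^{-1}(s))")
```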

5.4.2 Lipschitz continuity of the gradient mapping

Theorem 5.4.3

Assume that \(f:\mathbb {R}^n\to \mathbb {R}\) is strongly convex with modulus \(c{\gt}0\) on \(\mathbb {R}^n\): for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\) and \(\alpha \in ]0,1[\),

\[ f(\alpha x_1+(1-\alpha )x_2)\le \alpha f(x_1)+(1-\alpha )f(x_2)-\tfrac {1}{2}c\alpha (1-\alpha )\| x_1-x_2\| ^2. \tag {4.2.1} \]

Then \(\operatorname {dom} f^*=\mathbb {R}^n\) and \(\nabla f^*\) is Lipschitzian with constant \(1/c\) on \(\mathbb {R}^n\):

\[ \| \nabla f^*(s_1)-\nabla f^*(s_2)\| \le \tfrac {1}{c}\| s_1-s_2\| \quad \text{for all }(s_1,s_2)\in \mathbb {R}^n\times \mathbb {R}^n. \]
Proof

We use the various equivalent definitions of strong convexity (see Theorem D.6.1.2). Fix \(x_0\) and \(s_0\in \partial f(x_0)\): for all \(0\neq d\in \mathbb {R}^n\) and \(t\ge 0\)

\[ f(x_0+td)\ge f(x_0)+t\langle s_0,d\rangle +\tfrac {1}{2}ct^2\| d\| ^2, \]

hence \(f'_\infty (d)=\sigma _{\operatorname {dom} f^*}(d)=+\infty \) for all \(d\neq 0\), i.e. \(\operatorname {dom} f^*=\mathbb {R}^n\). Also, \(f\) is in particular strictly convex, so we know from Theorem 4.1.1 that \(f^*\) is differentiable (on \(\mathbb {R}^n\)). Finally, strong convexity of \(f\) can also be written \(\langle s_1-s_2,x_1-x_2\rangle \ge c\| x_1-x_2\| ^2\) whenever \(s_i\in \partial f(x_i)\), i.e. \(x_i=\nabla f^*(s_i)\), for \(i=1,2\). The rest follows from the Cauchy–Schwarz inequality.
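A numerical sketch of Theorem 5.4.3 with an illustrative \(f\) of our choosing (not from the text): \(f(x)=\tfrac12 cx^2+|x|\) is strongly convex with modulus \(c\) and nonsmooth, and \(\nabla f^*(s)\) is the maximizer of \(x\mapsto sx-f(x)\); we check the \(1/c\) Lipschitz bound on sample pairs:

```python
c = 2.0
f = lambda x: 0.5 * c * x * x + abs(x)            # strongly convex, modulus c
xs = [i / 10000.0 for i in range(-30000, 30001)]  # fine grid on [-3, 3]

def grad_conj(s):
    # grad f*(s) as the maximizer of x -> s*x - f(x) over the grid
    return max(xs, key=lambda x: s * x - f(x))

pairs = [(-3.0, 2.0), (0.2, -0.2), (1.0, 4.0), (0.5, 0.6)]
for s1, s2 in pairs:
    lhs = abs(grad_conj(s1) - grad_conj(s2))
    assert lhs <= abs(s1 - s2) / c + 1e-3, (s1, s2)
print("grad f* is 1/c-Lipschitz on the test pairs")
```

Here \(\nabla f^*\) is the soft-thresholding map \(s\mapsto \operatorname {sign}(s)\max (|s|-1,0)/c\), constant on \([-1,1]\) despite the kink of \(f\) at the origin.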

Theorem 5.4.4

Let \(f:\mathbb {R}^n\to \mathbb {R}\) be convex and have a gradient-mapping Lipschitzian with constant \(L{\gt}0\) on \(\mathbb {R}^n\): for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\),

\[ \| \nabla f(x_1)-\nabla f(x_2)\| \le L\| x_1-x_2\| . \]

Then \(f^*\) is strongly convex with modulus \(1/L\) on each convex subset \(C\subset \operatorname {dom}\partial f^*\). In particular, there holds for all \((x_1,x_2)\in \mathbb {R}^n\times \mathbb {R}^n\)

\[ \langle \nabla f(x_1)-\nabla f(x_2),\, x_1-x_2\rangle \ge \tfrac {1}{L}\| \nabla f(x_1)-\nabla f(x_2)\| ^2. \tag {4.2.2} \]
Proof

Let \(s_1\) and \(s_2\) be arbitrary in \(\text{dom }\partial f^* \subset \text{dom }f^*\); take \(s\) and \(s'\) on the segment \([s_1,s_2]\). To establish the strong convexity of \(f^*\), we need to minorize the remainder term \(f^*(s')-f^*(s)-\langle x,s'-s\rangle \), with \(x\in \partial f^*(s)\). For this, we minorize \(f^*(s')=\sup _y\{ \langle s',y\rangle -f(y)\} \), i.e. we majorize \(f(y)\):

\[ \begin{aligned} f(y)& =f(x)+\langle \nabla f(x),\, y-x\rangle +\int _0^1\langle \nabla f(x+t(y-x))-\nabla f(x),\, y-x\rangle dt\\ & \le f(x)+\langle \nabla f(x),\, y-x\rangle +\tfrac 12 L\| y-x\| ^2\\ & = -f^*(s)+\langle s,y\rangle +\tfrac 12 L\| y-x\| ^2 \end{aligned} \]

(we have used the property \(\int _0^1 t\, dt=1/2\), as well as \(x\in \partial f^*(s)\), i.e. \(\nabla f(x)=s\)). In summary, we have

\[ f^*(s')\ge f^*(s)+\sup _y\big[\, \langle s'-s,y\rangle -\tfrac 12 L\| y-x\| ^2\big]. \]

Observe that the last supremum is nothing but the value at \(s'-s\) of the conjugate of \(\tfrac 12 L\| \, \cdot \, \| ^2\). Using the calculus rule 1.3.1, we have therefore proved

\[ f^*(s')\ge f^*(s)+\langle s'-s,x\rangle +\tfrac 1{2L}\| s'-s\| ^2 \tag {4.2.3} \]

for all \(s,s'\) in \([s_1,s_2]\) and all \(x\in \partial f^*(s)\). Replacing \(s'\) in (4.2.3) by \(s_1\) and by \(s_2\), and setting \(s=\alpha s_1+(1-\alpha )s_2\), the strong convexity (4.2.1) for \(f^*\) is established by convex combination.

On the other hand, replacing \((s,s')\) by \((s_1,s_2)\) in (4.2.3):

\[ f^*(s_2)\ge f^*(s_1)+\langle s_2-s_1,x_1\rangle +\tfrac 1{2L}\| s_2-s_1\| ^2\quad \text{for all }x_1\in \partial f^*(s_1). \]

Then, replacing \((s,s')\) by \((s_2,s_1)\) and summing: \(\langle x_1-x_2,s_1-s_2\rangle \ge \tfrac 1L\| s_1-s_2\| ^2\). In view of the differentiability of \(f\), this is just (4.2.2), which has to hold for all \((x_1,x_2)\) simply because \(\operatorname {Im}\partial f^*=\text{dom }\nabla f=\mathbb R^n\).