# Primer: Lifts of polytopes and non-negative matrix factorization

In preparation for the next post on entropy optimality, we need a little background on lifts of polytopes and non-negative matrix factorization.

## 1.1. Polytopes and inequalities

A ${d}$-dimensional convex polytope ${P \subseteq \mathbb R^d}$ is the convex hull of a finite set of points in ${\mathbb R^d}$. Equivalently, it is a compact set defined by a family of linear inequalities

$\displaystyle P = \{ x \in \mathbb R^d : Ax \leq b \}$

for some matrix ${A \in \mathbb R^{m \times d}}$.

Let us give a measure of complexity for ${P}$: Define ${\gamma(P)}$ to be the smallest number ${m}$ such that for some ${C \in \mathbb R^{s \times d}, y \in \mathbb R^s, A \in \mathbb R^{m \times d}, b \in \mathbb R^m}$, we have

$\displaystyle P = \{ x \in \mathbb R^d : Cx=y \textrm{ and } Ax \leq b\}\,.$

In other words, this is the minimum number of inequalities needed to describe ${P}$. If ${P}$ is full-dimensional, then this is precisely the number of facets of ${P}$ (a facet is a maximal proper face of ${P}$).

Thinking of ${\gamma(P)}$ as a measure of complexity makes sense from the point of view of optimization: Interior point methods can efficiently optimize linear functions over ${P}$ (to arbitrary accuracy) in time that is polynomial in ${\gamma(P)}$.

## 1.2. Lifts of polytopes

Many simple polytopes require a large number of inequalities to describe. For instance, the cross-polytope

$\displaystyle C_d = \{ x \in \mathbb R^d : \|x\|_1 \leq 1 \} = \{ x \in \mathbb R^d : \pm x_1 \pm x_2 \cdots \pm x_d \leq 1 \}$

has ${\gamma(C_d)=2^d}$. On the other hand, ${C_d}$ is the projection of the polytope

$\displaystyle Q_d = \left\{ (x,y) \in \mathbb R^{2d} : \sum_{i=1}^d y_i = 1, \,\, - y_i \leq x_i \leq y_i\,\forall i\right\}$

onto the ${x}$ coordinates, and manifestly, ${\gamma(Q_d) \leq 2d}$. Thus ${C_d}$ is the (linear) shadow of a much simpler polytope in a higher dimension.

[image credit: Fiorini, Rothvoss, and Tiwary]
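To see the lift in action, here is a small numerical sketch (the objective vector ${c}$ is an arbitrary choice for illustration): we maximize a linear functional over ${C_d}$ by solving a linear program with only ${2d}$ inequality constraints over ${Q_d}$, using `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

d = 4
c = np.array([3.0, -1.0, 2.0, 0.5])  # maximize <c, x> over C_d

# Variables ordered as (x_1..x_d, y_1..y_d).  Q_d's constraints:
#   sum_i y_i = 1,  and  x_i - y_i <= 0,  -x_i - y_i <= 0  (2d inequalities).
A_ub = np.block([[np.eye(d), -np.eye(d)],
                 [-np.eye(d), -np.eye(d)]])
b_ub = np.zeros(2 * d)
A_eq = np.concatenate([np.zeros(d), np.ones(d)]).reshape(1, -1)
b_eq = np.array([1.0])

# linprog minimizes, so negate the objective; allow negative x_i.
res = linprog(np.concatenate([-c, np.zeros(d)]),
              A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * (2 * d))

# Over C_d the maximum of <c, x> is attained at a vertex +/- e_i,
# so it equals max_i |c_i| = 3 for this choice of c.
print(-res.fun)  # → 3.0
```

Note that ${y_i \geq 0}$ need not be imposed separately: it is implied by ${-y_i \leq x_i \leq y_i}$.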

A polytope ${Q}$ is called a lift of the polytope ${P}$ if ${P}$ is the image of ${Q}$ under a linear projection. Again, from an optimization standpoint, lifts are important: If we can optimize linear functionals over ${Q}$, then we can optimize linear functionals over ${P}$. Define now ${\bar \gamma(P)}$ to be the minimal value of ${\gamma(Q)}$ over all lifts ${Q}$ of ${P}$. (The value ${\bar \gamma(P)}$ is sometimes called the (linear) extension complexity of ${P}$.)

## 1.3. The permutahedron

Here is a somewhat more interesting family of examples where lifts reduce complexity. The permutahedron ${\Pi_n \subseteq \mathbb R^n}$ is the convex hull of the vectors ${(i_1, i_2, \ldots, i_n)}$ where ${\{i_1, \ldots, i_n\} = \{1,\ldots,n \}}$. It is known that ${\gamma(\Pi_n)=2^n-2}$.

Let ${B_n \subseteq \mathbb R^{n^2}}$ denote the convex hull of the ${n \times n}$ permutation matrices. The Birkhoff-von Neumann theorem tells us that ${B_n}$ is precisely the set of doubly stochastic matrices, thus ${\gamma(B_n)=n^2}$ (corresponding to the non-negativity constraints on each entry).

Observe that ${\Pi_n}$ is the linear image of ${B_n}$ under the map ${A \mapsto A (1, 2, \ldots, n)^T}$, i.e. we multiply a matrix ${A \in B_n}$ on the right by the column vector ${(1, 2, \ldots, n)}$. Thus ${B_n}$ is a lift of ${\Pi_n}$, and we conclude that ${\bar \gamma(\Pi_n) \leq n^2 \ll \gamma(\Pi_n)}$.
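As a quick illustration (a sketch with an arbitrary objective ${c}$): optimizing a linear functional over ${\Pi_n}$ via the lift ${B_n}$ is exactly an assignment problem, which `scipy.optimize.linear_sum_assignment` solves; the rearrangement inequality gives the same answer directly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

n = 5
c = np.array([2.0, -1.0, 0.0, 3.0, 1.0])
v = np.arange(1, n + 1)  # the vector (1, 2, ..., n)

# Since Pi_n is the image of B_n under M -> M v, maximizing <c, x> over
# Pi_n equals maximizing <c, M v> over B_n.  An optimum of this LP is a
# permutation matrix, i.e. an assignment problem with cost c[i] * v[j].
C = np.outer(c, v)
rows, cols = linear_sum_assignment(C, maximize=True)
lp_value = C[rows, cols].sum()

# Rearrangement inequality: the optimum pairs sorted c with sorted v.
direct = np.sort(c) @ np.sort(v)
print(lp_value, direct)  # both 25.0 here
```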

## 1.4. The cut polytope

If ${P \neq NP}$, there are certain combinatorial polytopes we should not be able to optimize over efficiently. A central example is the cut polytope: ${\mathrm{CUT}_n \subseteq \mathbb R^{n^2}}$ is the convex hull of all matrices of the form ${(A_S)_{ij} = |\mathbf{1}_S(i)-\mathbf{1}_S(j)|}$ for some subset ${S \subseteq \{1,\ldots,n\}}$. Here, ${\mathbf{1}_S}$ denotes the characteristic function of ${S}$.

Note that the MAX-CUT problem on a graph ${G=(V,E)}$ can be encoded in the following way: Let ${W_{ij} = W_{ji} = \tfrac12}$ if ${\{i,j\} \in E}$ and ${W_{ij}=0}$ otherwise (the factor ${\tfrac12}$ accounts for each edge appearing twice in the symmetric matrix ${A}$). Then the value of the maximum cut in ${G}$ is precisely the maximum of ${\langle W, A\rangle}$ for ${A \in \mathrm{CUT}_n}$. Accordingly, we should expect that ${\bar \gamma(\mathrm{CUT}_n)}$ cannot be bounded by any polynomial in ${n}$ (lest we violate a basic tenet of complexity theory).

## 1.5. Non-negative matrix factorization

The key to understanding ${\bar \gamma(\mathrm{CUT}_n)}$ comes from Yannakakis’ factorization theorem.

Consider a polytope ${P \subseteq \mathbb R^d}$ and let us write it in two ways: As a convex hull of vertices

$\displaystyle P = \mathrm{conv}\left(x_1, x_2, \ldots, x_n\right)\,,$

and as an intersection of half-spaces: For some ${A \in \mathbb R^{m \times d}}$ and ${b \in \mathbb R^m}$,

$\displaystyle P = \left\{ x \in \mathbb R^d : Ax \leq b \right\}\,.$

Given this pair of representations, we can define the corresponding slack matrix of ${P}$ by

$\displaystyle S_{ij} = b_i - \langle A_i, x_j \rangle \qquad i \in \{1,2,\ldots,m\}, j \in \{1,2,\ldots,n\}\,.$

Here, ${A_1, \ldots, A_m}$ denote the rows of ${A}$.

We need one more definition. In what follows, we will use ${\mathbb R_+ = [0,\infty)}$. If we have a non-negative matrix ${M \in \mathbb R_+^{m \times n}}$, then a rank-${r}$ non-negative factorization of ${M}$ is a factorization ${M = AB}$ where ${A \in \mathbb R_+^{m \times r}}$ and ${B \in \mathbb R_+^{r \times n}}$. We then define the non-negative rank of ${M}$, written ${\mathrm{rank}_+(M)}$, to be the smallest ${r}$ such that ${M}$ admits a rank-${r}$ non-negative factorization.
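For a concrete example, here is the slack matrix of the square ${C_2}$ (vertices ${\pm e_1, \pm e_2}$, facets ${\pm x_1 \pm x_2 \leq 1}$). Its ordinary rank is ${3}$; indeed, any slack matrix has rank at most ${d+1}$ since ${S = b\mathbf{1}^T - AX^T}$. Its non-negative rank, on the other hand, is ${4}$: a compact polytope with only three facets must be a simplex, whose shadows have at most three vertices, so no lift of the square can have fewer than four facets. This is a standard example of a gap between rank and non-negative rank.

```python
import numpy as np

# Square C_2 = {x : |x_1| + |x_2| <= 1}.
X = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])    # vertices (as rows)
A = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]])  # facet normals, b_i = 1

S = 1.0 - A @ X.T  # S_ij = b_i - <A_i, x_j>

print(S)                          # entrywise non-negative
print(np.linalg.matrix_rank(S))   # → 3, while rank_+(S) = 4
```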

Theorem 1 (Yannakakis) For every polytope ${P}$, it holds that ${\bar \gamma(P) = \mathrm{rank}_+(S)}$ for any slack matrix ${S}$ of ${P}$.

The key fact underlying this theorem is Farkas’ Lemma. It asserts that if ${P = \{ x \in \mathbb R^d : Ax \leq b \}}$, then every valid linear inequality over ${P}$ can be written as a non-negative combination of the defining inequalities ${\langle A_i, x \rangle \leq b_i}$.

There is an interesting connection here to proof systems. The theorem says that we can interpret ${\bar \gamma(P)}$ as the minimum number of axioms so that every valid linear inequality for ${P}$ can be proved using a conic (i.e., non-negative) combination of the axioms.
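To make the Farkas step concrete, here is a small sketch on the unit square: a non-negative least-squares solve recovers multipliers ${\lambda \geq 0}$ expressing the valid inequality ${x_1 + x_2 \leq 2}$ as a conic combination of the defining inequalities.

```python
import numpy as np
from scipy.optimize import nnls

# P = unit square {x : 0 <= x_i <= 1}, written as Ax <= b.
A = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
b = np.array([1.0, 1.0, 0.0, 0.0])

# The inequality <c, x> <= 2 with c = (1, 1) is valid over P.  Farkas:
# there exist lambda >= 0 with A^T lambda = c and <lambda, b> <= 2.
c = np.array([1.0, 1.0])
lam, residual = nnls(A.T, c)  # non-negative solve of A^T lambda = c

print(lam)      # here lambda = (1, 1, 0, 0): add the two axioms x_i <= 1
print(lam @ b)  # 2.0, matching the right-hand side of the derived inequality
```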

## 1.6. Slack matrices and the correlation polytope

Thus to prove a lower bound on ${\bar \gamma(\mathrm{CUT}_n)}$, it suffices to find a valid set of linear inequalities for ${\mathrm{CUT}_n}$ and prove a lower bound on the non-negative rank of the corresponding slack matrix.

Toward this end, consider the correlation polytope ${\mathrm{CORR}_n \subseteq \mathbb R^{n^2}}$ given by

$\displaystyle \mathrm{CORR}_n = \mathrm{conv}\left(\left\{x x^T : x \in \{0,1\}^n \right\}\right)\,.$

It is an exercise to see that ${\mathrm{CUT}_{n+1}}$ and ${\mathrm{CORR}_n}$ are linearly isomorphic.
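One direction of this exercise can be checked numerically. The sketch below uses one standard choice of the isomorphism, in which the extra vertex ${n+1}$ plays the role of the "outside" of the cut; the map is linear in the entries of ${Y = xx^T}$ because ${Y_{ii} = x_i}$ for ${x \in \{0,1\}^n}$.

```python
from itertools import product
import numpy as np

n = 4

def corr_to_cut(Y):
    """Linear map sending xx^T in CORR_n to the cut matrix A_S in
    CUT_{n+1} with S = supp(x), vertex n+1 always outside S."""
    A = np.zeros((n + 1, n + 1))
    d = np.diag(Y)  # on 0/1 vectors, Y_ii = x_i
    # |x_i - x_j| = x_i + x_j - 2 x_i x_j on {0,1}
    A[:n, :n] = d[:, None] + d[None, :] - 2 * Y
    A[:n, n] = A[n, :n] = d
    return A

for bits in product([0, 1], repeat=n):
    x = np.array(bits, dtype=float)
    # cut matrix of S = supp(x) in the complete graph on n+1 vertices
    s = np.append(x, 0.0)
    A_S = np.abs(s[:, None] - s[None, :])
    assert np.array_equal(corr_to_cut(np.outer(x, x)), A_S)

print("linear map matches on all vertices")
```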

Now we identify a particularly interesting family of valid linear inequalities for ${\mathrm{CORR}_n}$. (In fact, it turns out that this will also be an exhaustive list.) A quadratic multi-linear function on ${\mathbb R^n}$ is a function ${f : \mathbb R^n \rightarrow \mathbb R}$ of the form

$\displaystyle f(x) = a_0 + \sum_i a_{ii} x_i + \sum_{i < j} a_{ij} x_i x_j\,,$

for some real numbers ${a_0}$ and ${\{a_{ij}\}}$.

Suppose ${f}$ is a quadratic multi-linear function that is also non-negative on ${\{0,1\}^n}$. Then “${f(x) \geq 0 \,\,\forall x \in \{0,1\}^n}$” can be encoded as a valid linear inequality on ${\mathrm{CORR}_n}$. To see this, write ${f(x)=\langle A,xx^T\rangle + a_0}$ where ${A=(a_{ij})}$ is upper triangular, using that ${x_i^2=x_i}$ for ${x \in \{0,1\}^n}$; the inequality ${\langle -A, Y\rangle \leq a_0}$ is then valid for all ${Y \in \mathrm{CORR}_n}$. (Note that ${\langle \cdot,\cdot\rangle}$ is intended to be the standard inner product on ${\mathbb R^{n^2}}$.)

Now let ${\textrm{QML}^+_n}$ be the set of all quadratic multi-linear functions that are non-negative on ${\{0,1\}^n}$, and consider the matrix (represented here as a function) ${\mathcal M_n : \textrm{QML}^+_n \times \{0,1\}^n \rightarrow \mathbb R_+}$ given by

$\displaystyle \mathcal M_n(f,x) = f(x)\,.$

Then from the above discussion, ${\mathcal M_n}$ is a sub-matrix of some slack matrix of ${\mathrm{CORR}_n}$. To summarize, we have the following theorem.

Theorem 2 For all ${n \geq 1}$, it holds that ${\bar \gamma(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal M_n)}$.

It is actually the case that ${\bar \gamma(\mathrm{CUT}_{n+1}) = \mathrm{rank}_+(\mathcal M_n)}$. The next post will focus on providing a lower bound on ${\mathrm{rank}_+(\mathcal M_n)}$.
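To get a feel for ${\mathcal M_n}$, here is a sketch building a small sub-matrix for ${n=3}$, using the functions ${f_T(x) = (1 - \sum_{i \in T} x_i)^2}$ for ${T \subseteq \{1,\ldots,n\}}$. These are quadratic multi-linear on ${\{0,1\}^n}$ (again since ${x_i^2 = x_i}$) and manifestly non-negative there; functions of exactly this form play a central role in the known lower bounds on ${\mathrm{rank}_+(\mathcal M_n)}$.

```python
from itertools import chain, combinations, product
import numpy as np

n = 3
cube = list(product([0, 1], repeat=n))
subsets = list(chain.from_iterable(
    combinations(range(n), k) for k in range(n + 1)))

# Rows indexed by T (via f_T), columns by x in {0,1}^n:
#   M[T, x] = f_T(x) = (1 - sum_{i in T} x_i)^2 >= 0.
M = np.array([[(1 - sum(x[i] for i in T)) ** 2 for x in cube]
              for T in subsets])

print(M.shape)               # → (8, 8)
print(bool((M >= 0).all()))  # → True: every entry is a valid slack
```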