In preparation for the next post on entropy optimality, we need a little background on lifts of polytopes and non-negative matrix factorization.

** 1.1. Polytopes and inequalities **

An *$n$-dimensional convex polytope* $P \subseteq \mathbb{R}^n$ is the convex hull of a finite set of points in $\mathbb{R}^n$. Equivalently, it is a compact set defined by a family of linear inequalities

$$P = \{x \in \mathbb{R}^n : Ax \leq b\}$$

for some matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$.

Let us give a measure of complexity for $P$: Define $\gamma(P)$ to be the smallest number $m$ such that for some $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, we have

$$P = \{x \in \mathbb{R}^n : Ax \leq b\}.$$

In other words, $\gamma(P)$ is the minimum number of *inequalities* needed to describe $P$. If $P$ is full-dimensional, then this is precisely the number of *facets* of $P$ (a facet is a maximal proper face of $P$).

Thinking of $\gamma(P)$ as a measure of complexity makes sense from the point of view of optimization: Interior point methods can efficiently optimize linear functions over $P$ (to arbitrary accuracy) in time that is polynomial in $\gamma(P)$.
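To make the $\{x : Ax \leq b\}$ description concrete, here is a small pure-Python sketch. The square $[-1,1]^2$ is an illustrative choice (not an example from the text); its four facet inequalities witness $\gamma = 4$:

```python
# H-representation of the square [-1, 1]^2: four inequalities <a_i, x> <= b_i,
# one per facet, so gamma(P) = 4 for this full-dimensional polytope.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 1, 1, 1]

def in_polytope(x, eps=1e-9):
    """Membership test: check every defining inequality <a_i, x> <= b_i."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= bi + eps
               for a, bi in zip(A, b))

assert in_polytope((0, 0)) and in_polytope((1, -1))   # interior point and a vertex
assert not in_polytope((1.5, 0))                      # violates the facet x_1 <= 1
```

Each row of $A$ corresponds to one facet-defining inequality; dropping any one of the four rows would enlarge the set.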

** 1.2. Lifts of polytopes **

Many simple polytopes require a large number of inequalities to describe. For instance, the *cross-polytope*

$$C_n = \mathrm{conv}\{\pm e_1, \ldots, \pm e_n\} = \{x \in \mathbb{R}^n : |x_1| + \cdots + |x_n| \leq 1\}$$

has $\gamma(C_n) = 2^n$: each sign pattern $\varepsilon \in \{-1,1\}^n$ yields a facet $\sum_i \varepsilon_i x_i \leq 1$. On the other hand, $C_n$ is the *projection* of the polytope

$$Q_n = \Big\{(x,y) \in \mathbb{R}^n \times \mathbb{R}^n : -y_i \leq x_i \leq y_i \text{ for each } i, \ \sum_{i=1}^n y_i \leq 1\Big\}$$

onto the $x$ coordinates, and manifestly, $\gamma(Q_n) \leq 2n+1$. Thus $C_n$ is the (linear) shadow of a much simpler polytope in a higher dimension.
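Assuming the lift has the form $Q_n = \{(x,y) : -y_i \leq x_i \leq y_i,\ \sum_i y_i \leq 1\}$ (one standard lift with $2n+1$ inequalities), this sketch spot-checks both directions of the projection claim on random samples: every point of $C_n$ lifts via $y = (|x_1|, \ldots, |x_n|)$, and every point of $Q_n$ projects into $C_n$:

```python
import random

n = 4

def in_cross_polytope(x):
    # C_n = {x : |x_1| + ... + |x_n| <= 1}  (2^n facet inequalities)
    return sum(abs(xi) for xi in x) <= 1 + 1e-9

def in_lift(x, y):
    # Q_n = {(x, y) : -y_i <= x_i <= y_i for each i, sum_i y_i <= 1}  (2n+1 inequalities)
    return (all(-yi - 1e-9 <= xi <= yi + 1e-9 for xi, yi in zip(x, y))
            and sum(y) <= 1 + 1e-9)

random.seed(0)
for _ in range(1000):
    # any x in C_n has the explicit preimage (x, |x|) in Q_n
    x = [random.uniform(-0.5, 0.5) for _ in range(n)]
    if in_cross_polytope(x):
        assert in_lift(x, [abs(xi) for xi in x])
    # any (x, y) in Q_n projects to a point of C_n, since sum |x_i| <= sum y_i <= 1
    y = [random.uniform(0, 1.0 / n) for _ in range(n)]
    x2 = [random.uniform(-yi, yi) for yi in y]
    assert in_lift(x2, y) and in_cross_polytope(x2)
```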

[image credit: Fiorini, Rothvoss, and Tiwary]

A polytope $Q \subseteq \mathbb{R}^N$ is called a *lift* of the polytope $P \subseteq \mathbb{R}^n$ if $P$ is the image of $Q$ under a linear projection $\pi : \mathbb{R}^N \to \mathbb{R}^n$. Again, from an optimization standpoint, lifts are important: If we can optimize linear functionals over $Q$, then we can optimize linear functionals over $P$. Define now $\bar{\gamma}(P)$ to be the minimal value of $\gamma(Q)$ over all lifts $Q$ of $P$. (The value $\bar{\gamma}(P)$ is sometimes called the *(linear) extension complexity* of $P$.)

** 1.3. The permutahedron **

Here is a somewhat more interesting family of examples where lifts reduce complexity. The *permutahedron* $\Pi_n \subseteq \mathbb{R}^n$ is the convex hull of the vectors $(\pi(1), \pi(2), \ldots, \pi(n))$ where $\pi$ ranges over all permutations of $\{1, \ldots, n\}$. It is known that $\gamma(\Pi_n) = 2^n - 2$.

Let $B_n \subseteq \mathbb{R}^{n \times n}$ denote the convex hull of the $n \times n$ permutation matrices. The Birkhoff–von Neumann theorem tells us that $B_n$ is precisely the set of doubly stochastic matrices, thus $\gamma(B_n) \leq n^2$ (corresponding to the non-negativity constraints on each entry).

Observe that $\Pi_n$ is the linear image of $B_n$ under the map $X \mapsto X(1, 2, \ldots, n)^T$, i.e. we multiply a matrix $X \in B_n$ on the right by the column vector $(1, 2, \ldots, n)^T$. Thus $B_n$ is a lift of $\Pi_n$, and we conclude that $\bar{\gamma}(\Pi_n) \leq n^2$.
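A quick enumeration (pure Python, $n = 3$) confirms that the permutation matrices map exactly onto the permutahedron's vertices under right-multiplication by $(1, 2, \ldots, n)^T$:

```python
from itertools import permutations

n = 3

def perm_matrix(pi):
    # the permutation matrix with a 1 in row i, column pi(i)
    return [[1 if pi[i] == j else 0 for j in range(n)] for i in range(n)]

v = list(range(1, n + 1))  # the column vector (1, 2, ..., n)

# image of each Birkhoff-polytope vertex under X -> X v
images = set()
for pi in permutations(range(n)):
    X = perm_matrix(pi)
    images.add(tuple(sum(X[i][j] * v[j] for j in range(n)) for i in range(n)))

# these are exactly the vertices (pi(1), ..., pi(n)) of the permutahedron
assert images == {tuple(p) for p in permutations(v)}
```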

** 1.4. The cut polytope **

If $\mathrm{P} \neq \mathrm{NP}$, there are certain combinatorial polytopes we should not be able to optimize over efficiently. A central example is the *cut polytope:* $\mathrm{CUT}_n \subseteq \mathbb{R}^{n \times n}$ is the convex hull of all matrices of the form $\chi_S \chi_{\bar S}^T + \chi_{\bar S} \chi_S^T$ for some subset $S \subseteq \{1, \ldots, n\}$, where $\bar S = \{1, \ldots, n\} \setminus S$. Here, $\chi_S \in \{0,1\}^n$ denotes the characteristic function of $S$.

Note that the MAX-CUT problem on a graph $G = (\{1, \ldots, n\}, E)$ can be encoded in the following way: Let $C_{ij} = \tfrac12$ if $\{i,j\} \in E$ and $C_{ij} = 0$ otherwise. Then the value of the maximum cut in $G$ is precisely the maximum of $\langle C, X \rangle$ for $X \in \mathrm{CUT}_n$. Accordingly, we should expect that $\bar{\gamma}(\mathrm{CUT}_n)$ cannot be bounded by any polynomial in $n$ (lest we violate a basic tenet of complexity theory).
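Assuming the vertices of $\mathrm{CUT}_n$ are the 0/1 matrices with a 1 in entry $(i,j)$ exactly when $\{i,j\}$ crosses the cut (one standard convention), the encoding with $C_{ij} = \tfrac12$ on edges can be sanity-checked by brute force; since a linear functional is maximized at a vertex, it suffices to enumerate vertices. The 5-cycle below is a hypothetical example graph, not one from the text:

```python
from itertools import product

# the 5-cycle as an edge list (assumed example graph)
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

def cut_matrix(s):
    # vertex of CUT_n for the subset S with characteristic vector s in {0,1}^n:
    # entry (i, j) is 1 exactly when the pair {i, j} crosses the cut
    return [[1 if s[i] != s[j] else 0 for j in range(n)] for i in range(n)]

# C_ij = 1/2 on both orientations of each edge, 0 otherwise
C = [[0.0] * n for _ in range(n)]
for i, j in edges:
    C[i][j] = C[j][i] = 0.5

def inner(A, B):
    # standard (entrywise) inner product on n x n matrices
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

best = max(inner(C, cut_matrix(s)) for s in product((0, 1), repeat=n))
direct = max(sum(1 for i, j in edges if s[i] != s[j])
             for s in product((0, 1), repeat=n))
assert best == direct == 4  # the maximum cut of a 5-cycle has 4 edges
```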

** 1.5. Non-negative matrix factorization **

The key to understanding $\bar{\gamma}(P)$ comes from Yannakakis' factorization theorem.

Consider a polytope $P \subseteq \mathbb{R}^n$ and let us write $P$ in two ways: As a convex hull of vertices

$$P = \mathrm{conv}\{x_1, x_2, \ldots, x_v\},$$

and as an intersection of half-spaces: For some $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$,

$$P = \{x \in \mathbb{R}^n : Ax \leq b\}.$$

Given this pair of representations, we can define the corresponding *slack matrix* $S \in \mathbb{R}_+^{m \times v}$ of $P$ by

$$S_{ij} = b_i - \langle a_i, x_j \rangle.$$

Here, $a_1, \ldots, a_m$ denote the rows of $A$.
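For a concrete slack matrix, take the square $[0,1]^2$ (an illustrative choice, not an example from the text); the entry $S_{ij}$ measures how far vertex $x_j$ is from the $i$-th facet:

```python
# facets of the square [0,1]^2, written as <a_i, x> <= b_i
A = [(-1, 0), (1, 0), (0, -1), (0, 1)]
b = [0, 1, 0, 1]
# its vertices x_1, ..., x_4
verts = [(0, 0), (1, 0), (0, 1), (1, 1)]

# slack matrix: S[i][j] = b_i - <a_i, x_j>
S = [[b[i] - sum(a * x for a, x in zip(A[i], v)) for v in verts]
     for i in range(4)]

assert all(s >= 0 for row in S for s in row)  # all slacks are non-negative
assert all(row.count(0) == 2 for row in S)    # each facet contains exactly 2 vertices
```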

We need one more definition. In what follows, we will use $\mathbb{R}_+ = \{x \in \mathbb{R} : x \geq 0\}$. If we have a non-negative matrix $M \in \mathbb{R}_+^{m \times n}$, then a *rank-$r$ non-negative factorization of $M$* is a factorization $M = AB$ where $A \in \mathbb{R}_+^{m \times r}$ and $B \in \mathbb{R}_+^{r \times n}$. We then define the *non-negative rank of $M$,* written $\mathrm{rank}_+(M)$, to be the smallest $r$ such that $M$ admits a rank-$r$ non-negative factorization.
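A toy non-negative factorization in plain Python (the matrices here are made up for illustration): exhibiting any factorization $M = AB$ with non-negative factors certifies $\mathrm{rank}_+(M) \leq r$, and such a factorization is the same as writing $M$ as a sum of $r$ non-negative rank-one matrices:

```python
r = 2  # target rank of the factorization

A = [[1, 0], [0, 1], [1, 1]]   # 3 x 2, entrywise non-negative
B = [[1, 2, 0], [0, 1, 3]]     # 2 x 3, entrywise non-negative

# M = A B is a 3 x 3 non-negative matrix with rank_+(M) <= 2 by construction
M = [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(3)]
     for i in range(3)]
assert all(M[i][j] >= 0 for i in range(3) for j in range(3))

# equivalently, M is the sum of r non-negative rank-one matrices a_k b_k^T,
# where a_k is the k-th column of A and b_k the k-th row of B
rank_ones = [[[A[i][k] * B[k][j] for j in range(3)] for i in range(3)]
             for k in range(r)]
assert all(M[i][j] == sum(R[i][j] for R in rank_ones)
           for i in range(3) for j in range(3))
```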

**Theorem 1 (Yannakakis).** For every polytope $P$, it holds that $\bar{\gamma}(P) = \mathrm{rank}_+(S)$ for any slack matrix $S$ of $P$.

The key fact underlying this theorem is Farkas' Lemma. It asserts that if $P = \{x \in \mathbb{R}^n : Ax \leq b\}$, then every valid linear inequality over $P$ can be written as a non-negative combination of the defining inequalities $\langle a_i, x \rangle \leq b_i$ (together with the trivial inequality $0 \leq 1$).

There is an interesting connection here to proof systems. The theorem says that we can interpret $\bar{\gamma}(P)$ as the minimum number of axioms so that every valid linear inequality for $P$ can be proved using a conic (i.e., non-negative) combination of the axioms.

** 1.6. Slack matrices and the correlation polytope **

Thus to prove a lower bound on $\bar{\gamma}(\mathrm{CUT}_n)$, it suffices to find a valid set of linear inequalities for $\mathrm{CUT}_n$ and prove a lower bound on the non-negative rank of the corresponding slack matrix.

Toward this end, consider the correlation polytope given by

$$\mathrm{CORR}_n = \mathrm{conv}\left\{xx^T : x \in \{0,1\}^n\right\} \subseteq \mathbb{R}^{n \times n}.$$

It is an exercise to see that $\mathrm{CUT}_{n+1}$ and $\mathrm{CORR}_n$ are linearly isomorphic.

Now we identify a particularly interesting family of valid linear inequalities for $\mathrm{CORR}_n$. (In fact, it turns out that this will also be an exhaustive list.) A *quadratic multi-linear function* on $\mathbb{R}^n$ is a function $f$ of the form

$$f(x) = a_0 + \sum_{i,j=1}^n a_{ij}\, x_i x_j$$

for some real numbers $a_0$ and $\{a_{ij}\}$.

Suppose $f$ is a quadratic multi-linear function that is also non-negative on $\{0,1\}^n$. Then "$f \geq 0$" can be encoded as a valid linear inequality on $\mathrm{CORR}_n$. To see this, write $f(x) = a_0 + \langle A, xx^T \rangle$ where $A = (a_{ij})$. (Note that $\langle \cdot, \cdot \rangle$ is intended to be the standard inner product on $\mathbb{R}^{n \times n}$.) Then $f \geq 0$ on $\{0,1\}^n$ precisely when the linear inequality $\langle -A, X \rangle \leq a_0$ is valid for all $X \in \mathrm{CORR}_n$.
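As a sanity check on this encoding, take the hypothetical example $f(x) = x_i + x_j - 2x_i x_j$, which equals $(x_i - x_j)^2$ on $\{0,1\}^n$ (using $x_k^2 = x_k$ to absorb the linear terms) and is therefore non-negative there. The sketch below verifies $f(x) = a_0 + \langle A, xx^T \rangle$ pointwise:

```python
from itertools import product

n = 3
i, j = 0, 1

def f(x):
    # f(x) = x_i + x_j - 2 x_i x_j, equal to (x_i - x_j)^2 on {0,1}^n, so f >= 0 there
    return x[i] + x[j] - 2 * x[i] * x[j]

# encode f as a_0 + <A, x x^T>; the identity x_k^2 = x_k on {0,1}^n puts the
# linear terms x_i, x_j on the diagonal of A
a0 = 0
A = [[0] * n for _ in range(n)]
A[i][i] = A[j][j] = 1
A[i][j] = A[j][i] = -1

for x in product((0, 1), repeat=n):
    lhs = a0 + sum(A[p][q] * x[p] * x[q] for p in range(n) for q in range(n))
    assert lhs == f(x) >= 0
```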

Now let $\mathrm{QML}_n^+$ be the set of all quadratic multi-linear functions that are non-negative on $\{0,1\}^n$, and consider the matrix $\mathcal{M}_n$ (represented here as a function) given by

$$\mathcal{M}_n : \mathrm{QML}_n^+ \times \{0,1\}^n \to \mathbb{R}_+\,, \qquad \mathcal{M}_n(f, x) = f(x).$$

Then from the above discussion, $\mathcal{M}_n$ is a valid sub-matrix of some slack matrix of $\mathrm{CORR}_n$. To summarize, we have the following theorem.

**Theorem 2.** For all $n \geq 1$, it holds that $\bar{\gamma}(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal{M}_n)$.

It is actually the case that $\bar{\gamma}(\mathrm{CUT}_{n+1}) = \mathrm{rank}_+(\mathcal{M}_n)$. The next post will focus on providing a lower bound on $\mathrm{rank}_+(\mathcal{M}_n)$.