Entropy optimality: Lifts of polytopes

Recall from the previous post that {\mathrm{CUT}_n} denotes the cut polytope on the {n} -vertex complete graph, and {\bar \gamma(\mathrm{CUT}_n)} is the smallest number of facets in any polytope that linearly projects to {\mathrm{CUT}_n} . Our goal is to prove a lower bound on this quantity, but first we should mention that a nearly tight lower bound is known.

Theorem 1 (Fiorini, Massar, Pokutta, Tiwari, de Wolf 2012) There is a constant {c > 0} such that for every {n \geq 1} , {\bar \gamma(\mathrm{CUT}_n) \geq e^{cn}} .

We will present a somewhat weaker lower bound using entropy maximization, following our joint works with Chan, Raghavendra, and Steurer (2013) and with Raghavendra and Steurer (2015). This method is only currently capable of proving that {\bar \gamma(\mathrm{CUT}_n) \geq e^{cn^{1/3}}} , but it has the advantage of being generalizable—it extends well to the setting of approximate lifts and spectrahedral lifts (we’ll come to the latter in a few posts).

1.1. The entropy maximization framework

To use entropy optimality, we proceed as follows. For the sake of contradiction, we assume that {\bar \gamma(\mathrm{CUT}_n)} is small.

First, we will identify the space of potential lifts of small {\gamma} -value with the elements of a convex set of probability measures. (This is where the connection to non-negative matrix factorization (NMF) will come into play.) Then we will choose a family of “tests” intended to capture the difficult aspects of being a valid lift of {\mathrm{CUT}_n} . This step is important as having more freedom (corresponding to weaker tests) will allow the entropy maximization to do more “simplification.” The idea is that the family of tests should still be sufficiently powerful to prove a lower bound on the entropy-optimal hypothesis.

Finally, we will maximize the entropy of our lift over all elements of our convex set, subject to performing well on the tests. Our hope is that the resulting lift is simple enough that we can prove it couldn’t possibly pass all the tests, leading to a contradiction.

In order to find the right set of tests, we will identify a family of canonical (approximate) lifts. This is family {\{P_k\}} of polytopes so that {\gamma(P_k) \leq O(n^k)} and which we expect to give the “best approximation” to {\mathrm{CUT}_n} among all lifts with similar {\gamma} -value. We can identify this family precisely because these will be lifts that obey the natural symmetries of the cut polytope (observe that the symmetric group {S_n} acts on {\mathrm{CUT}_n} in the natural way).

1.2. NMF and positivity certificates

Recall the matrix {\mathcal M_n : \mathrm{QML}^+_n \times \{0,1\}^n \rightarrow \mathbb R_+} given by {\mathcal M_n(f,x)=f(x)} , where {\mathrm{QML}^+_n} is the set of all quadratic multi-linear functions that are non-negative on {\{0,1\}^n} . In the previous post, we argued that {\bar \gamma(\mathrm{CUT}_{n+1}) \geq \mathrm{rank}_+(\mathcal M_n)} .

If {r=\mathrm{rank}_+(\mathcal M_n)} , it means we can write

\displaystyle  f(x) = \mathcal M_n(f,x) = \sum_{i=1}^r A_i(f) B_i(x) \ \ \ \ \ (1)

for some functions {A_i : \mathrm{QML}^+_n \rightarrow \mathbb R_+} and {B_i : \{0,1\}^n \rightarrow \mathbb R_+} . (Here we are using a factorization {\mathcal M_n = AB} where {A_{f,i} = A_i(f)} and {B_{x,i}=B_i(x)} .)

Thus the low-rank factorization gives us a “proof system” for {\mathrm{QML}^+_n} . Every {f \in \mathrm{QML}^+_n} can be written as a conic combination of the functions {B_1, B_2, \ldots, B_r} , thereby certifying its positivity (since the {B_i} ‘s are positive functions).

If we think about natural families {\mathcal B = \{B_i\}} of “axioms,” then since {\mathrm{QML}^+_n} is invariant under the natural action of {S_n} , we might expect that our family {\mathcal B} should share this invariance. Once we entertain this expectation, there are natural small families of axioms to consider: The space of non-negative {k} -juntas for some {k \ll n} .

A {k} -junta {b : \{0,1\}^n \rightarrow \mathbb R} is a function whose value only depends on {k} of its input coordinates. For a subset {S \subseteq \{1,\ldots,n\}} with {|S|=k} and an element {z \in \{0,1\}^k} , let {q_{S,z} : \{0,1\}^n \rightarrow \{0,1\}} denote the function given by {q_{S,z}(x)=1} if and only if {x|_S=z} .

We let {\mathcal J_k = \{ q_{S,z} : |S| \leq k, z \in \{0,1\}^{|S|} \}} . Observe that {|\mathcal J_k| \leq O(n^k)} . Let us also define {\mathrm{cone}(\mathcal J_k)} as the set of all conic combinations of functions in {\mathcal J_k} . It is easy to see that {\mathrm{cone}(\mathcal J_k)} contains precisely the conic combinations of non-negative {k} -juntas.

If it were true that {\mathrm{QML}^+_n \subseteq \mathcal J_k} for some {k} , we could immediately conclude that {\mathrm{rank}_+(\mathcal M_n) \leq |\mathcal J_k| \leq O(n^k)} by writing {\mathcal M_n} in the form (1) where now {\{B_i\}} ranges over the elements of {\mathcal J_k} and {\{A_i(f)\}} gives the corresponding non-negative coefficients that follow from {f \in \mathcal J_k} .

1.3. No {k} -junta factorization for {k \leq n/2}

We will now see that juntas cannot provide a small set of axioms for {\mathrm{QML}^+_n} .

Theorem 2 Consider the function {f : \{0,1\}^n \rightarrow \mathbb R_+} given by {f(x) = \left(1-\sum_{i=1}^n x_i\right)^2} . Then {f \notin \mathcal J_{\lceil n/2\rceil}} .

Toward the proof, let’s introduce a few definitions. First, for {f : \{0,1\}^n \rightarrow \mathbb R_+} , define the junta degree of {f} to be

\displaystyle  \deg_J(f) = \min \{ k : f \in \mathrm{cone}(\mathcal J_k) \}\,.

Since every {f} is an {n} -junta, we have {\deg_J(f) \leq n} .

Now because {\{ f : \deg_J(f) \leq k \}} is a cone, there is a universal way of proving that {\deg_J(f) > k} . Say that a functional {\varphi : \{0,1\}^n \rightarrow \mathbb R} is {k} -locally positive if for all {|S| \leq k} and {z \in \{0,1\}^{|S|}} , we have

\displaystyle \sum_{x \in \{0,1\}^n} \varphi(x) q_{S,z}(x) \geq 0\,.

These are precisely the linear functionals separating a function {f} from {\mathrm{cone}(\mathcal J_k)} : We have {\mathrm{deg}_J(f) > k} if and only if there is a {k} -locally positive functional {\varphi} such that {\sum_{x \in \{0,1\}^n} \varphi(x) f(x) < 0} . Now we are ready to prove Theorem 2.

Proof: We will prove this using an appropriate {k} -locally positive functional. Define

\displaystyle  \varphi(x) = \begin{cases} -1 & |x| = 0 \\ \frac{2}{n} & |x|=1 \\ 0 & |x| > 1\,, \end{cases}

where {|x|} denotes the hamming weight of {x \in \{0,1\}^n} .

First, observe that

\displaystyle \sum_{x \in \{0,1\}^n} \varphi(x) = -1 + n \cdot \frac{2}{n} = 1\,.

Now recall the the function {f} from the statement of the theorem and observe that by opening up the square, we have

\displaystyle  \sum_{x \in \{0,1\}^n} \varphi(x) f(x) = \sum_{x \in \{0,1\}^n} \varphi(x) \left(1-2 \sum_i x_i + \sum_i x_i^2 + 2\sum_{i \neq j} x_i x_j\right)\ \ \ \ \ (2)

\displaystyle  = \sum_{x \in \{0,1\}^n} \varphi(x) \left(1-\sum_i x_i\right) = -1\,.

Finally, consider some {S \subseteq \{1,\ldots,n\}} with {|S|=k \leq n/2} and {z \in \{0,1\}^k} . If {z=\mathbf{0}} , then

\displaystyle  \sum_{x \in \{0,1\}^n} \varphi(x) q_{S,z}(x) = -1 + \frac{2}{n} \left(n-k\right) \geq 0\,.

If {|z| > 1} , then the sum is 0. If {|z|=1} , then the sum is non-negative because in that case {q_{S,z}} is only supported on non-negative values of {\varphi} . We conclude that {\varphi} is {k} -locally positive for {k \leq n/2} . Combined with (2), this yields the statement of the theorem. \Box

1.4. From juntas to general factorizations

So far we have seen that we cannot achieve a low non-negative rank factorization of {\mathcal M_n} using {k} -juntas for {k \leq n/2} .

Brief aside: If one translates this back into the setting of lifts, it says that the {k} -round Sherali-Adams lift of the polytope

\displaystyle P = \left\{ x \in [0,1]^{n^2} : x_{ij}=x_{ji},\, x_{ij} \leq x_{jk} + x_{ki} \quad \forall i,j,k \in \{1,\ldots,n\}\right\}

does not capture {\mathrm{CUT}_n} for {k \leq n/2} .

In the next post, we will use entropy maximization to show that a non-negative factorization of {\mathcal M_n} would lead to a {k} -junta factorization with {k} small (which we just saw is impossible).

For now, let us state the theorem we will prove. We first define a submatrix of {\mathcal M_n} . Fix some integer {m \geq 1} and a function {g : \{0,1\}^m \rightarrow \mathbb R_+} . Now define the matrix {M_n^g : {[n] \choose m} \times \{0,1\}^n \rightarrow \mathbb R_+} given by

\displaystyle  M_n^g(S,x) = g(x|_S)\,.

The matrix is indexed by subsets {S \subseteq [n]} with {|S|=m} and elements {x \in \{0,1\}^n} . Here, {x|_S} represents the (ordered) restriction of {x} to the coordinates in {S} .

Theorem 3 (Chan-L-Raghavendra-Steurer 2013) For every {m \geq 1} and {g : \{0,1\}^m \rightarrow \mathbb R_+} , there is a constant {C=C(g)} such that for all {n \geq 2m} ,

\displaystyle  \mathrm{rank}_+(M_n^g) \geq C \left(\frac{n}{\log n}\right)^{\mathrm{deg}_J(g)-1}\,.

Note that if {g \in \mathrm{QML}^+_m} then {M_n^g} is a submatrix of {\mathcal M_n} . Since Theorem 2 furnishes a sequence of quadratic multi-linear functions {\{g_j\}} with {\mathrm{deg}_J(g_j) \rightarrow \infty} , the preceding theorem tells us that {\mathrm{rank}_+(\mathcal M_n)} cannot be bounded by any polynomial in {n} . A more technical version of the theorem is able to achieve a lower bound of {e^{c n^{1/3}}} (see Section 7 here).

4 thoughts on “Entropy optimality: Lifts of polytopes

  1. Thomas- I can’t tell if you’re being sarcastic. But I will take your comment at face value. Actually, in trying to write the next part in the correct way (and looking ahead to quantum entropy), I ran into some things I don’t understand (around factorization and tensor norms) and made a note to talk to you. But I think I can finish the next post before that.

  2. I’m certainly not being sarcastic – in fact I found it surprising no-one had explicitly expressed support so far and wanted to jump in. I’ve been immensely enjoying your posts, and just wanted to express how much I’m waiting for the next one every day :-) They’ve been extremely clear and helpful. And, having attempted something similar in the past I do realize the amount of work must go into them.

  3. Thanks :) I’m enjoying writing them, except possibly for the most recent (the definition of \alpha_+ is not great—but you can see the connection to factorization norms). I’m glad you mentioned your blog since I’ve been wondering about quantum hypercontractivity. Going to read now. After the polytope stuff, the next set of posts here is about (classical) hypercontractivity using–what else–entropy optimization.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s