Lecture P1. From Integrality Gaps to Dictatorship Tests

Here are Prasad Raghavendra‘s notes on one of two guest lectures he gave for CSE 599S. Prasad will be a faculty member at Georgia Tech after he finishes his postdoc at MSR New England.

In yet another excursion in to applications of discrete harmonic analysis, we will now see an application of the invariance principle, to hardness of approximation. We will complement this discussion with the proof of the invariance principle in the next lecture.

1. Invariance Principle

In its simplest form, the central limit theorem asserts the following: “As {n} increases, the sum of {n} independent bernoulli random variables ({\pm 1} random variables) has approximately the same distribution as the sum of {n} independent normal (Gaussian with mean {0} and variance {1}) random variables”

Alternatively, as {n} increases, the value of the polynomial {F(\mathbf x) = 	\frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + 	\ldots x^{(n)})} has approximately the same distribution whether the random variables {x^{(i)}} are i.i.d Bernoulli random variables or i.i.d normal random variables. More generally, the distribution of {p(\mathbf x)} is the approximately the same as long as the random variables {x^{(i)}} are independent with mean {0}, variance {1} and satisfy certain mild regularity assumptions.

A phenomenon of this nature where the distribution of a function of random variables, depends solely on a small number of their moments is referred to as invariance.

A natural approach to generalize of the above stated central limit theorem, is to replace the “sum” ({\frac{1}{\sqrt{n}} (x^{(1)} + x^{(2)} + 	\ldots x^{(n)})}) by other multivariate polynomials.   As we will see in the next lecture, the invariance principle indeed holds for low degree multilinear polynomials that are not influenced heavily by any single coordinate. Formally, define the influence of a coordinate on a polynomial as follows:

Definition 1 For a multilinear polynomial {F(\mathbf{x}) = \sum_{\sigma} 	\hat{F}_{\sigma} \prod_{i \in [R]} x^{(\sigma_i)}} define the influence of the {i^{th}} coordinate as follows:

{	\mathrm{Inf}_{i}(F) = \mathop{\mathbb E}_{x} [\mathrm{Var}_{x^{(i)}}[F]] = \sum_{\sigma_{i} \neq 	0} \hat{F}^2_\sigma}

Here {\mathrm{Var}_{x^{(i)}}[F]} denotes the variance of {F(x)} over random choice of {x_i}.

The invariance principle for low degree polynomials was first shown by Rotar in 1979. More recently, invariance principles for low degree polynomials were shown in different settings in the work of Mossel-O’Donnell-Olekszciewicz and Chatterjee. The former of the two works also showed the Majority is Stablest conjecture, and has been influential in introducing the powerful tool of invariance to hardness of approximation.

Here we state a special case of the invariance principle tailored to the application at hand.  To this end, let us first define a rounding function {f_{[-1,1]}} as follows:

\displaystyle f_{[-1,1]}(x) = \begin{cases} -1 &\text{ if } x < -1 \\ x &\text{ if } -1\leqslant x\leqslant 1 \\ 1 &\text{ if } x > 1 \end{cases}

Theorem 2 (Invariance Principle [Mossel-ODonnell-Oleszkiewicz]) Let {\mathbf{z} = \{z_1, z_2\}} and {\mathbf{G} = \{g_1,g_2\}} be sets of Bernoulli and Gaussian random variables respectively. Furthermore, let

\displaystyle  \mathop{\mathbb E}[z_i] = \mathop{\mathbb E}[g_i] = 0 \qquad \qquad \mathop{\mathbb E}[z_i^2] = \mathop{\mathbb E}[g_i^2] = 1 \qquad \qquad \forall i \in [2]

and {\mathop{\mathbb E}[z_1 z_2] = \mathop{\mathbb E}[g_1 g_2]}. Let {\mathbf{z}^R, \mathbf{G}^R} denote {R} independent copies of the random variables {\mathbf{z}} and {\mathbf{G}}.

Let {F} be a multilinear polynomial given by {F(\mathbf{x}) = \sum_{\sigma} \hat{F}_{\sigma} \prod_{i \in \sigma} x^{(i)}}, and let {H(\mathbf{x})=T_{1-\epsilon} F(\mathbf{x}) = \sum_{\sigma}(1-\epsilon)^{|\sigma|} \hat{F}_{\sigma} \prod_{i\in \sigma} x^{(i)}}. If {\mathrm{Inf}_\ell(H) \leqslant \tau} for all {\ell \in [R]}, then the following statements hold:

  1. For every function {\Psi : \mathbb{R}^2 	 \rightarrow \mathbb{R}} which is thrice differentiable with all its partial derivatives up to order {3} bounded uniformly by {C_0},

    \displaystyle  \Big|\mathop{\mathbb E}\Big[\Psi(H(\mathbf{z}_1^{R}), 	 H(\mathbf{z}_2^{R}))\Big] - \mathop{\mathbb E}\Big[\Psi(H(\mathbf{g}_1^R), H(\mathbf{g}_2^R))\Big] \Big| \leqslant \tau^{O(\epsilon)}

  2. Define the function {\xi : {\mathbb R}^2 \rightarrow 	 {\mathbb R}} as {\xi(\mathbf{x}) = \sum_{i\in [2]} (x_i - 	 f_{[-1,1]}(x_i))^2} Then, we have { \Big|\mathop{\mathbb E}[\xi(H(\mathbf{z}_1^{n}), H(\mathbf{z}_2^{n}))] - \mathop{\mathbb E}[\xi(H(\mathbf{g}_1^{n}),H(\mathbf{g}_2^{n}))] \Big| \leqslant \tau^{O(\epsilon)} }

2. Dictatorship Testing

A boolean function {\mathcal{F}(x_1,x_2,\ldots,x_n)} on {n} bits is a dictator or a long code if {\mathcal{F}(x) = x_i} for some {i}. Given the truth table of a function {\mathcal{F}}, a dictatorship test is a randomized procedure that queries a few locations (say {2}) in the truth table of {\mathcal{F}}, and tests a predicate {P} on the values it queried. If the queried values satisfy the predicate {P}, the test outputs ACCEPT else it outputs REJECT. The goal of the test is to determine if {\mathcal{F}} is a dictator or is far from being a dictator. The main parameters of interest in a dictatorship test are :

  • Completeness{(c)} Every dictator function {\mathcal{F}(x) = x_i} is accepted with probability at least {c}.
  • Soundness{(s)} Any function {\mathcal{F}} which is far from every dictator is accepted with probability at most {s}.
  • Number of queries made, and the predicate {P} used by the test.

2.1. Motivation

The motivation for the problem of dictatorship testing arises from hardness of approximation and PCP constructions. To show that an optimization problem {\Lambda} is {\mathbf{NP}}-hard to approximate, one constructs a reduction from a well-known {\mathbf{NP}}-hard problem such as Label Cover to {\Lambda}. Given an instance {\Im} of the Label Cover problem, a hardness reduction produces an instance {\Im'} of the problem {\Lambda}. The instance {\Im'} has a large optimum value if and only if {\Im} has a high optimum. Dictatorship tests serve as “gadgets” that encode solutions to the Label Cover, as solutions to the problem {\Lambda}. In fact, constructing an appropriate dictatorship test almost always translates in to a corresponding hardness result based on the Unique Games Conjecture.

Dictatorship tests or long code tests as they are also referred to, were originally conceived purely from the insight of error correcting codes. Let us suppose we are to encode a message that could take one of {R} different values {\{m_1,\ldots,m_{R}\}}. The long code encoding of the message {m_\ell} is a bit string of length {2^{R}}, consisting of the truth table of the function {\mathcal{F}(x_1,\ldots,x_R) = x_\ell}. This encoding is maximally redundant in that any binary encoding with more than {2^{R}} bits would contain {2} bits that are identical for all R messages. Intuitively, greater the redundancy in the encoding, the easier it is to perform the reduction.

While long code tests/dictatorship tests were originally conceived from a coding theoretic perspective, somewhat surprisingly these objects are intimately connected to semidefinite programs. This connection between semidefinite programs and dictatorship tests is the subject of today’s lecture. In particular, we will see a black-box reduction from SDP integrality gaps for Max Cut to dictatorship tests that can be used to show hardness of approximating Max Cut.

2.2. The case of Max Cut

The nature of dictatorship test needed for a hardness reduction varies with the specific problem one is trying to show is hard. To keep things concrete and simple, we will restrict our attention to the Max Cut problem.

A dictatorship test {\mathsf{DICT}} for the Max Cut problem consists of a graph on the set of vertices {\{\pm 1\}^{R}}. By convention, the graph {\mathsf{DICT}} is a weighted graph where the edge weights form a probability distribution (sum up to {1}). We will write {(\mathbf{z},\mathbf{z}') \in \mathsf{DICT}} to denote an edge sampled from the graph {\mathsf{DICT}} (here {\mathbf{z},\mathbf{z}' 	\in \{\pm 1\}^{R}}).

A cut of the {\mathsf{DICT}} graph can be thought of as a boolean function {\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}. For a boolean function {\mathcal{F} : \{\pm 1\}^R \rightarrow \{\pm 1\}}, let {\mathsf{DICT}(\mathcal{F})} denote the value of the cut. The value of a cut {\mathcal{F}} is given by

\displaystyle  \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}')} \Big[ 		1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big]

It is useful to define {\mathsf{DICT}(\mathcal{F})}, for non-boolean functions {\mathcal{F}: \{\pm 1\}^R \rightarrow [-1,1]} that take values in the interval {[-1,1]}. To this end, we will interpret a value {\mathcal{F}(\mathbf{z}) \in [-1,1]} as a random variable that takes {\{\pm 1\}} values. Specifically, we think of a number {a 	\in [-1,1]} as the following random variable

$ latex a = -1 $ with probability \frac{1-a}{2} and a = 1 with probability  $\frac{1+a}{2}$.

With this interpretation, the natural definition of {\mathsf{DICT}(\mathcal{F})} is as follows:

\displaystyle  \mathsf{DICT}(\mathcal{F}) = \frac{1}{2}\mathop{\mathbb E}_{ (\mathbf{z}, \mathbf{z}') \in 	\mathsf{DICT}} \Big[ 		1 - \mathcal{F}(\mathbf{z}) \mathcal{F}(\mathbf{z}') \Big] {\,.}

Indeed, the above expression is equal to the expected value of the cut obtained by randomly rounding the values of the function {\mathcal{F} : \{\pm 1\}^{R} \rightarrow [-1,1]} to {\{\pm 1\}} as described in Equation 2.

A Dictator Cut
A Dictator Cut

The dictator cuts are given by the functions {\mathcal{F}(\mathbf{z}) = z_\ell} for some {\ell \in [R]}. The {\mathsf{Completeness}} of the test {\mathsf{DICT}} is the minimum value of a dictator cut, i.e.,

\displaystyle  \mathsf{Completeness}(\mathsf{DICT}) = \min_{\ell \in [R]} 	\mathsf{DICT}(z_{\ell})

The soundness of the dictatorship test is the value of cuts that are far from every dictator. We will formalize the notion of being far from every dictator is formalized using influences as follows:

Cut Far From Every Dictator
Cut Far From Every Dictator

Continue reading