Lecture 7: The evasiveness conjecture

Continuing our look at some toplogical methods, today we’ll see the evasiveness conjecture in decision tree complexity.  In the next lecture, we’ll see how we can sometimes analyze the complexity using fixed point theorems, and their generalizations (like the Hopf index formula), following the work of Kahn, Saks, and Sturtevant.  These two lectures are co-blogged with Elisa Celis, with a lot of input from Lovasz’s lecture notes.

Decision tree complexity and evasiveness

Consider a boolean function f : \{0,1\}^n \to \{0,1\} on n bits.  We define the decision tree complexity of f as follows.  Given an unknown input x \in \{0,1\}^n, you are allowed to ask about the values of various bits of x, e.g. x_{17}, x_{34}, x_3, \ldots.  Your goal is to compute f(x) using as few questions as possible, and your questions can be adaptive, depending on answers to previous questions.  The complexity of such a strategy is the maximum number of questions asked for any x \in \{0,1\}^n.  The decision tree complexity, written D(f), is the minimum complexity of any strategy that computes f.  (There are many other interesting models of decision complexity, see e.g. this survey.)

Clearly D(f) \leq n, because we can trivially query all the bits of x, and then output f(x).  A function f is called evasive if this upper bound is met, i.e. D(f)=n.  As an example, consider the parity function \mathsf{PARITY}(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n, where \oplus is addition modulo 2.  Clearly \mathsf{PARITY} is evasive because after n-1 bits of x are asked about, the setting of the final bit determines the value of f.

For a more general example, consider any f : \{0,1\}^n \to \{0,1\} such that \#\{ x : f(x)=1 \} is odd.  In this case, for every i =1,2,\ldots, n, exactly one of f|_{x_i=0} or f|_{x_i=1} has the same property that the number of inputs resulting in a 1 is odd.  (These two functions are the natural restriction of f to functions on n-1 bits, which results from fixing the value of the ith bit.)  Thus an adversary could keep answering questions “x_i?” so that the restricted function retains this property.  Since the number of inputs yielding a 1 is always odd, the restricted function always takes both possible values, implying that f is evasive–the advesary ensures that the value cannot be determined until all n possible questions are asked.

For an example of a non-evasive property, think of a point x \in \{0,1\}^{N \choose 2} as speciying a directed graph on N vertices, where there is exactly one directed edges connecting every pair of vertices, and x specifies the direction of this edge (this is called a tournament).  Thinking of the vertices as players, a directed edge from u to v means that u defeats v.  Now f(x)=1 if the digraph specified by x has one vertex that defeats everyone else.  What is D(f)?

Well, first we can conduct a single elimination tournament, where vertex 1 plays vertex 2, and the winner players vertex 3, and the winner of that players vertex 4, etc.  At the end, there is only one remaining vertex i that remains undefeated.  Now asking N-2 more questions, we can determine whether i indeeds defeats everyone else.  The total number of questions was (N-1)+(N-2) = 2N-3, hence D(f) \leq 2N-3 \ll {N \choose 2}, implying that f is not evasive.

Montone graph properties and the evasiveness conjecture

Let m = {N \choose 2}.  In general, we can encode an arbitrary undirected N-vertex graph as an element G \in \{0,1\}^{m}.  A function f: \{0,1\}^m \to \{0,1\} is called a graph property if relabeling the vertices of G doesn’t affect the value of f(G).  The function f is monotone if the value of the function can never change from 1 to 0 when flipping one of the input bits from 0 to 1.  In the setting of graph properties, this corresponds to those which are maintained under addition of edges to the graph, e.g. f(G) = “is G connected?” or f(G)= “does G have a k-clique?”

Evasiveness Conjecture (Aanderaa-Karp-Rosenberg): Every non-trivial monotone graph property is evasive.

Here, non-trivial means that f(\bar 0) \neq f(\bar 1), where \bar 0 and \bar 1 denote the all-zeros and all-ones strings, respectively.

For example, consider the example f(G) = “is G connected?”  The adversary is simple:  When asked about a possible edge \{i,j\}, she answers NO unless this answer would imply that the graph is disconnected.  In other words, she answers NO unless she has answered NO already for all edges across a cut (S,\bar S) except for \{i,j\}, in which case she has to answer YES.

Now, suppose there is a strategy which figures out the connectivity of G without asking a question about some edge \{i,j\}.  In this case, the conclusion must be that f(G) = \textrm{ YES} because the adversary always maintains that by answering everything in the future YES, she could force the graph to be connected.  In this case, the edges answered YES have to form a spanning tree of G (otherwise by answering all unasked questions NO, the graph would become disconnected).  Consider a path P from i to j in this YES spanning tree.  Let \{u,v\} be the edge of P which was asked about last.  Clearly the adversary answered YES for \{u,v\}, but this contradicts the advesary’s strategy.  Since \{i,j\} has not been asked yet, the adversary is safe to answer NO for \{u,v\}, and still later by answering YES on \{i,j\}, she could force the graph to be connected.  Thus no such strategy exists, and connectivity is evasive.

Continue reading

Lecture 6: Borsuk-Ulam and some combinatorial applications

Now we’ll move away from spectral methods, and into a few lectures on topological methods.  Today we’ll look at the Borsuk-Ulam theorem, and see a stunning application to combinatorics, given by Lovász in the late 70’s.  A great reference for this material is Matousek’s book, from which I borrow heavily.  I’ll also discuss why the Lovász-Kneser theorem arises in theoretical CS.

The Borsuk-Ulam Theorem

We begin with a statement of the theorem.  We will think of the n-dimensional sphere as the subset of \mathbb R^{n+1} given by

S^n = \left\{ (x_1, \ldots, x_{n+1}) : x_1^2 + \cdots + x_{n+1}^2 = 1\right\}.

Borsuk-Ulam Theorem: For every continuous mapping f : S^n \to \mathbb R^n, there exists a point x \in S^n with f(x)=f(-x).

Pairs of points x,-x \in S^n are called antipodal.

There are a couple of common illustrative examples for the case n=2.  The theorem says that if you take the air out of a basketball, crumple it (no tearing), and flatten it out, then there are two points directly on top of each other which were antipodal before.  Another common example states that at every point in time, there must be two points on the earth which both have exactly the same temperature and barometric pressure (assuming, of course, that these two parameters vary continuously over the surface of the eath).

The n=1 case is completely elementary.  For the rest of the lecture, let’s use N = (0,0,\ldots,1) and S=(0,0,\ldots,-1) to denote the north and south poles (the dimension will be obvious from context).  To prove the n=1 case, simply trace out the path in \mathbb R starting at f(N) and going clockwise around S^1.  Simultaneously, trace out the path starting at f(S) and going counter-clockwise at the same speed.  It is easy to see that eventually these two paths have to collide:  At the point of collision, f(x)=f(-x).

We will give the sketch of a proof for n \geq 2.  Let g(x)=f(x)-f(-x), and note that our goal is to prove that g(x)=0 for some x \in S^n.  Note that g is antipodal in the sense that g(x)=-g(-x) for all x \in S^n.  Now, if g(x) \neq 0 for every x, then by compactness there exists an \epsilon > 0 such that \|g(x)\| > \epsilon for all x \in S^n.  Because of this, we may approximate g arbitrarily well by a smooth map, and prove that the approximation has a 0.  So we will assume that g itself is smooth.

Now, define h : S^n \to \mathbb R^n by h(x_1, \ldots, x_{n+1}) = (x_1, \ldots, x_n), i.e. the north/south projection map.  Let X = S^n \times [0,1] be a hollow cylinder, and let F : X \to \mathbb R^n be given by F(x,t)=t g(x) + (1-t)h(x) so that F linearly interpolates between g and h.

Also, let’s define an antipodality on X by \nu(x,t) = (-x,t).  Note that F is antipodal with respect to \nu, i.e. F(\nu(x,t))=-F(x,t), because both g and h are antipodal.  For the sake of contradiction, assume that g(x) \neq 0 for all x \in S^n.

Now let’s consider the structure of the zero set Z = F^{-1}(0).  Certainly (N,0), (S,0) \in Z since h(N)=h(S)=0, and these are h’s only zeros.  Here comes the sketchy part:  Since g is smooth, F is also smooth, and thus locally F can be approximated by an affine mapping F_{\mathrm{loc}} : \mathbb R^{n+1} \to \mathbb R^n.  It follows that if F_{\mathrm{loc}}^{-1}(0) is not empty, then it should be a subspace of dimension at least one.  By an arbitrarily small perturbation of the initial g, ensuring that F is sufficiently generic, we can ensure that F_{\mathrm{loc}}^{-1}(0) is either empty or a subspace of dimension one.  Thus locally, F^{-1}(0) should look like a two-sided curve, except at the boundaries t=0 and t=1, where F^{-1}(0) (if non-empty) would look like a one-sided curve.  But, for instance, F^{-1}(0) cannot contain any Y-shaped branches.

It follows that Z is a union of closed cycles and paths whose endpoints must lie at the boundaries t=0 and t=1.  (Z is represented by red lines in the cylinder above.)  But since there are only two zeros on the t=0 sphere, and none on the t=1 sphere, Z must contain a path \gamma from (N,0) to (S,0).  Since F is antipodal with respect to \nu, \gamma must also satisfy this symmetry, making it impossible for the segment initiating at N to ever meet up with the segment initiating at S.  Thus we arrive at a contradiction, implying that g must take the value 0.

Notice that the only important property we used about h (other than its smoothness) is that is has a number of zeros which is twice an odd number.  If h had, e.g. four zeros, then we could have two \nu-symmetric paths emanating from and returning to the bottom.  But if h has six zeros, then we would again reach a contradiction.

Continue reading

Lecture 5: Uniformizing graphs, multi-flows, and eigenvalues

In the previous lecture, we gave an upper bound on the second eigenvalue of the Laplacian of (bounded degree) planar graphs in order to analyze a simple spectral partitioning algorithm.  A natural question is whether these bounds extend to more general families of graphs.  Well-known generalizations of planar graphs are those which can be embedded on a surface of fixed genus, and, more generally, families of graphs that arise by forbidding minors.  In fact, Spielman and Teng conjectured that for any graph excluding K_h as a minor, one should have \lambda_2 \lesssim \mathrm{poly}(h) d_{\max}/n.   Of course planar graphs have genus 0, and by Wagner’s theorem, are precisely the graphs which exclude K_5 and K_{3,3} as minors.  In this lecture, we will follow an intrinsic approach of Biswal, myself, and Rao which, in particular, is able to resolve the conjecture of Spielman and Teng.  First, we see why even pushing the conformal approach to bounded genus graphs is difficult.

Bounded genus graphs

For graphs of bounded genus, there is hope to use an approach based on conformal mappings.  In 1980, Yang and Yau proved that

\displaystyle \lambda_2(M) \lesssim \frac{g+1}{\mathrm{vol}(M)}

for any compact Riemannian surface M of genus g.  (Note that for the Laplace-Beltrami operator, one usually writes \lambda_1 as the first non-zero eigenvalue, rather than \lambda_2.)  In analog with Hersch’s proof of the genus 0 case, they use Riemann-Roch to obtain a degree-(g+1) conformal mapping to the Riemann sphere, then try to pull back a second eigenfunction.  A factor of the degree is lost in the Rayleigh quotient (hence the g+1 factor in the preceding bound), and Hersch’s Möbius trick is still required.

An analogous proof for graphs G of bounded genus would proceed by constructing a circle packing of G on the sphere S^2, but instead of the circles having disjoint interiors, we would be assured that every point of S^2 is contained in at most g circles.  Unfortunately, such a result is impossible (this has to do with the handling of branch points in the discrete setting).  Kelner has to take a different approach in his proof that \lambda_2(G) \leq d_{\max}^{O(1)} (g+1)/n for graphs G of genus at most g.

He starts with a circle packing of G on a compact surface \mathbb S_0 of genus g (whose existence follows from results of Beardon and Stephenon and He and Schramm).  Then Kelner randomly subdivides G repeatedly, and these subdivisions give progressively better approximations to some sequence of surfaces \{\mathbb S_i\}.  Once the approximation is of high enough quality, one applies Riemann-Roch to \mathbb S_k, and infers something about a subdivision of G.  The final element is to track how the second eigenvalue of G changes (in expectation) under random subdivision.

Needless to say, this approach is already quite delicate, and for graphs that can’t be equipped with some kind of conformal structure, we seem to have reached a dead end.  In this lecture, we’ll see how to use intrinsic deformations of the geometry of G in order to bound its eigenvalues.  Eventually, this will reduce to the study of certain kinds of multi-commodity flows.

Metrics on graphs

Let G=(V,E) be an arbitrary n-vertex graph with maximum degree d_{\max}.  Recall that we can write

\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.

where f : V \to \mathbb R.  (Also recall that we can replace \mathbb R by any Hilbert space, and the same formula holds.)  The first step is to prepare this equality for “non-linearization” by getting rid of the linear condition \sum_{x \in V} f(x)=0 and the sum \sum_{x \in V} f(x)^2.  (This is a popular sort of passage in the non-linear geometry of Banach spaces, which also plays a rather important role in applications to the theoretical CS.)  The goal is to get only terms that look like |f(x)-f(y)|.  Fortunately, there is a well-known way to do this:

\displaystyle \lambda_2 = 2 n \cdot \min_{f : V \to \mathbb R} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x,y \in V} |f(x)-f(y)|^2},

which follows easily from the equality \sum_{x,y \in V} |f(x)-f(y)|^2 = 2n \sum_{x \in V} f(x)^2 when \sum_{x \in V} f(x)=0.

Thus if we want to bound \lambda_2 = O(1/n), we need to find an f : V \to \mathbb R for which the latter ratio (without the 2n) is O(1/n^2).  Now, for someone who works a lot with linear programming relaxations, it’s very natural to consider a “relaxation”

\displaystyle \gamma(G) = \min_{d} \frac{\sum_{xy \in E} d(x,y)^2}{\sum_{x,y \in V} d(x,y)^2},

where the minimization is over all pseudo-metrics d, i.e. symmetric non-negative functions d : V \times V \to \mathbb R which satisfy the triangle inequality, but might have d(x,y)=0 even for x \neq y.  Certainly \gamma(G) \leq \lambda_2/2n, but Bourgain’s embedding theorem (which states that every n-point metric space embeds into a Hilbert space with distortion at most O(\log n)) also assures us that \lambda_2(G) \leq O(n \log^2 n) \gamma(G).  Since we are trying to show that \gamma(G) = O(1/n^2), this O(\log^2 n) term is morally negligible.  One can see the paper for a more advanced embedding argument that doesn’t lose this factor, but for now we concentrate on proving that \gamma(G) = O(1/n^2).  The embedding theorems allow us to concentrate on finding an intrinsic metric on the graph with small “Rayleigh quotient,” without having to worry about an eventual geometric representation.

As a brief preview… we are going to find a good metric d by taking a certain kind of all-pairs multi-commodity flow at optimality, and weighting the edges by their congestion in the optimal flow.  Thus as the flow spreads out on the graph, it has the effect of “uniformizing” its geometry.

Discrete Riemannian metrics, convexification, and duality

Let’s now assume that G is planar.  We want to show that \gamma(G) = O(d_{\max}/n^2).  First, let’s restrict ourselves to vertex weighted metrics on G.  Given any non-negative weight function \omega : V \to \mathbb R, we can define the length of a path P in G by summing the weights of vertices along it:  \mathsf{len}_{\omega}(P) = \sum_{x \in P} \omega(x).  Then we can define a vertex-weighted shortest-path pseudo-metric on G in the natural way

\displaystyle \mathsf{dist}_{\omega}(x,y) = \min \left\{ \mathsf{len}_{\omega}(P) : P \in \mathcal P_{xy}\right\},

where \mathcal P_{uv} is the set of all u-v paths in G.  We also have the nice relationship

\displaystyle \sum_{xy \in E} \mathsf{dist}_{\omega}(x,y)^2 \leq 2 d_{\max} \sum_{x \in V} \omega(x)^2,\qquad(1)

since \mathsf{dist}_{\omega}(x,y) \leq [\omega(x)+\omega(y)]^2.

So if we define

\displaystyle \Lambda_0(\omega) = \frac{\displaystyle \sum_{x \in V} \omega(x)^2}{\displaystyle \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2}

then by (1), we have \gamma(G) \leq 2 d_{\max} \min_{\omega} \Lambda_0(\omega).

Examples. Let’s try to exhibit weights \omega for two well-known examples:  the grid, and the complete binary tree.

For the \sqrt{n} \times \sqrt{n} grid, we can simply take \omega(x)=1 for all x \in V.  Clearly \sum_{x \in V} \omega(x)^2 = n.  On the other hand, a random pair of points in the grid is \Omega(\sqrt{n}) apart, hence \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2 \approx n^2 \cdot (\sqrt{n})^2 = n^3.  It follows that \Lambda_0(\omega) = O(1/n^2), as desired.

For the complete binary tree with root r, we can simply put \omega(r)=1 and \omega(x)=0 for x \neq r.  (Astute readers will guess the geometrically decreasing weights are actually the optimal choice.)  In this case, \sum_{x \in V} \omega(x)^2 = 1, while all the pairs x,y on opposite sides of the root have \mathsf{dist}_{\omega}(x,y)=1.  It again follows that \Lambda_0(\omega) = O(1/n^2).  Our goal is to provide such a weight \omega for any planar graph.

Continue reading

Lecture 4: Conformal mappings, circle packings, and spectral geometry

In Lecture 2, we used spectral partitioning to rule out the existence of a strong parallel repetition theorem for unique games.  In practice, spectral methods are a very successful heuristic for graph partitioning, and in the present lecture we’ll see how to analyze these partitioning algorithms for some common families of graphs.

Balanced separators, eigenvalues, and Cheeger’s inequality

Lipton and Tarjan proved that every planar graph has a negligibly small set of nodes whose remval splits the graph into two roughly equal pieces.  More specifically, every n-node planar graph can be partitioned into three disjoint sets A,B,S such that there are no edges from A to B, the separator S has at most O(\sqrt{n}) nodes, and |A|,|B| \geq n/3.  This allows one to do all sorts of things, e.g. a simple divide-and-conquer algorithm gives a linear time (1+\epsilon)-approximation for the maximum independent set problem in such graphs, for any \epsilon > 0.

So there is a natural question of how well spectral methods do, for example, on planar graphs.  Spielman and Teng showed that for bounded-degree planar graphs, a simple recursive spectral algorithm recovers a partition V=A \cup B of the vertex set so that |E(A,B)| = O(\sqrt{n}).  In other words, for bounded-degree planar graphs, spectral methods recover the Lipton-Tarjan separator theorem!  This is proved by combining Cheeger’s inequality with their main theorem.

Theorem [Spielman-Teng]: Every n-node planar graph with maximum degree d_{\max} has \displaystyle \lambda_2(G) = O\left(\frac{d_{\max}}{n}\right), where \lambda_2(G) is the second eigenvalue of the combinatorial Laplacian on G.

Recall that we introduced the combinatorial Laplacian in Lecture 2.  If G=(V,E) is an arbitrary finite graph, in this lecture it will make more sense to think about the Laplacian \Delta as an operator on functions f : V \to \mathbb R given by

\displaystyle \Delta f(x) = \mathrm{deg}(x) f(x) - \sum_{y : xy \in E} f(y).

If we define the standard inner product \langle f,g\rangle = \sum_{x \in V} f(x)g(x), then one can easily check that for any such f, we have \langle f, \Delta f\rangle = \sum_{xy \in E} |f(x)-f(y)|^2.  In particular, this implies that \Delta is a positive semi-definite operator.  If we denote its eigenvalues by \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n, then it is also easy to check that \lambda_1 = 0, with corresponding eigenfunction f(x)=1 for every x\in V.

Thus by standard variational principles, we have

\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.

Let us also define the Cheeger constant h_G.  For an arbitrary subset S \subseteq V, let

\displaystyle h(S) = \frac{|E(S, \bar S)|}{\min(|S|,|\bar S|)},

note that this definition varies from the h we defined in Lecture 2, because we will be discussing eigenfunctions without boundary conditions.  Now one defines h_G = \min_{S \subseteq V} h(S).

Finally, we have the version of Cheeger’s inequality (proved by Alon and Milman in the discrete setting) for graphs without boundary.

Cheeger’s inequality: If G=(V,E) is any graph with maximum degree d_{\max}, then

\displaystyle \lambda_2(G) \geq \frac{h(G)^2}{2d_{\max}}.

This follows fairly easy from the Dirichlet version of Cheeger’s inequality presented in Lecture 2.  Here’s a sketch:  Let f : V \to \mathbb R satisfy \Delta f = \lambda_2 f, and suppose, without loss of generality, that V_+ = \{ x : f(x) > 0 \} has |V_+| \geq n/2.  Define f_+(x)=f(x) for f(x) > 0 and f_+(x)=0 otherwise.  Then f_+|_B = 0 for B = V \setminus V_+, so we can plug f_+ into the Dirichlet version of Cheeger’s inequality with boundary conditions on B.  For the full analysis, see this note which essentially follows this approach.  By examining the proof, note that one can find a subset S \subseteq V with h(S) \leq \sqrt{2 d_{\max} \lambda_2} by a simple “sweep” algorithm:  Arrange the vertices V = \{v_1, v_2, \ldots, v_n\} so that f(v_1) \leq f(v_2) \leq \cdots \leq f(v_n), and output the best of the n-1 cuts \{v_1, \ldots, v_i\}, \{v_{i+1}, \ldots, v_n\}.

So using the eigenvalue theorem of Spielman and Teng, along with Cheeger’s inequality, we can find a set S \subseteq V with h(S) \lesssim \sqrt{d_{\max}/n}.  While this cut has the right Cheeger constant, it is not necessarily balanced (i.e. \min(S, \bar S) could be very small).  But one can apply this algorithm recursively, perhaps continually cutting small chunks off of the graph until a balanced cut is collected.  Refer to the Spielman-Teng paper for details.  A great open question is how one might use spectral information about G to recover a balanced cut immediately, without the need for recursion.

Conformal mappings and circle packings

Now we focus on proving the bound \lambda_2(G) \lesssim d_{\max}/n for any planar graph G.  A natural analog is to look at what happens for the Laplace-Beltrami operator for a Riemannian metric on the 2-sphere.  In fact, Hersch considered this problem almost 40 years ago and proved that \lambda_2(M) \lesssim 1/\mathrm{vol}(M), for any such Riemannian manifold M.  His approach was to first use the uniformization theorem to get a conformal mapping from M onto S^2, and then try to pull-back the standard second eigenfunctions on S^2 \subseteq \mathbb R^3 (which are just the three coordinate projections).  Since the Dirichlet energy is conformally invariant in dimension 2, this almost works, except that the pulled-back map might not be orthogonal to the constant function.  To fix this, he has to post-process the initial conformal mapping with an appropriate Möbius transformation.

Unaware of Hersch’s work, Spielman and Teng derived eigenvalue bounds for planar graphs using the discrete analog of this approach:  Circle packings replace conformal mappings, and one still has to show the existence of an appropriate post-processing Möbius transformation.

Continue reading