# A simpler proof of the KPR theorem

## 1. Introduction

The Klein-Plotkin-Rao (KPR) Theorem is a powerful statement about the geometry of planar graphs and their generalizations. Here, I’ll present a new, very simple proof of the theorem that was discovered in joint work with Cyrus Rashtchian. (This will appear in a preprint soon, together with some new results.) In the next post, I’ll give some applications in geometry and algorithms.

Recall that a graph is planar if it can be drawn in the plane without any edge crossings. Wagner’s theorem gives an intrinsic characterization of planar graphs in terms of excluded minors. Recall that a graph ${H}$ is a minor of a graph ${G}$ if ${H}$ can be obtained from ${G}$ by a sequence of (i) edge and vertex deletions and (ii) contraction of edges. A graph ${G}$ excludes ${H}$ as a minor if ${H}$ is not a minor of ${G}$. Kuratowki’s theorem states that planar graphs are precisely those which exclude both ${K_5}$ and ${K_{3,3}}$ as minors, where we use ${K_h}$ and ${K_{h,h}}$ to denote the complete graph on ${h}$ vertices and the complete ${h}$-by-${h}$ bipartite graph, respectively. In this post, we are particularly concerned with ${K_h}$-minor-free graphs, i.e. those which exclude ${K_h}$ as a minor for some ${h \geq 2}$.

I’ll first state and prove a simpler version of the KPR theorem. In the next post, I’ll discuss a stronger statement (in the language of random partitions) that follows directly from the proof. Then using these partitions, we will show that the observable diameter of every ${K_h}$-minor-free graph is “large,” and use that fact to prove an upper bound on the uniform multi-commodity flow gap in such graphs.

## 2. Low-diameter graph partitioning

Consider a finite graph ${G=(V,E)}$ equipped with its shortest-path metric ${d_G}$ (much of what we say here extends to infinite graphs). For now, all the edges of ${G}$ will have length one, although we will generalize to arbitrary weighted graphs for some applications in the next post.

Given a subset ${S \subseteq V}$, we write ${\mathsf{diam}(S) = \max_{x,y \in S} d_G(x,y)\,.}$ A weak but simpler version of the KPR Theorem can be stated as follows.

Theorem 1 Let ${G=(V,E)}$ be a graph that excludes ${K_h}$ as a minor. Then for every number ${\Delta \geq 1}$, there exists a partition ${V = S_1 \cup S_2 \cup \cdots \cup S_m}$ such that ${\mathsf{diam}(S_i) \leq \Delta}$ for every ${i=1,2,\ldots,m}$ and at most an ${O(h^2/\Delta)}$-fraction of edges of ${G}$ go between different sets in the partition.

The theorem was originally proved with a dependence of ${O(h^3)}$, but this was improved to ${O(h^2)}$ by Fakcharoenphol and Talwar. Today I will prove the ${O(h^2)}$ bound. The partitioning will be accomplished via an iterative operation which we will call chopping.

Consider any connected graph ${H}$ and a number ${\tau \geq 1}$. We will describe an operation which we call a ${\tau}$-chop of ${H}$. Let ${x_0 \in V(H)}$ be any node of ${H}$, which we will call the “root node” of the chop, and let ${r_0 \in [1,\tau]}$ (the “initial offset”).

The chopping operation is as follows: We partition ${V(H)=\bigcup_{j=0}^{\infty} A_j}$, where ${A_0 = \{ v \in V(H) : d_H(x_0, v) < r_0 \}}$, and the rest of the sets are the disjoint annuli,

$\displaystyle A_j =\left\{ v \in V(H) : r_0 + (j-1)\tau \leq d_H(x_0, v) < r_0 + j\tau \right\}\,,$

for ${j=1,2,\ldots}$. See Figure 1 for an example of these cuts (the red and blue alternating regions) on a grid graph. Of course, since ${H}$ is finite, eventually the annuli are empty.

We define a ${\tau}$-chop on a possibly disconnected graph as the partition arising from doing a ${\tau}$-chop on each of its connected components. Finally, we define a ${\tau}$-chop on a sequence of disjoint sets ${S_1, S_2, \ldots, S_k \subseteq V}$ as the result of doing a ${\tau}$-chop on each of the induced graphs ${G[S_i]}$ for ${i=1,2,\ldots,m}$. Thus if we have an initial partition ${P}$ of ${V}$, then a ${\tau}$-chop of ${P}$ produces a refined partition ${P'}$ of ${V}$. See Figure 2 for the result of two iterated $\tau$-chops applied to the grid graph. The yellow circles represent the root nodes in the second chop.

We can now state the main technical result needed to prove Theorem 1.

Lemma 2 If ${G=(V,E)}$ excludes ${K_h}$ as a minor, then for any ${\tau \geq 1}$, any sequence of ${h-1}$ iterated ${\tau}$-chops on ${V}$ results in a partition ${V = S_1 \cup S_2 \cup \cdots \cup S_m}$ such that ${\mathsf{diam}(S_i) \leq O(h \tau)}$ for each ${i=1,2,\ldots,m}$.

Observe that ${\mathsf{diam}(\cdot)}$ refers to the diameter in ${d_G}$, the shortest-path metric on ${G}$. Also, note that we do not constrain the root node or the initial offset of the chops. Klein, Plotkin, and Rao prove this lemma with a dependence of ${O(h^2 \tau)}$ on the diameter. FT use a more complicated approach.

To see how Lemma 2 implies Theorem 1, one proceeds as follows. First, let ${C}$ be large enough so that setting ${\tau = \Delta/(Ch)}$ in Lemma 2 yields a partition into sets of diameter at most ${\Delta}$. After fixing the root node for a ${\tau}$-chop of ${G}$, one can consider the initial offsets ${r_0=1,2,\ldots,\tau}$. An edge ${\{x,y\}}$ can be cut (i.e. ${x}$ and ${y}$ end up in distinct sets of the partition) in at most one of these offsets. Thus there exists an offset that cuts only a ${1/\tau = O(h/\Delta)}$-fraction of edges. Since we perform ${h-1}$ iterated ${\tau}$-chops, there exists a choice of initial offsets that cuts at most ${(h-1)/\tau = O(h^2/\Delta)}$-fraction of edges. That completes the reduction.

## 3. A sketch

Before moving onto the formal argument, I’ll present a simple sketch that contains the main ideas. The proof is by contradiction; if we perform a sequence of ${h}$ ${\tau}$-chops and the diameter of any remaining piece fails to be ${O(h \tau)}$, then we will construct a ${K_{h+1}}$ minor in ${G}$.

First, we give an equivalent characterization of when a graph ${G=(V,E)}$ has a graph ${H}$ as a minor: There exist disjoint connected subsets ${S_1, S_2, \ldots, S_{|V(H)|} \subseteq V}$, one corresponding to each vertex of ${H}$. We call these supernodes. Furthermore, there should be an edge between supernodes whenever there is an edge between the corresponding vertices in ${H}$.

Now, the proof is by induction. Note that the base case ${h=0}$ is trivial since a ${K_1}$ minor is a single vertex. By induction, we can assume that if a sequence of ${h}$ chops fails, then there must be a ${K_h}$ minor contained in some offending annulus. See Figure 3. If we could ensure that every supernode of the minor touched the upper boundary of the annulus as in Figure 3, we could easily construct a ${K_{h+1}}$ minor and be done, by simplying choosing the ${(h+1)}$-st supernode to be a ball around ${x_0}$.

Thus we need to enforce this extra property of our ${K_h}$ minor. The (very simple) idea is contained in Figure 4.

After finding a ${K_h}$ minor that intersects the annuli, we extend the supernodes to touch the upper boundary of the annulus from the preceding chop (which is represented by the purple line in the picture). The point is that we can choose these paths to be contained above the red boundary (and thus disjoint from the ${K_h}$ supernodes), and also each of length at most ${2\tau+1}$ since the width of the “purple” annulus is only ${\tau}$. The same can be done for ${x_0}$. (If we didn’t care that the paths have to be above the red boundary, we could choose them of length only ${\tau}$.)

The only issue is that we need these new paths to be disjoint. Since the paths are always short (length at most ${O(\tau)}$), we can enforce this by making sure that each supernode contains a representative and these representatives are pairwise far apart; then we grow the paths from the representatives. Initially, the representatives will be ${\Omega(h \tau)}$ apart, and then as we go up the inductive chain, they will get closer by at most ${O(\tau)}$ at every step. By choosing the initial separation large enough, they will remain disjoint. That’s the sketch; it should be possible to reproduce the proof from the sketch alone, but we now present a more formal proof.

## 4. The proof

We need a couple definitions. First, given a subset of vertices ${V_0 \subseteq V}$ and a number ${\tau \geq 0}$, we say that a set ${S}$ is ${\tau}$-dense in ${V_0}$ if every element of ${V_0}$ can reach an element of ${S}$ by a path of length ${\tau}$ that is contained completely in ${G[V_0]}$ (the induced graph on ${V_0}$). Second, we say that an ${H}$-minor is ${R}$-represented by ${S}$ if every supernode of ${H}$ contains a representative from ${S}$ and these representatives are pairwise distance more than ${R}$ apart in the metric ${d_G}$ (the global shortest-path metric on ${G}$). We now state a lemma that we can prove by induction and implies Lemma 2.

Lemma 3 Let ${h \geq j \geq 0}$ and ${\tau \geq 1}$ be any numbers. Suppose ${G=(V,E)}$ is a graph and there is any sequence of ${j}$ iterated ${\tau}$-chops on ${G}$ that leaves a component of diameter more than ${16 h\tau}$. Then for any set ${S}$ that is ${\tau}$-dense in ${V}$, one can find a ${K_{j+1}}$-minor that is ${6(h-j)\tau}$-represented by ${S}$.

One can recover Lemma 2 by setting ${h=j}$ and ${S=V}$.

Proof: We proceed by induction on ${j}$. The case ${j=0}$ is trivial since a ${K_1}$ minor is simply a single vertex. Thus we may assume that ${h \geq j \geq 1}$.

The following figure will be a useful reference.

In general, we argue as follows. Let ${S}$ be any set satisfying the assumptions of the lemma. Assume there is a sequence of ${j}$ iterated ${\tau}$-chops that leaves a component of diameter more than ${16 h \tau}$. Then there must be some annulus ${A_k}$ of the first ${\tau}$-chop such that ${j-1}$ iterated ${\tau}$-chops on ${A_k}$ leaves a component of diameter more than ${16 h \tau}$.

Suppose the first ${\tau}$-chop has root node ${x_0 \in V}$ and initial offset ${r_0}$. Let ${S' \subseteq A_k}$ be the set of nodes at distance exactly ${r_0 + (k-1) \tau}$, i.e. the upper boundary of ${A_k}$. Observe that ${S'}$ is ${\tau}$-dense in ${A_k}$ by construction. Thus by induction, there is a ${K_{j}}$-minor in ${G[A_k]}$ that is ${6(h-j+1)\tau}$-represented by ${S'}$. Let ${V_1, V_2, \ldots, V_j}$ be the supernodes of this minor, and let ${\{ v_i \in S' \cap V_i : i=1,2,\ldots,j \}}$ be the representatives which are further than ${6(h-j+1)\tau}$ apart.

We now extend this to a ${K_{j+1}}$-minor which is ${6(h-j)\tau}$-represented by ${S}$. First, we may assume that ${k \geq 6h+1}$. Otherwise, all the points of ${A_k}$ lie in a ball of radius at most ${r_0 + k \tau \leq (6h+2)\tau}$, and hence ${A_k}$ has diameter at most ${(12h+4)\tau \leq 16h \tau}$. In particular, we know that ${d_G(x_0, v_i) \geq 6h\tau}$ for every ${i=1,2,\ldots, j}$.

Now for each ${i=1,2,\ldots,j}$, we choose a point ${v_i' \in S \setminus A_i}$ such that ${v_i'}$ is connected to ${v_i}$ by a path of length at most ${3\tau}$ in ${G}$. This can be done by first going up a shortest-path from ${v_i}$ to ${x_0}$ of length ${\tau+1}$ to reach a point ${v_i''}$, and then choosing any point of ${S}$ within distance ${\tau}$ of ${v_i''}$ (which can always be done since ${S}$ is ${\tau}$-dense in ${V}$). We add this path to ${V_i}$ to get a new supernode ${V_i'}$. Observe that the sets ${\{V_i' : i=1,2,\ldots,j\}}$ are all connected and pairwise disjoint since the new paths are outside the annulus ${A_k}$ and the paths themselves are pairwise disjoint because they are each of length at most ${3\tau}$, but they emanate from representatives that are more than ${6(h-j+1)\tau \geq 6\tau}$ apart. In fact, this also shows that the representatives ${\{ v_i' \in S : i=1,2,\ldots,j \}}$ are further than ${6(h-j)\tau}$ apart, as required.

Finally, we construct a new supernode ${V_0}$ as follows. For each ${i=1,2,\ldots, j}$, let ${u_i \in V_i'}$ be the closest node to ${x_0}$, and let ${s_0 \in S}$ be the closest node in ${S}$ to ${x_0}$. For each ${i}$, let ${P_i}$ be a shortest-path from ${x_0}$ to ${u_i}$ without its endpoint ${u_i}$, and let ${P}$ be a shortest-path to ${s_0}$, including ${s_0}$. We now set ${V_0 = P \cup P_1 \cup P_2 \cup \cdots \cup P_j}$. We claim that ${\{ V_0, V_1', \ldots, V_j' \}}$ forms a ${K_{j+1}}$-minor which is ${6(h-j)\tau}$-represented by ${S}$. First, it is clear that ${d_G(s_0, v_i) > 6(h-j)\tau}$ since ${d_G(s_0, x_0) \leq \tau}$ (again, because ${S}$ is ${\tau}$-dense in ${V}$). For the same reason, the path ${P}$ is disjoint from all the sets ${\{V_i'\}}$.

Thus the only possible obstruction to having a valid ${K_{j+1}}$-minor is if some path ${P_i}$ intersects a set ${V'_{\ell}}$ for ${i \neq \ell}$. We now show that this cannot happen. We know that if ${P_i}$ intersects ${V'_{\ell}}$, then it must have already traveled distance at least ${(k-1)\tau - 3\tau}$ away from ${x_0}$. But ${P_i}$ contains a node adjacent to ${V_i}$ (by construction), which means it continues an additional distance of ${3\tau}$ (the distance between ${V'_{\ell} \setminus V_{\ell}}$ and ${V'_{i} \setminus V_i)}$. This additional distance is also moving away from ${x_0}$, implying that ${P_i}$ intersects ${A_k}$, which is impossible. This completes the proof.

## 5. The dependence on ${h}$

The best-known lower bound requires the conclusion of Theorem 1 to cut at least an ${\Omega((\log h)/\Delta)}$-fraction of edges. (One can take ${G=(V,E)}$ to be an ${n}$-vertex 3-regular expander graph, which obviously excludes ${K_n}$ as a minor. Now it is easy to see that for some constant ${c}$, partitioning into pieces of diameter at most ${c \log n}$ must cut at least an ${\Omega(1)}$-fraction of edges.) In some special cases, e.g. graphs of genus ${g}$ (which exclude ${K_{c \lceil \sqrt{g}\rceil}}$ as a minor for some constant ${c > 0}$), one can reduce the bound to ${O(\log g)}$ (see this joint work with Sidiropoulos). This leads to the following open problem.

Open problem: Show that under the assumptions of Theorem 1, one can find a partition that cuts only an ${O((\log h)/\Delta)}$-fraction of edges.

A positive resolution would yield an optimal unifom multi-commodity flow/cut gap for ${K_h}$-minor-free graphs. (See the next post for details.)

# Lecture 5: Uniformizing graphs, multi-flows, and eigenvalues

In the previous lecture, we gave an upper bound on the second eigenvalue of the Laplacian of (bounded degree) planar graphs in order to analyze a simple spectral partitioning algorithm.  A natural question is whether these bounds extend to more general families of graphs.  Well-known generalizations of planar graphs are those which can be embedded on a surface of fixed genus, and, more generally, families of graphs that arise by forbidding minors.  In fact, Spielman and Teng conjectured that for any graph excluding $K_h$ as a minor, one should have $\lambda_2 \lesssim \mathrm{poly}(h) d_{\max}/n$.   Of course planar graphs have genus 0, and by Wagner’s theorem, are precisely the graphs which exclude $K_5$ and $K_{3,3}$ as minors.  In this lecture, we will follow an intrinsic approach of Biswal, myself, and Rao which, in particular, is able to resolve the conjecture of Spielman and Teng.  First, we see why even pushing the conformal approach to bounded genus graphs is difficult.

### Bounded genus graphs

For graphs of bounded genus, there is hope to use an approach based on conformal mappings.  In 1980, Yang and Yau proved that

$\displaystyle \lambda_2(M) \lesssim \frac{g+1}{\mathrm{vol}(M)}$

for any compact Riemannian surface $M$ of genus $g$.  (Note that for the Laplace-Beltrami operator, one usually writes $\lambda_1$ as the first non-zero eigenvalue, rather than $\lambda_2$.)  In analog with Hersch’s proof of the genus 0 case, they use Riemann-Roch to obtain a degree-$(g+1)$ conformal mapping to the Riemann sphere, then try to pull back a second eigenfunction.  A factor of the degree is lost in the Rayleigh quotient (hence the $g+1$ factor in the preceding bound), and Hersch’s Möbius trick is still required.

An analogous proof for graphs $G$ of bounded genus would proceed by constructing a circle packing of $G$ on the sphere $S^2$, but instead of the circles having disjoint interiors, we would be assured that every point of $S^2$ is contained in at most $g$ circles.  Unfortunately, such a result is impossible (this has to do with the handling of branch points in the discrete setting).  Kelner has to take a different approach in his proof that $\lambda_2(G) \leq d_{\max}^{O(1)} (g+1)/n$ for graphs $G$ of genus at most $g$.

He starts with a circle packing of $G$ on a compact surface $\mathbb S_0$ of genus $g$ (whose existence follows from results of Beardon and Stephenon and He and Schramm).  Then Kelner randomly subdivides $G$ repeatedly, and these subdivisions give progressively better approximations to some sequence of surfaces $\{\mathbb S_i\}$.  Once the approximation is of high enough quality, one applies Riemann-Roch to $\mathbb S_k$, and infers something about a subdivision of $G$.  The final element is to track how the second eigenvalue of $G$ changes (in expectation) under random subdivision.

Needless to say, this approach is already quite delicate, and for graphs that can’t be equipped with some kind of conformal structure, we seem to have reached a dead end.  In this lecture, we’ll see how to use intrinsic deformations of the geometry of $G$ in order to bound its eigenvalues.  Eventually, this will reduce to the study of certain kinds of multi-commodity flows.

### Metrics on graphs

Let $G=(V,E)$ be an arbitrary n-vertex graph with maximum degree $d_{\max}$.  Recall that we can write

$\displaystyle \lambda_2 = \min_{f \neq 0 : \sum_{x \in V} f(x)=0} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x \in V} f(x)^2}.$

where $f : V \to \mathbb R$.  (Also recall that we can replace $\mathbb R$ by any Hilbert space, and the same formula holds.)  The first step is to prepare this equality for “non-linearization” by getting rid of the linear condition $\sum_{x \in V} f(x)=0$ and the sum $\sum_{x \in V} f(x)^2$.  (This is a popular sort of passage in the non-linear geometry of Banach spaces, which also plays a rather important role in applications to the theoretical CS.)  The goal is to get only terms that look like $|f(x)-f(y)|$.  Fortunately, there is a well-known way to do this:

$\displaystyle \lambda_2 = 2 n \cdot \min_{f : V \to \mathbb R} \frac{\sum_{xy \in E} |f(x)-f(y)|^2}{\sum_{x,y \in V} |f(x)-f(y)|^2},$

which follows easily from the equality $\sum_{x,y \in V} |f(x)-f(y)|^2 = 2n \sum_{x \in V} f(x)^2$ when $\sum_{x \in V} f(x)=0$.

Thus if we want to bound $\lambda_2 = O(1/n)$, we need to find an $f : V \to \mathbb R$ for which the latter ratio (without the $2n$) is $O(1/n^2)$.  Now, for someone who works a lot with linear programming relaxations, it’s very natural to consider a “relaxation”

$\displaystyle \gamma(G) = \min_{d} \frac{\sum_{xy \in E} d(x,y)^2}{\sum_{x,y \in V} d(x,y)^2},$

where the minimization is over all pseudo-metrics d, i.e. symmetric non-negative functions $d : V \times V \to \mathbb R$ which satisfy the triangle inequality, but might have $d(x,y)=0$ even for $x \neq y$.  Certainly $\gamma(G) \leq \lambda_2/2n$, but Bourgain’s embedding theorem (which states that every n-point metric space embeds into a Hilbert space with distortion at most $O(\log n)$) also assures us that $\lambda_2(G) \leq O(n \log^2 n) \gamma(G)$.  Since we are trying to show that $\gamma(G) = O(1/n^2)$, this $O(\log^2 n)$ term is morally negligible.  One can see the paper for a more advanced embedding argument that doesn’t lose this factor, but for now we concentrate on proving that $\gamma(G) = O(1/n^2)$.  The embedding theorems allow us to concentrate on finding an intrinsic metric on the graph with small “Rayleigh quotient,” without having to worry about an eventual geometric representation.

As a brief preview… we are going to find a good metric $d$ by taking a certain kind of all-pairs multi-commodity flow at optimality, and weighting the edges by their congestion in the optimal flow.  Thus as the flow spreads out on the graph, it has the effect of “uniformizing” its geometry.

### Discrete Riemannian metrics, convexification, and duality

Let’s now assume that $G$ is planar.  We want to show that $\gamma(G) = O(d_{\max}/n^2)$.  First, let’s restrict ourselves to vertex weighted metrics on $G$.  Given any non-negative weight function $\omega : V \to \mathbb R$, we can define the length of a path $P$ in $G$ by summing the weights of vertices along it:  $\mathsf{len}_{\omega}(P) = \sum_{x \in P} \omega(x)$.  Then we can define a vertex-weighted shortest-path pseudo-metric on $G$ in the natural way

$\displaystyle \mathsf{dist}_{\omega}(x,y) = \min \left\{ \mathsf{len}_{\omega}(P) : P \in \mathcal P_{xy}\right\},$

where $\mathcal P_{uv}$ is the set of all u-v paths in $G$.  We also have the nice relationship

$\displaystyle \sum_{xy \in E} \mathsf{dist}_{\omega}(x,y)^2 \leq 2 d_{\max} \sum_{x \in V} \omega(x)^2,\qquad(1)$

since $\mathsf{dist}_{\omega}(x,y) \leq [\omega(x)+\omega(y)]^2$.

So if we define

$\displaystyle \Lambda_0(\omega) = \frac{\displaystyle \sum_{x \in V} \omega(x)^2}{\displaystyle \sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2}$

then by (1), we have $\gamma(G) \leq 2 d_{\max} \min_{\omega} \Lambda_0(\omega)$.

Examples. Let’s try to exhibit weights $\omega$ for two well-known examples:  the grid, and the complete binary tree.

For the $\sqrt{n} \times \sqrt{n}$ grid, we can simply take $\omega(x)=1$ for all $x \in V$.  Clearly $\sum_{x \in V} \omega(x)^2 = n$.  On the other hand, a random pair of points in the grid is $\Omega(\sqrt{n})$ apart, hence $\sum_{x,y \in V} \mathsf{dist}_{\omega}(x,y)^2 \approx n^2 \cdot (\sqrt{n})^2 = n^3$.  It follows that $\Lambda_0(\omega) = O(1/n^2)$, as desired.

For the complete binary tree with root $r$, we can simply put $\omega(r)=1$ and $\omega(x)=0$ for $x \neq r$.  (Astute readers will guess the geometrically decreasing weights are actually the optimal choice.)  In this case, $\sum_{x \in V} \omega(x)^2 = 1$, while all the pairs $x,y$ on opposite sides of the root have $\mathsf{dist}_{\omega}(x,y)=1$.  It again follows that $\Lambda_0(\omega) = O(1/n^2)$.  Our goal is to provide such a weight $\omega$ for any planar graph.