# The need for non-linear mappings

In the last post, I recalled the problem of dimension reduction for finite subsets of $L_1$.  I thought I should mention the main obstacle to reducing the dimension below $O(n)$ for $n$-point subsets:  It can’t be done with linear mappings.

All the general results mentioned in that post use a linear mapping.  In fact, they are all of the form:

1. Change of density, i.e. preprocess the points/subspace so that no point has too much weight on any one coordinate.
2. Choose a subset of the coordinates, possibly multiplying the chosen coordinates by non-negative weights.  (Note that the Newman-Rabinovich result, based on Batson, Spielman, and Srivastava, is deterministic, while in the other bounds, the sampling is random.)

(The dimension reduction there is non-linear, but it applies only to special subsets of $L_1$, like the Brinkman-Charikar point set.)

The next theorem shows that linear dimension reduction mappings cannot do better than $O(n)$ dimensions.

Theorem: For every $1 \leq p \leq \infty$, there are arbitrarily large $n$-point subsets of $L_p$ on which any linear embedding into $L_2$ incurs distortion at least $\left(\frac{n-1}{2}\right)^{|1/p-1/2|}.$

Since the identity map from $\ell_1^d$ to $\ell_2^d$ has distortion $\sqrt{d}$, composing a $D$-distortion linear embedding into $\ell_1^d$ with it gives a linear embedding into $\ell_2$ of distortion at most $D\sqrt{d}$; for $p=1$, the theorem then forces $D\sqrt{d} \geq \sqrt{(n-1)/2}$, i.e. $d \geq \frac{n-1}{2D^2}$.  In particular, there are $n$-point subsets on which any linear ${O(1)}$-distortion embedding requires $\Omega(n)$ dimensions.  The $p=1$ case of the preceding theorem was proved by Charikar and Sahai.  A simpler proof, which extends to all $p \geq 1$, is given in Lemma 3.1 of a paper by myself, Mendel, and Naor.
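As a quick numerical sanity check of the $\sqrt{n}$ figure for the identity map (a numpy sketch of my own, not from any of the papers cited):

```python
import numpy as np

# Sanity check: for x in R^n, ||x||_2 <= ||x||_1 <= sqrt(n) * ||x||_2,
# so the identity map from l_1^n to l_2^n has distortion sqrt(n).
rng = np.random.default_rng(0)
n = 16
X = rng.standard_normal((1000, n))          # random points in R^n
ratios = np.linalg.norm(X, 1, axis=1) / np.linalg.norm(X, 2, axis=1)
assert np.all(ratios >= 1 - 1e-9)           # ||x||_1 >= ||x||_2
assert np.all(ratios <= np.sqrt(n) + 1e-9)  # ||x||_1 <= sqrt(n)||x||_2

# Both extremes are achieved: a standard basis vector vs. the all-ones vector.
e1, ones = np.eye(n)[0], np.ones(n)
print(np.linalg.norm(e1, 1) / np.linalg.norm(e1, 2))      # 1.0
print(np.linalg.norm(ones, 1) / np.linalg.norm(ones, 2))  # 4.0 = sqrt(16)
```

The two extremal vectors show that the $\sqrt{n}$ bound for the identity map is tight.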

# Open problem: Dimension reduction in L_1

Since I’ve been interacting a lot with the theory group at MSR Redmond (see the UW-MSR Theory Center), I’ve been asked occasionally to propose problems in the geometry of finite metric spaces that might be amenable to probabilistic tools. Here’s a fundamental problem that’s wide open. Let ${c_D(n)}$ be the smallest number such that every ${n}$-point subset of ${L_1}$ embeds into ${\ell_1^{c_D(n)}}$ with distortion at most ${D}$. Here’s what’s known.

1. Talagrand (following work of Bourgain-Lindenstrauss-Milman and Schechtman) proved that for every ${\varepsilon > 0}$, every ${n}$-dimensional subspace of ${L_1}$ admits a ${(1+\varepsilon)}$-distortion embedding into ${\ell_1^{d}}$ with ${d = O((n \log n)/\varepsilon^2)}$. In particular, this gives

$\displaystyle c_{1+\varepsilon}(n) = O((n \log n)/\varepsilon^2).$

2. Brinkman and Charikar showed that ${c_D(n) \geq \Omega(n^{1/D^2})}$ for ${D \geq 1}$. A significantly simpler proof was later given by Assaf Naor and myself. (With Brinkman and Karagiozova, we have also shown that this bound is tight for the Brinkman-Charikar examples and their generalizations.)
3. Recently, Newman and Rabinovich showed that one can take ${c_{1+\varepsilon}(n) = O(n/\varepsilon^2)}$ for any ${\varepsilon > 0}$. Their paper relies heavily on the beautiful spectral sparsification method of Batson, Spielman, and Srivastava. In fact, it is shown that one can use only ${O(n/\varepsilon^2)}$ weighted cuts (see the paper for details). This also hints at a limitation of their technique, since it is easy to see that the metric on ${\{1,2,\ldots,n\} \subseteq \mathbb R}$ requires ${\Omega(n)}$ cuts for a constant-distortion embedding (even though a single dimension obviously suffices).

The open problem is to get better bounds. For instance, we only know that

$\displaystyle \Omega(n^{1/100}) \leq c_{10}(n) \leq O(n).$

There is evidence that $n^{\Theta(1/D^2)}$ might be the right order of magnitude.  In the large distortion regime, when $D = \Omega(\sqrt{\log n} \log \log n)$, results of Arora, myself, and Naor show that $c_D(n) = O(\log n)$.

# Hypercontractivity and its Applications

My student Punya Biswal just completed this great survey on hypercontractivity and its applications in computer science. A PDF version is available from his home page, along with accompanying slides.

Hypercontractive inequalities are a useful tool in dealing with extremal questions in the geometry of high-dimensional discrete and continuous spaces. In this survey we trace a few connections between different manifestations of hypercontractivity, and also present some relatively recent applications of these techniques in computer science.

### 1. Preliminaries and notation

Fourier analysis on the hypercube. We define the inner product ${\langle f,g \rangle = \mathop{\mathbb E}_{x}f(x)g(x)}$ on functions ${f,g \colon \{-1,1\}^{n} \rightarrow {\mathbb R}}$, where the expectation is taken with respect to the uniform probability measure on ${\{-1,1\}^n}$. The multilinear polynomials ${\chi_{S}(x)=\prod_{i\in S}x_{i}}$ (where ${S}$ ranges over subsets of ${[n]}$) form an orthonormal basis under this inner product; they are called the Fourier basis. Thus, for any function ${f \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, we have ${f = \sum_{S\subseteq[n]}\hat{f}(S)\chi_{S}}$, where the Fourier coefficients ${\hat{f}(S)=\langle f,\chi_{S}\rangle}$ obey Parseval’s relation ${\sum_{S\subseteq[n]}\hat{f}(S)^{2}=\langle f,f\rangle=\mathop{\mathbb E}_{x}f(x)^{2}}$. It is easy to verify that ${\mathop{\mathbb E}_{x}f(x)=\hat{f}(\emptyset)}$ and ${\textsf{Var}_{x}f(x)=\sum_{S\neq\emptyset}\hat{f}(S)^{2}}$.
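These identities are easy to verify numerically by brute-force enumeration of the cube. The following numpy sketch (my own illustration, not from the survey) checks Parseval’s relation together with the formulas for the mean and variance:

```python
import numpy as np
from itertools import product, combinations

n = 3
cube = [np.array(x) for x in product([-1, 1], repeat=n)]   # {-1,1}^n
rng = np.random.default_rng(1)
f = {tuple(x): rng.standard_normal() for x in cube}        # a random f

def chi(S, x):                  # character chi_S(x) = prod_{i in S} x_i
    return np.prod([x[i] for i in S]) if S else 1.0

def fhat(S):                    # \hat f(S) = E_x f(x) chi_S(x)
    return np.mean([f[tuple(x)] * chi(S, x) for x in cube])

subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]

# Parseval: sum_S \hat f(S)^2 = E_x f(x)^2
assert np.isclose(sum(fhat(S)**2 for S in subsets),
                  np.mean([f[tuple(x)]**2 for x in cube]))
# \hat f(emptyset) = E f, and Var f = sum over nonempty S
assert np.isclose(fhat(()), np.mean(list(f.values())))
assert np.isclose(sum(fhat(S)**2 for S in subsets if S),
                  np.var(list(f.values())))
```

Enumeration costs $4^n$ operations here, which is fine for toy sizes like $n=3$.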

Norms. For ${1\leq p<\infty}$, define the ${\ell_{p}}$ norm ${\|f\|_{p}=(\mathop{\mathbb E}_{x}|f(x)|^{p})^{1/p}}$. These norms are monotone in ${p}$: for every function ${f}$, ${p\geq q}$ implies ${\|f\|_{p}\geq\|f\|_{q}}$. For a linear operator ${M}$ carrying functions ${f \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$ to functions ${Mf=g \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, we define the ${p}$-to-${q}$ operator norm ${\|M\|_{p\rightarrow q}=\sup_{f\neq 0}\|Mf\|_{q}/\|f\|_{p}}$. ${M}$ is said to be a contraction from ${\ell_{p}}$ to ${\ell_{q}}$ when ${\|M\|_{p\rightarrow q}\leq1}$. Because of the monotonicity of norms, a contraction from ${\ell_{p}}$ to ${\ell_{p}}$ is automatically a contraction from ${\ell_{p}}$ to ${\ell_{q}}$ for any ${q\leq p}$. When ${q>p}$ and ${\|M\|_{p\rightarrow q}\leq1}$, ${M}$ is said to be hypercontractive.
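The monotonicity claim (which depends on the underlying measure being a probability measure) is also easy to check numerically; a short numpy sketch of my own:

```python
import numpy as np

# ||f||_p = (E |f|^p)^{1/p} is nondecreasing in p when the expectation
# is over a probability measure (here: uniform on a 16-point cube).
rng = np.random.default_rng(2)
f = rng.standard_normal(2**4)            # values of f on {-1,1}^4

def pnorm(f, p):
    return np.mean(np.abs(f)**p)**(1 / p)

ps = [1, 2, 3, 4, 10]
norms = [pnorm(f, p) for p in ps]
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```

(With counting measure instead of the uniform probability measure, the inequality would reverse; normalization matters here.)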

Convolution operators. Letting ${xy}$ represent the coordinatewise product of ${x, y \in \{-1,1\}^n}$, we define the convolution ${(f*g)(x)=\mathop{\mathbb E}_{y}f(y)g(xy)}$ of two functions ${f,g \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, and note that it is a linear operator ${f\mapsto f*g}$ for every fixed ${g}$. Convolution is commutative and associative, and the Fourier coefficients of a convolution satisfy the useful property ${\widehat{f*g}=\hat{f}\hat{g}}$. We shall be particularly interested in the convolution properties of the following functions:

• The Dirac delta ${\delta \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, given by ${\delta(1,\dotsc,1)=2^{n}}$ and ${\delta(x)=0}$ otherwise. (The factor ${2^n}$ compensates for the expectation in the definition of convolution.) It is the identity for convolution and has ${\hat{\delta}(S)=1}$ for all ${S\subseteq[n]}$.
• The edge functions ${h_{i} \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$ given by

$\displaystyle h_{i}(x)= \begin{cases} \phantom{-}2^{n-1} & x=(1,\dotsc,1)\\ -2^{n-1} & x_{i}=-1,\ x_{[n]\setminus\{i\}}=(1,\dotsc,1)\\ \phantom{-}0 & \text{otherwise.} \end{cases}$

Here ${\hat{h}_{i}(S)=1}$ if ${i\in S}$ and ${\hat{h}_{i}(S)=0}$ otherwise. For any function ${f \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, ${(f*h_{i})(x)=(f(x)-f(y))/2}$, where ${y}$ is obtained from ${x}$ by flipping just the ${i}$th bit. Convolution with ${h_{i}}$ acts as an orthogonal projection (as we can easily see in the Fourier domain), so for any functions ${f,g \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$, we have ${\langle f*h_{i},g\rangle=\langle f,h_{i}*g\rangle=\langle f*h_{i},g*h_{i}\rangle}$.

• The Bonami-Gross-Beckner noise functions ${\textsf{BG}_{\rho} \colon \{-1,1\}^{n}\rightarrow{\mathbb R}}$ for ${0\leq\rho\leq1}$, where ${\widehat{\textsf{BG}}_{\rho}(S)=\rho^{|S|}}$ and we define ${0^{0}=1}$. These functions form a semigroup under convolution, because ${\textsf{BG}_{\sigma}*\textsf{BG}_{\rho}=\textsf{BG}_{\sigma\rho}}$ and ${\textsf{BG}_{1}=\delta}$. Note that ${\textsf{BG}_{\rho}(x)=\sum_{S}\rho^{|S|}\chi_{S}(x)=\prod_{i}(1+\rho x_{i})}$. We define the noise operator ${T_{\rho}}$ acting on functions on the discrete cube by ${T_{\rho}f=\textsf{BG}_{\rho}*f}$. In combinatorial terms, ${(T_{\rho}f)(x)}$ is the expected value of ${f(y)}$, where ${y}$ is obtained from ${x}$ by independently flipping each bit of ${x}$ with probability ${(1-\rho)/2}$.
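To make these definitions concrete, here is a brute-force numpy sketch (my own illustration) that verifies ${\widehat{f*g}=\hat f\hat g}$ in the case ${g=\textsf{BG}_\rho}$, and checks by exact enumeration that ${T_\rho f = \textsf{BG}_\rho * f}$ agrees with the bit-flipping description with flip probability ${(1-\rho)/2}$:

```python
import numpy as np
from itertools import product, combinations

n, rho = 3, 0.6
cube = [np.array(x) for x in product([-1, 1], repeat=n)]
rng = np.random.default_rng(3)
f = {tuple(x): rng.standard_normal() for x in cube}

def conv(f, g):   # (f*g)(x) = E_y f(y) g(xy), xy = coordinatewise product
    return {tuple(x): np.mean([f[tuple(y)] * g[tuple(x * y)] for y in cube])
            for x in cube}

def fhat(f, S):   # \hat f(S) = E_x f(x) prod_{i in S} x_i
    return np.mean([f[tuple(x)] * np.prod([x[i] for i in S]) for x in cube])

BG = {tuple(x): np.prod(1 + rho * x) for x in cube}   # BG_rho(x)
Tf = conv(BG, f)                                      # T_rho f

# widehat{f*g} = \hat f \hat g, so widehat{T_rho f}(S) = rho^{|S|} \hat f(S)
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
for S in subsets:
    assert np.isclose(fhat(Tf, S), rho**len(S) * fhat(f, S))

# Combinatorial description: (T_rho f)(x) = E f(y), where y flips each
# bit of x independently with probability (1-rho)/2.
for x in cube:
    exact = sum(f[tuple(z)]
                * np.prod([(1 + rho) / 2 if z[i] == x[i] else (1 - rho) / 2
                           for i in range(n)])
                for z in cube)
    assert np.isclose(Tf[tuple(x)], exact)
```

The second loop is an exact average over flip patterns rather than a random simulation, so the agreement is to machine precision.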

Lemma 1. ${\frac{d}{d\rho}\textsf{BG}_{\rho}=\frac{1}{\rho}\,\textsf{BG}_{\rho}*\sum_{i=1}^{n} h_{i}}$.

Proof: This is easy in the Fourier basis:

$\displaystyle \widehat{\textsf{BG}}_{\rho}'(S) = \left(\rho^{|S|}\right)' = |S|\rho^{|S|-1} = \sum_{i\in[n]}\hat{h}_{i}(S)\,\frac{\widehat{\textsf{BG}}_{\rho}(S)}{\rho}.$

$\Box$
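Lemma 1 can also be checked numerically. The sketch below (my own, in numpy) uses the normalization ${h_i(1,\dotsc,1)=2^{n-1}}$, ${h_i=-2^{n-1}}$ when only bit ${i}$ is ${-1}$ (so that ${\hat h_i(S)=\mathbf 1_{i\in S}}$), and compares a finite-difference derivative of ${\textsf{BG}_\rho}$ with the right-hand side at every point of the cube:

```python
import numpy as np
from itertools import product

n, rho, eps = 3, 0.7, 1e-6
cube = [np.array(x) for x in product([-1, 1], repeat=n)]

def BG(r, x):                 # BG_r(x) = prod_i (1 + r x_i)
    return np.prod(1 + r * x)

def h(x):                     # sum_i h_i(x), with h_i(1,...,1) = 2^{n-1}
    ones = np.sum(x == 1)     # and h_i = -2^{n-1} when only bit i is -1
    if ones == n:
        return n * 2**(n - 1)
    if ones == n - 1:
        return -(2**(n - 1))
    return 0.0

for x in cube:
    # Left side: d/drho BG_rho(x), by central finite difference.
    lhs = (BG(rho + eps, x) - BG(rho - eps, x)) / (2 * eps)
    # Right side: (1/rho) * (BG_rho * sum_i h_i)(x), convolution as E_y.
    rhs = np.mean([BG(rho, x * y) * h(y) for y in cube]) / rho
    assert np.isclose(lhs, rhs, atol=1e-5)
```

Since ${\textsf{BG}_\rho(x)}$ is a polynomial in ${\rho}$, the central difference is accurate to roughly ${\varepsilon^2}$, well below the tolerance used.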