# Aram Harrow, Quantum Information, and Dvoretzky’s theorem

I’m delighted to announce that Aram Harrow will be joining us at UW as a visiting professor for the coming academic year. Aram is one of the young leaders in quantum information and computation. During a recent visit, Aram started to explain to me how ideas from asymptotic geometric analysis have started to become important in quantum information.

The capacity of quantum channels

A mixed quantum state on ${\mathbb C^d}$ is represented by a density matrix ${\rho}$, which is ${d \times d}$ Hermitian, positive semi-definite matrix with ${\mathrm{tr}(\rho) = 1}$. The von Neumann entropy of ${\rho}$ is given by

$\displaystyle S(\rho) = - \mathrm{tr}(\rho \log \rho).$

If ${\mathcal M_d}$ is the space of ${d \times d}$ matrices with complex entries, then a quantum channel is a mapping

$\displaystyle \Phi : \mathcal M_d \rightarrow \mathcal M_k$

for some ${d,k \geq 1}$, which sends PSD matrices to PSD matrices and preserves the trace. Finally, the minimal output entropy of a quantum channel is

$\displaystyle S_{\min}(\Phi) = \min_{\rho} S(\Phi(\rho)),$

where ${\rho}$ ranges over all mixed states on ${\mathbb C^d}$. Since the entropy is a concave function, this minimum is achieved at a pure state.

Work of Shor showed that the following conjecture is equivalent to a number of long-standing problems in quantum information theory.

Additivity conjecture for minimal output von Neumann entropy: Is is true that, for every two quantum channels ${\Phi_1}$ and ${\Phi_2}$, we have:

$\displaystyle S_{\min}(\Phi_1 \otimes \Phi_2) = S_{\min}(\Phi_1) + S_{\min}(\Phi_2).$

Observe that we always have ${\leq}$ above, by considering the product state ${\rho_1 \otimes \rho_2}$ where ${\rho_1}$ and ${\rho_2}$ are minimizers for ${\Phi_1}$ and ${\Phi_2}$, respectively. The conjecture is about whether we can get a smaller output entropy by entangling the the inputs to ${\Phi_1 \otimes \Phi_2}$.

Hastings’ counterexample and Dvoretzky’s theorem

Hastings proved that the conjecture is false: By appropriately choosing randomly constructed quantum channels, it is possible to achieve

$\displaystyle S_{\min}(\Phi_1 \otimes \Phi_2) < S_{\min}(\Phi_1) + S_{\min}(\Phi_2).$

Remarkably, it has recently been shown by Aubrun, Szarek, and Werner that this counterexample follows from an appropriate version of Dvoretzky’s theorem on almost-Euclidean subspaces of normed spaces (for a refresher, see this earlier post on pseudorandom subspaces).

Let ${\mathcal M_{k,d}}$ be the space of ${k \times d}$ matrices with complex entries. We will consider two norms on matrices ${M \in \mathcal M_{k,d}}$: The Hilbert-Schmidt norm ${\|M\|_2 = \sqrt{\mathrm{tr} |M|^2} = \sqrt{\sum_{i,j} M_{ij}^2}}$, and the Schatten ${4}$-norm ${\|M\|_4 = \left(\mathrm{tr} |M|^4\right)^{1/4}}$, where ${|M| = \sqrt{M^* M}}$. This latter quantity is precisely the ${\ell^4}$ norm of the singular values of ${M}$. Dvoretzky’s theorem tells us (qualitatively) that an appropriate random subspace of the normed space ${(\mathcal M_{k,d}, \|\cdot\|_4)}$ will be nearly Euclidean.

More precisely, for every ${k \geq 1}$, a uniformly random subspace ${\mathcal W \subseteq \mathcal M_{k,O(k^2)}}$ of dimension ${\Omega(k^2)}$ will, with high probability, satisfy: For every ${M \in \mathcal W}$,

$\displaystyle k^{-1/4} \|M\|_2 \leq \|M\|_4 \leq \left(1+\frac{O(1)}{k}\right) k^{-1/4} \|M\|_2,$

i.e. it will be ${\left(1+O(1/k)\right)}$-Euclidean. This strong quantitative version follows along the lines of enhancements to Milman’s proof of Dvoretzky’s theorem.

Oscillations and chaining

The proof of the preceding quantitative bound is based on a version of Dvoretzky’s theorem for Lipschitz functions, which I will state here in the real case for simplicity. We will use ${S^{n-1} \subseteq \mathbb R^n}$ for the ${(n-1)}$-dimensional unit sphere, and then for a function ${f : S^{n-1} \rightarrow \mathbb R}$ and a subset ${A \subseteq S^{n-1}}$, we write

$\displaystyle \mathrm{osc}(f,A) = \sup_{x \in A} |f(x)-\mu(f)|,$

where ${\mu(f)}$ is the average value of ${f}$ on ${S^{n-1}}$.

Theorem 1 If ${f : S^{n-1} \rightarrow \mathbb R}$ is a 1-Lipschitz function, then for every ${\varepsilon > 0}$, if ${E \subseteq \mathbb R^n}$ is a random subspace of dimension ${\lfloor \varepsilon^2 n\rfloor}$, then with high probability,

$\displaystyle \mathrm{osc}(f,E \cap S^{n-1}) \leq O(\varepsilon). \ \ \ \ \ (1)$

A standard way to prove this lemma would be to use the fact that every ${1}$-Lipschitz function is highly concentrated on ${S^{n-1}}$ (Levy’s lemma) and then to take a union bound over an ${\varepsilon}$-net on ${S^{n-1}}$ whose size is bounded by ${(1+2/\varepsilon)^n}$. Unfortunately, this leads to an extra ${\log(1/\varepsilon)}$ term on the right-hand side of (1) which, remarkably, would lead to a version of Dvoretzky’s theorem that is too weak to imply Hastings’ counterexample to the additivity conjecture.

Gordon originally proved Theorem 1 using comparison results for Gaussian processes. (In fact, we will see a main technical step called Slepian’s Lemma in our next post on majorizing measures.) Alternately, Schechtman showed that one can use concentration of measure and a more sophisticated chaining argument, of the type discussed in our previous post.