In order to prove that the chaining argument is tight, we will need some additional properties of Gaussian processes. For the chaining upper bound, we used a series of union bounds specified by a tree structure. As a first step in producing a good lower bound, we will look at a way in which the union bound is tight.

Theorem 1 (Sudakov inequality)For some constant , the following holds. Let be a Gaussian process such that for every distinct , we have . Then,

The claim is an elementary calculation for a sequence of i.i.d. random variables (i.e. ). We will reduce the general case to this one using Slepian’s comparison lemma.

Lemma 2 (Slepian’s Lemma)Let and be two Gaussian processes such that for all ,

There is a fairly elementary proof of Slepian’s Lemma (see, e.g. the Ledoux-Talagrand book), if one is satisfied with the weaker conclusion , which suffices for our purposes.

To see that Lemma 2 yields Theorem 1, take a family with for all and consider the associated variables where is a family of i.i.d. random variables. It is straightforward to verify that (1) holds, hence by the lemma, , and the result follows from the i.i.d. case.

The Sudakov inequality gives us “one level” of a lower bound; the following strengthening will allow us to use it recursively. If we have a Gaussian process and , we will use the notation

For and , we also use the notation

Here is the main theorem of this post; its statement is all we will require for our proof of the majorizing measures theorem:

Theorem 3For some constants and , the following holds. Suppose is a Gaussian process, and let be such that for . Then,

The proof of the preceding theorem relies on the a strong concentration property for Gaussian processes. First, we recall the classical isoperimetric inequality for Gaussian space (see, for instance, (2.9) here).

We remind the reader that for a function ,

Theorem 4Let , and let , where is the standard -dimensional Gaussian measure. Then,

Using this, we can prove the following remarkable fact.

Theorem 5Let be a Gaussian process, then

A notable aspect of this statement is that only the maximum variance affects the concentration, not the *number* of random variables. We now prove Theorem 5 using Theorem 4.

*Proof:* We will prove it in the case , but of course our bound is independent of . The idea is that given a Gaussian process , we can write

for , where are standard i.i.d. normals, and the matrix is a matrix of real coefficients. In this case, if is a standard -dimensional Gaussian, then the vector is distributed as .

If we put , then Theorem 4 yields (3) as long as . It is easy to see that

But is just the maximum norm of any row of , and the norm of row is

Using this theorem, we are ready to prove Theorem 3. I will only give a sketch here, but filling in the details is not too difficult.

Assume that the conditions of Theorem 3 hold. Pick an arbitrary , and recall that we can write

since our gaussians are centered.

Now, by Theorem 1,

Suppose that achieves this. By definition,

so we could hope that for some , we simultaneously have , yielding

The problem, of course, is that the events we are discussing are not independent.

This is where Theorem 5 comes in. For any , all the variances of the variables are bounded by . This implies that we can choose a constant such that

So, in fact, we can expect that none of the random variables will deviate from its expected value by more than . Which means we can (morally) replace (4) by

But now by choosing , the error term is absorbed.