In order to prove that the chaining argument is tight, we will need some additional properties of Gaussian processes. For the chaining upper bound, we used a series of union bounds specified by a tree structure. As a first step in producing a good lower bound, we will look at a way in which the union bound is tight.
Theorem 1 (Sudakov inequality) For some constant $C > 0$, the following holds. Let $\{X_t\}_{t \in T}$ be a Gaussian process such that for every pair of distinct $s, t \in T$, we have $d(s,t) \geq \alpha$ (recall that $d(s,t) = \sqrt{\mathbb{E}\,(X_s - X_t)^2}$). Then,

$$\mathbb{E} \sup_{t \in T} X_t \geq C \alpha \sqrt{\log |T|}.$$
The claim is an elementary calculation for a sequence of i.i.d. random variables $X_1, X_2, \ldots, X_n \sim N(0, \alpha^2/2)$ (i.e. $d(X_i, X_j) = \alpha$ for all $i \neq j$), for which $\mathbb{E} \max_{i \leq n} X_i \geq C \alpha \sqrt{\log n}$. We will reduce the general case to this one using Slepian’s comparison lemma.
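Before turning to the lemma, here is a quick numerical sanity check of the i.i.d. case (a sketch only, not part of the argument; the constant in front of $\alpha \sqrt{\log n}$ is simply taken to be of order 1 here):

```python
# Numerical sketch: for n i.i.d. N(0, alpha^2/2) variables,
# E max_i X_i should grow like alpha * sqrt(log n).
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0

for n in [10, 100, 1000, 10000]:
    # Estimate E max by averaging over 500 independent trials.
    samples = rng.normal(scale=alpha / np.sqrt(2), size=(500, n))
    emax = samples.max(axis=1).mean()
    print(f"n={n:6d}  E max ~ {emax:.3f}   alpha*sqrt(log n) = {alpha * np.sqrt(np.log(n)):.3f}")
```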
Lemma 2 (Slepian’s Lemma) Let $\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$ be two Gaussian processes such that for all $s, t \in T$,

$$\mathbb{E}\,(X_s - X_t)^2 \leq \mathbb{E}\,(Y_s - Y_t)^2. \qquad (1)$$

Then $\mathbb{E} \sup_{t \in T} X_t \leq \mathbb{E} \sup_{t \in T} Y_t$.
There is a fairly elementary proof of Slepian’s Lemma (see, e.g. the Ledoux-Talagrand book), if one is satisfied with the weaker conclusion $\mathbb{E} \sup_{t \in T} X_t \leq 2\, \mathbb{E} \sup_{t \in T} Y_t$, which suffices for our purposes.
To see that Lemma 2 yields Theorem 1, take a family $\{X_t\}_{t \in T}$ with $d(s,t) \geq \alpha$ for all $s \neq t \in T$ and consider the associated variables $Y_t = \frac{\alpha}{\sqrt{2}}\, g_t$, where $\{g_t\}_{t \in T}$ is a family of i.i.d. $N(0,1)$ random variables. It is straightforward to verify that (1) holds with $\{Y_t\}$ in the role of the smaller process, hence by the lemma, $\mathbb{E} \sup_{t \in T} X_t \geq \mathbb{E} \sup_{t \in T} Y_t$, and the result follows from the i.i.d. case.
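For completeness, the verification of (1) for this pair is a one-line computation:

$$\mathbb{E}\,(Y_s - Y_t)^2 = \frac{\alpha^2}{2}\, \mathbb{E}\,(g_s - g_t)^2 = \alpha^2 \leq d(s,t)^2 = \mathbb{E}\,(X_s - X_t)^2.$$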
The Sudakov inequality gives us “one level” of a lower bound; the following strengthening will allow us to use it recursively. If we have a Gaussian process $\{X_t\}_{t \in T}$ and a subset $A \subseteq T$, we will use the notation

$$g(A) = \mathbb{E} \sup_{t \in A} X_t.$$

For $t \in T$ and $r \geq 0$, we also use the notation

$$B(t, r) = \{ s \in T : d(s, t) \leq r \}.$$
Here is the main theorem of this post; its statement is all we will require for our proof of the majorizing measures theorem:
Theorem 3 For some constants $C > 0$ and $r > 1$, the following holds. Suppose $\{X_t\}_{t \in T}$ is a Gaussian process, and let $t_1, t_2, \ldots, t_m \in T$ be such that $d(t_i, t_j) \geq \alpha$ for $i \neq j$. Then,

$$g(T) \geq C \alpha \sqrt{\log m} + \min_{i \leq m} g\big(B(t_i, \alpha/r)\big). \qquad (2)$$
The proof of the preceding theorem relies on a strong concentration property for Gaussian processes. First, we recall the classical isoperimetric inequality for Gaussian space (see, for instance, (2.9) here).

We remind the reader that for a function $F : \mathbb{R}^n \to \mathbb{R}$,

$$\|F\|_{\mathrm{Lip}} = \sup_{x \neq y} \frac{|F(x) - F(y)|}{\|x - y\|}.$$
Theorem 4 Let $F : \mathbb{R}^n \to \mathbb{R}$, and let $\mu = \int F \, d\gamma_n$, where $\gamma_n$ is the standard $n$-dimensional Gaussian measure. Then,

$$\gamma_n\Big( x \in \mathbb{R}^n : |F(x) - \mu| > \lambda \Big) \leq 2 \exp\!\left( \frac{-\lambda^2}{2 \|F\|_{\mathrm{Lip}}^2} \right).$$
Using this, we can prove the following remarkable fact.
Theorem 5 Let $\{X_t\}_{t \in T}$ be a Gaussian process; then

$$\mathbb{P}\left( \Big| \sup_{t \in T} X_t - \mathbb{E} \sup_{t \in T} X_t \Big| > \lambda \right) \leq 2 \exp\!\left( \frac{-\lambda^2}{2 \sup_{t \in T} \mathbb{E}(X_t^2)} \right). \qquad (3)$$
A notable aspect of this statement is that only the maximum variance affects the concentration, not the number of random variables. We now prove Theorem 5 using Theorem 4.
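Before the proof, here is a quick numerical illustration of this dimension-independence (a sketch only, assuming nothing beyond i.i.d. standard Gaussians): the mean of the supremum grows with $n$, but its fluctuations stay of constant order, exactly as (3) predicts.

```python
# Numerical sketch: fluctuations of sup_t X_t are governed by the maximum
# variance (here 1), not by the number of variables n.
import numpy as np

rng = np.random.default_rng(0)

for n in [10, 100, 1000, 10000]:
    # 1000 samples of the supremum of n i.i.d. N(0,1) variables.
    sups = rng.normal(size=(1000, n)).max(axis=1)
    print(f"n={n:6d}  mean of sup = {sups.mean():.3f}   std of sup = {sups.std():.3f}")
```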
Proof: We will prove it in the case $|T| = n < \infty$, but of course our bound is independent of $n$. The idea is that given a Gaussian process $\{X_t\}_{t \in T}$, we can write

$$X_t = \sum_{i=1}^n a_{ti}\, g_i$$

for $t \in T$, where $\{g_i\}$ are standard i.i.d. normals, and $A = (a_{ti})$ is a matrix of real coefficients. In this case, if $g$ is a standard $n$-dimensional Gaussian, then the vector $A g$ is distributed as $(X_1, X_2, \ldots, X_n)$.
If we put $F(x) = \max_{t \in T} (Ax)_t$, then Theorem 4 yields (3) as long as $\|F\|_{\mathrm{Lip}}^2 \leq \sup_{t \in T} \mathbb{E}(X_t^2)$. It is easy to see that

$$\|F\|_{\mathrm{Lip}} \leq \max_{\|x\| = 1} \|Ax\|_\infty.$$

But $\max_{\|x\| = 1} \|Ax\|_\infty$ is just the maximum $\ell_2$ norm of any row of $A$, and the $\ell_2$ norm of row $t$ is

$$\sqrt{\sum_{i=1}^n a_{ti}^2} = \sqrt{\mathbb{E}(X_t^2)},$$

which completes the proof.
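As another sanity check (a numerical sketch with an arbitrary, made-up coefficient matrix $A$), the fluctuations of $\max_t (Ag)_t$ are indeed controlled by the largest $\ell_2$ row norm of $A$, i.e. by the Lipschitz constant computed in the proof:

```python
# Numerical sketch: std of max_t (A g)_t is at most the largest l2 row norm
# of A, which is the Lipschitz constant of F(x) = max_t (A x)_t.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 20))          # arbitrary 50 x 20 coefficient matrix
max_row_norm = np.linalg.norm(A, axis=1).max()

g = rng.normal(size=(20, 100_000))     # 100000 standard Gaussian vectors
sups = (A @ g).max(axis=0)             # samples of F(g) = max_t (A g)_t
print(f"std of sup = {sups.std():.3f}  <=  max row norm = {max_row_norm:.3f}")
```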
Using this theorem, we are ready to prove Theorem 3. I will only give a sketch here, but filling in the details is not too difficult.
Assume that the conditions of Theorem 3 hold. Pick an arbitrary $t_0 \in T$, and recall that we can write

$$g(T) = \mathbb{E} \sup_{t \in T} \left( X_t - X_{t_0} \right),$$

since our Gaussians are centered.
Now, by Theorem 1,

$$\mathbb{E} \max_{i \leq m} \left( X_{t_i} - X_{t_0} \right) \geq C \alpha \sqrt{\log m}.$$

Suppose that the index $i^*$ achieves this maximum. By definition,

$$g\big(B(t_{i^*}, \alpha/r)\big) = \mathbb{E} \sup_{t \in B(t_{i^*}, \alpha/r)} \left( X_t - X_{t_{i^*}} \right),$$

so we could hope that for some $i \leq m$, we simultaneously have

$$X_{t_i} - X_{t_0} \geq C \alpha \sqrt{\log m} \quad \text{and} \quad \sup_{t \in B(t_i, \alpha/r)} \left( X_t - X_{t_i} \right) \geq g\big(B(t_i, \alpha/r)\big),$$

yielding

$$g(T) \geq C \alpha \sqrt{\log m} + \min_{i \leq m} g\big(B(t_i, \alpha/r)\big). \qquad (4)$$
The problem, of course, is that the events we are discussing are not independent.
This is where Theorem 5 comes in. For any $i \leq m$, all the variances of the variables $\{X_t - X_{t_i} : t \in B(t_i, \alpha/r)\}$ are bounded by $(\alpha/r)^2$. This implies that we can choose a constant $C' > 0$ such that for every $i \leq m$,

$$\mathbb{P}\left( \Big| \sup_{t \in B(t_i, \alpha/r)} \left( X_t - X_{t_i} \right) - g\big(B(t_i, \alpha/r)\big) \Big| > C' \frac{\alpha}{r} \sqrt{\log m} \right) \leq \frac{1}{m^2}.$$
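Taking a union bound over the $m$ balls, the probability that any of them misbehaves is at most $m \cdot \frac{1}{m^2} = \frac{1}{m}$:

$$\mathbb{P}\left( \exists\, i \leq m : \Big| \sup_{t \in B(t_i, \alpha/r)} \left( X_t - X_{t_i} \right) - g\big(B(t_i, \alpha/r)\big) \Big| > C' \frac{\alpha}{r} \sqrt{\log m} \right) \leq \frac{1}{m}.$$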
So, in fact, we can expect that none of the random variables $\sup_{t \in B(t_i, \alpha/r)} (X_t - X_{t_i})$ will deviate from its expected value by more than $C' \frac{\alpha}{r} \sqrt{\log m}$, which means we can (morally) replace (4) by

$$g(T) \geq C \alpha \sqrt{\log m} + \min_{i \leq m} g\big(B(t_i, \alpha/r)\big) - O\!\left( \frac{\alpha}{r} \sqrt{\log m} \right).$$

But now by choosing $r$ large enough, the error term is absorbed into the first term.
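To spell out the absorption (a back-of-the-envelope step using the constants named above): the error term is at most $C' \frac{\alpha}{r} \sqrt{\log m}$, so any choice of $r \geq 2C'/C$ gives

$$C \alpha \sqrt{\log m} - C' \frac{\alpha}{r} \sqrt{\log m} \geq \frac{C}{2}\, \alpha \sqrt{\log m},$$

and the conclusion of Theorem 3 holds with the constant $C/2$.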