Uniform convergence (combinatorics)

For a class of predicates $H$ defined on a set $X$ and a set of samples $x = (x_1, x_2, \dots, x_m)$, where $x_i \in X$, the empirical frequency of $h \in H$ on $x$ is $\widehat{Q}_x(h) = \frac{1}{m}|\{i : 1 \le i \le m,\ h(x_i) = 1\}|$. The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with replacement) from $X$ according to a distribution $P$, then with high probability every empirical frequency will be close to its expectation, where the expectation is given by $Q_P(h) = P\{y \in X : h(y) = 1\}$. Here "simple" means that the Vapnik–Chervonenkis dimension of the class $H$ is small relative to the size of the sample.
In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
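To make the definitions concrete, here is a minimal sketch under assumptions not in the source: the class of threshold predicates $h_t(x) = \mathbf{1}[x \le t]$ on $X = [0, 1]$, with $P$ the uniform distribution, so that $Q_P(h_t) = t$ exactly.

```python
import random

# Illustrative setup (not from the source): predicates h_t(x) = 1 iff x <= t
# on X = [0, 1], sampled under the uniform distribution P, so Q_P(h_t) = t.
random.seed(0)
m = 10_000
sample = [random.random() for _ in range(m)]   # m i.i.d. draws from P

def empirical_frequency(t, xs):
    """Q_hat_x(h_t): fraction of sample points on which h_t outputs 1."""
    return sum(1 for x in xs if x <= t) / len(xs)

# Worst deviation |Q_P(h) - Q_hat_x(h)| over a grid of predicates from the
# class; uniform convergence says this is small for all of them at once.
worst = max(abs(empirical_frequency(t / 100, sample) - t / 100) for t in range(101))
print(f"sup over the grid of |Q_P(h) - Q_hat(h)| ~= {worst:.4f}")
```

Typical runs give a worst deviation on the order of $1/\sqrt{m}$, even though the deviation is measured simultaneously over every predicate on the grid.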

Uniform convergence theorem statement[1]

If $H$ is a set of $\{0,1\}$-valued functions defined on a set $X$ and $P$ is a probability distribution on $X$, then for $\varepsilon > 0$ and $m$ a positive integer, we have:

$P^m\{|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\} \le 4\Pi_H(2m)e^{-\varepsilon^2 m/8},$

where, for any $x \in X^m$,

$Q_P(h) = P\{y \in X : h(y) = 1\}$, $\widehat{Q}_x(h) = \frac{1}{m}\left|\{i : 1 \le i \le m,\ h(x_i) = 1\}\right|$,

and $|x| = m$. $P^m$ indicates that the probability is taken over $x$ consisting of $m$ i.i.d. draws from the distribution $P$.

$\Pi_H$ is defined as follows: for any set $D \subseteq X$, identifying each $h \in H$ with the subset $\{y \in X : h(y) = 1\}$,

$\Pi_H(D) = \{h \cap D : h \in H\}$.

And for any natural number $m$, the shattering number $\Pi_H(m)$ is defined as:

$\Pi_H(m) = \max\{|\Pi_H(D)| : D \subseteq X,\ |D| = m\}$.
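For intuition (again an illustrative sketch, not part of the source), $\Pi_H(D)$ can be computed by brute force for the threshold class $h_t(x) = \mathbf{1}[x \le t]$: on any set of three distinct points it realizes only the 4 "monotone" patterns out of the $2^3 = 8$ possible ones, so this class does not shatter three points:

```python
# Illustrative brute-force computation of Pi_H(D) = {h ∩ D : h in H}
# for the threshold class h_t(x) = 1 iff x <= t.
def restrictions(D, thresholds):
    """All distinct 0/1 patterns the thresholds induce on the points of D."""
    return {tuple(1 if x <= t else 0 for x in D) for t in thresholds}

D = [0.2, 0.5, 0.9]
patterns = restrictions(D, thresholds=[i / 100 for i in range(101)])
print(len(patterns), "patterns:", sorted(patterns))   # 4 = |D| + 1 patterns
```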

From the point of view of Learning Theory one can consider $H$ to be the Concept/Hypothesis class defined over the instance set $X$. Before getting into the details of the proof of the theorem we will state Sauer's Lemma, which we will need in our proof.

Sauer–Shelah lemma

The Sauer–Shelah lemma[2] relates the shattering number to the VC Dimension.

Lemma: $\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i}$, where $d$ is the VC Dimension of the concept class $H$.

Corollary: $\Pi_H(m) \le m^d + 1$.
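The corollary holds for $m \ge d \ge 1$; a quick numeric check that the corollary's simpler bound dominates the lemma's sum:

```python
from math import comb

# Check that the Sauer-Shelah bound sum_{i=0}^{d} C(m, i) is dominated by
# the corollary's bound m^d + 1 whenever m >= d >= 1.
for d in range(1, 5):
    for m in range(d, 25):
        assert sum(comb(m, i) for i in range(d + 1)) <= m**d + 1
print("sum_{i<=d} C(m,i) <= m^d + 1 on all tested (d, m)")
```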

Proof of uniform convergence theorem[1]

Before getting into the details of the proof of the Uniform Convergence Theorem, we present a high-level overview of the proof.

  1. Symmetrization: We transform the problem of analyzing $|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon$ into the problem of analyzing $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as the testing sample which is used to estimate $Q_P(h)$ (a small simulation of this step appears after this list).
  2. Permutation: Since $r$ and $s$ are picked identically and independently, swapping elements between them will not change the probability distribution on $r$ and $s$. So, we will try to bound the probability of $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$ for some $h \in H$ by considering the effect of a specific collection of permutations of the joint sample $x = r||s$. Specifically, we consider permutations $\sigma(x)$ which swap $x_i$ and $x_{m+i}$ for $i$ in some subset of $\{1, 2, \ldots, m\}$. The symbol $r||s$ means the concatenation of $r$ and $s$.
  3. Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample, and hence, if $H$ has finite VC Dimension, the problem reduces to one involving a finite function class.
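As promised in step 1, here is a minimal simulation of the symmetrization idea. It is an illustration, not part of the source: it reuses the hypothetical threshold class $h_t(x) = \mathbf{1}[x \le t]$ under the uniform distribution, and shows that the deviation between a sample $r$ and an independent "ghost" sample $s$ is a faithful proxy for the deviation from the true distribution:

```python
import random

# Illustrative only: threshold class h_t(x) = 1 iff x <= t on X = [0, 1],
# P uniform, so Q_P(h_t) = t. We compare the sup-deviation of the sample r
# from the distribution with its sup-deviation from a ghost sample s.
random.seed(1)
m = 2000
grid = [i / 100 for i in range(101)]   # finite grid standing in for the class

def freq(t, xs):
    """Empirical frequency of h_t on the sample xs."""
    return sum(1 for x in xs if x <= t) / len(xs)

r = [random.random() for _ in range(m)]   # original sample
s = [random.random() for _ in range(m)]   # ghost / testing sample

dev_true  = max(abs(t - freq(t, r)) for t in grid)           # sup |Q_P - Q_hat_r|
dev_ghost = max(abs(freq(t, r) - freq(t, s)) for t in grid)  # sup |Q_hat_r - Q_hat_s|
print(f"deviation vs. distribution: {dev_true:.4f}, vs. ghost sample: {dev_ghost:.4f}")
```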

We present the technical details of the proof.

Symmetrization

Lemma: Let $V = \{x \in X^m : |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\}$ and

$R = \{(r, s) \in X^m \times X^m : |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2 \text{ for some } h \in H\}$.

Then for $m \ge \frac{2}{\varepsilon^2}$, $P^m(V) \le 2P^{2m}(R)$.

Proof: By the triangle inequality,
if $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$ and $|Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2$ then $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$.
Therefore,

$P^{2m}(R) \ge P^{2m}\{\exists h \in H : |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2\} = A$,
where $A = \int_V P^m\{s : \exists h \in H,\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2\}\,dP^m(r)$ [since $r$ and $s$ are independent].

Now for $r \in V$ fix an $h \in H$ such that $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$. For this $h$, we shall show that

$P^m\left\{s : |Q_P(h) - \widehat{Q}_s(h)| \le \frac{\varepsilon}{2}\right\} \ge \frac{1}{2}$.

Thus for any $r \in V$ the inner probability in $A$ is at least $\frac{1}{2}$, so $A \ge \frac{1}{2}P^m(V)$ and hence $P^{2m}(R) \ge \frac{1}{2}P^m(V)$. This accomplishes the first step of our high-level idea.
Notice that $m \cdot \widehat{Q}_s(h)$ is a binomial random variable with expectation $m \cdot Q_P(h)$ and variance $m \cdot Q_P(h)(1 - Q_P(h))$. By Chebyshev's inequality we get

$P^m\left\{|Q_P(h) - \widehat{Q}_s(h)| > \frac{\varepsilon}{2}\right\} \le \frac{m \cdot Q_P(h)(1 - Q_P(h))}{(\varepsilon m/2)^2} \le \frac{1}{\varepsilon^2 m} \le \frac{1}{2}$

for the mentioned bound on $m$, namely $m \ge \frac{2}{\varepsilon^2}$. Here we use the fact that $x(1 - x) \le \frac{1}{4}$ for $x \in [0, 1]$.
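A small simulation of this Chebyshev step (the parameters $\varepsilon = 0.1$ and $Q_P(h) = 0.3$ are assumptions chosen for illustration):

```python
import random

# With m >= 2 / eps^2, the empirical frequency of a fixed h (a binomial mean)
# lands within eps/2 of Q_P(h) at least half the time, as the lemma requires.
random.seed(2)
eps, p = 0.1, 0.3          # illustrative epsilon and Q_P(h)
m = int(2 / eps**2)        # m = 200, the bound assumed in the lemma
trials = 10_000
hits = sum(
    abs(p - sum(random.random() < p for _ in range(m)) / m) <= eps / 2
    for _ in range(trials)
)
print(f"P(|Q_P(h) - Q_hat_s(h)| <= eps/2) ~= {hits / trials:.3f} (lemma needs >= 1/2)")
```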

Permutations

Let $\Gamma_m$ be the set of all permutations of $\{1, 2, \ldots, 2m\}$ that swap $i$ and $m + i$ for $i$ in some subset of $\{1, 2, \ldots, m\}$.

Lemma: Let $R$ be any subset of $X^{2m}$ and $P$ any probability distribution on $X$. Then,

$P^{2m}(R) = E[\Pr[\sigma(x) \in R]] \le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R]$,

where the expectation is over $x$ chosen according to $P^{2m}$, and the probability is over $\sigma$ chosen uniformly from $\Gamma_m$.

Proof: For any $\sigma \in \Gamma_m$,

$P^{2m}(R) = P^{2m}\{x : \sigma(x) \in R\}$ [since coordinate permutations preserve the product distribution $P^{2m}$].
Therefore,
$P^{2m}(R) = \frac{1}{|\Gamma_m|}\sum_{\sigma \in \Gamma_m} P^{2m}\{x : \sigma(x) \in R\}$ [because $\Gamma_m$ is finite]
$= \int_{X^{2m}} \frac{1}{|\Gamma_m|}\left|\{\sigma \in \Gamma_m : \sigma(x) \in R\}\right| \, dP^{2m}(x)$ [the expectation]
$= \int_{X^{2m}} \Pr[\sigma(x) \in R] \, dP^{2m}(x)$
$\le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R]$.

The maximum is guaranteed to exist since there is only a finite set of values that the probability under a random permutation can take.
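Concretely (a hypothetical sketch; the helper name is invented), a uniform element of $\Gamma_m$ is drawn by deciding independently for each $i \le m$ whether to swap $x_i$ with $x_{m+i}$, so $|\Gamma_m| = 2^m$; applying it to an i.i.d. joint sample leaves the joint distribution unchanged:

```python
import random

def random_swap(x, m, rng):
    """Apply a uniformly random element of Gamma_m to the joint sample x."""
    x = list(x)
    for i in range(m):
        if rng.random() < 0.5:          # swap coordinate i with m + i
            x[i], x[m + i] = x[m + i], x[i]
    return x

rng = random.Random(3)
m = 4
x = [round(rng.random(), 2) for _ in range(2 * m)]   # joint sample r || s
print("x        =", x)
print("sigma(x) =", random_swap(x, m, rng))
```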

Reduction to a finite class

Lemma: In the setting of the previous lemma,

$\max_{x \in X^{2m}} \Pr[\sigma(x) \in R] \le 2\Pi_H(2m)e^{-\varepsilon^2 m/8}$.

Proof: Let us define $x = (x_1, x_2, \ldots, x_{2m})$ and $t = |\Pi_H(x)|$, which is at most $\Pi_H(2m)$. This means there are functions $h_1, h_2, \ldots, h_t \in H$ such that for any $h \in H$ there is a $j$ between $1$ and $t$ with $h_j(x_i) = h(x_i)$ for $1 \le i \le 2m$.
We see that $\sigma(x) \in R$ iff some $h$ in $H$ satisfies $\left|\frac{1}{m}|\{1 \le i \le m : h(x_{\sigma(i)}) = 1\}| - \frac{1}{m}|\{m+1 \le i \le 2m : h(x_{\sigma(i)}) = 1\}|\right| \ge \frac{\varepsilon}{2}$. Hence we define $w_i^j = 1$ if $h_j(x_i) = 1$ and $w_i^j = 0$ otherwise, for $1 \le i \le 2m$ and $1 \le j \le t$.
Then $\sigma(x) \in R$ iff some $j$ in $\{1, \ldots, t\}$ satisfies $\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}$. By the union bound we get

$\Pr[\sigma(x) \in R] \le t \cdot \max_{1 \le j \le t} \Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right] \le \Pi_H(2m) \cdot \max_{1 \le j \le t} \Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right]$.

Since the distribution over the permutations $\sigma$ is uniform, for each $i$ the difference $w_{\sigma(i)}^j - w_{\sigma(m+i)}^j$ equals $\pm|w_i^j - w_{m+i}^j|$, with both signs equally likely.
Thus,

$\Pr_\sigma\left[\left|\frac{1}{m}\sum_{i=1}^m\left(w_{\sigma(i)}^j - w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right] = \Pr_\beta\left[\left|\frac{1}{m}\sum_{i=1}^m |w_i^j - w_{m+i}^j|\,\beta_i\right| \ge \frac{\varepsilon}{2}\right]$,

where the probability on the right is over the $\beta_i$, independent uniform $\pm 1$ random variables, with both possibilities equally likely. By Hoeffding's inequality, this is at most $2e^{-m\varepsilon^2/8}$.
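A sanity check of this final Hoeffding step, with illustrative, assumed values of $m$ and $\varepsilon$ and random coefficients $a_i = |w_i^j - w_{m+i}^j| \in \{0, 1\}$:

```python
import math
import random

# P(|sum_i a_i * beta_i| / m >= eps/2) <= 2 exp(-m eps^2 / 8) for signs
# beta_i uniform on {-1, +1} and coefficients a_i in {0, 1}.
random.seed(4)
m, eps, trials = 200, 0.3, 20_000
a = [random.randint(0, 1) for _ in range(m)]      # a_i = |w_i^j - w_{m+i}^j|
exceed = sum(
    abs(sum(ai * random.choice((-1, 1)) for ai in a)) / m >= eps / 2
    for _ in range(trials)
)
print(f"empirical: {exceed / trials:.4f}, "
      f"Hoeffding bound: {2 * math.exp(-m * eps**2 / 8):.4f}")
```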

Finally, combining all three parts of the proof, for $m \ge 2/\varepsilon^2$ we get $P^m(V) \le 2P^{2m}(R) \le 2 \cdot 2\Pi_H(2m)e^{-\varepsilon^2 m/8} = 4\Pi_H(2m)e^{-\varepsilon^2 m/8}$, which is the Uniform Convergence Theorem. (For $m < 2/\varepsilon^2$ the right-hand side exceeds $1$, so the bound holds trivially.)

References
