Cantelli's inequality

In probability theory, Cantelli's inequality, named after Francesco Paolo Cantelli, is a generalization of Chebyshev's inequality in the case of a single "tail".[1][2][3] The inequality states that


\Pr(X-\mu\ge\lambda)\quad\begin{cases}
\le \frac{\sigma^2}{\sigma^2 + \lambda^2} & \text{if } \lambda > 0, \\[8pt]
\ge 1 - \frac{\sigma^2}{\sigma^2 + \lambda^2} & \text{if }\lambda < 0.
\end{cases}

where

X is a real-valued random variable,
\Pr is the probability measure,
\mu is the expected value of X,
\sigma^2 is the variance of X.

Combining the cases of \lambda >0 and \lambda < 0 gives, for \delta >0,


\Pr(| X-\mu|\ge\delta)\le \frac{2\sigma^2}{\sigma^2+\delta^2}.

The inequality is due to Francesco Paolo Cantelli. The Chebyshev inequality implies that in any data sample or probability distribution, "nearly all" values are close to the mean in terms of the absolute value of the difference between the points of the data sample and the weighted average of the data sample. The Cantelli inequality (sometimes called the "Chebyshev–Cantelli inequality" or the "one-sided Chebyshev inequality") gives a way of estimating how the points of the data sample are bigger than or smaller than their weighted average without the two tails of the absolute value estimate. The Chebyshev inequality has "higher moments versions" and "vector versions", and so does the Cantelli inequality.

Proof

Let X be a real-valued random variable with finite variance \sigma^2 and expectation \mu, and define Y = X - \mu (so that \mathbb{E}[Y] = 0 and \operatorname{Var}(Y) = \sigma^2).

Then, for any u\geq 0, we have


\Pr( X-\mu \geq\lambda)
= \Pr( Y  \geq \lambda) 
= \Pr( Y + u  \geq \lambda + u)
\leq  \Pr( (Y + u)^2  \geq (\lambda + u)^2 )
\leq \frac{\mathbb{E}[(Y + u)^2] }{(\lambda + u)^2}
= \frac{\sigma^2 + u^2 }{(\lambda + u)^2}.

the last inequality being a consequence of Markov's inequality. As the above holds for any choice of u\in\mathbb{R}, we can choose to apply it with the value that minimizes the function u \geq 0 \mapsto \frac{\sigma^2 + u^2 }{(\lambda + u)^2}. By differentiating, this can be seen to be u_\ast = \frac{\sigma^2}{\lambda}, leading to


\Pr( X-\mu \geq\lambda) \leq \frac{\sigma^2 + u_\ast^2 }{(\lambda + u_\ast)^2} = \frac{\sigma^2}{\lambda^2 + \sigma^2}.

\Pr( X-\mu < \lambda)
= \Pr( -Y  > \alpha) \leq  \frac{\sigma^2}{\alpha^2 + \sigma^2} = \frac{\sigma^2}{\lambda^2 + \sigma^2}

using the previous derivation on -Y. By taking the complement, we obtain


\Pr( X-\mu \geq \lambda) \geq 1-  \frac{\sigma^2}{\lambda^2 + \sigma^2}.

References

  1. Research and practice in multiple criteria decision making: proceedings of the XIVth International Conference on Multiple Criteria Decision Making (MCDM), Charlottesville, Virginia, USA, June 8–12, 1998, edited by Y.Y. Haimes and R.E. Steuer, Springer, 2000, ISBN 3540672664.
  2. "Tail and Concentration Inequalities" by Hung Q. Ngo
  3. "Concentration-of-measure inequalities" by Gábor Lugosi


This article is issued from Wikipedia - version of the 4/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.