Central Limit Theorem

This is one of the most fundamental theorems concerning random variables. However, many people misunderstand and misuse it.

Theorems

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Let $ M_{X, n}:=\frac{X_1+\cdots +X_n}{n}$ be the sample mean. Then

\begin{equation}\label{eq:CTLv1}

\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma}\text{ converges in distribution to } \mathcal{N}(0,1).

\end{equation}
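To see the statement at work, here is a small simulation sketch. It assumes the $X_i$ are Exponential(1) variables (so $\mu=\sigma=1$), which is purely an illustrative choice and not part of the theorem; the function names are hypothetical. It compares the empirical probability that the standardized sample mean is at most $x$ with $\Phi(x)$, the standard normal CDF.

```python
import math
import random
from statistics import NormalDist, fmean


def standardized_mean(rng, n, mu=1.0, sigma=1.0):
    """Draw n i.i.d. Exponential(1) samples (an assumed example distribution)
    and return sqrt(n) * (M - mu) / sigma, using the known mu and sigma."""
    sample_mean = fmean(rng.expovariate(1.0) for _ in range(n))
    return math.sqrt(n) * (sample_mean - mu) / sigma


def empirical_cdf_at(x, n, trials=2000, seed=0):
    """Fraction of simulated standardized means that are <= x."""
    rng = random.Random(seed)
    hits = sum(standardized_mean(rng, n) <= x for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    target = NormalDist().cdf(1.0)  # Phi(1.0)
    for n in (5, 50, 500):
        print(n, empirical_cdf_at(1.0, n), target)
```

The simulated fractions should be close to $\Phi(x)$ for moderately large $n$; how large $n$ must be depends on the distribution of the $X_i$.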

In a big-data context (e.g. e-commerce), we usually use the Central Limit Theorem to estimate the theoretical mean $\mu$ (a.k.a. the population mean) from samples $X_1, \cdots, X_n$. The formula above has one impractical ingredient: $\sigma$, a theoretical quantity that we do not know. Hence, we use a more practical version.

A practical version of the Central Limit Theorem

\begin{equation}\label{eq:CLTv2}\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\to \mathcal{N}(0,1)\end{equation}

where $\sigma_{X,n}$ is the sample standard deviation (the square root of the sample variance) of $X_1,\cdots, X_n$, and the arrow denotes convergence in distribution. This version of the CLT is harder to prove (a hint can be found here; it is a great exercise for math students).

Essentially, the theorem says that, for large $n$, we can use the standard normal random variable $\mathcal{N}(0,1)$ to approximate probabilities concerning $ \frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}$. That means, for any interval $I$:

\begin{equation}\label{eq:CLTv3}

\mathbb{P}(\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\in I)\approx \mathbb{P}(\mathcal{N}(0,1)\in I)

\end{equation}
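The approximation above can be checked numerically. The sketch below assumes Uniform(0, 1) samples (so $\mu = 0.5$), an arbitrary illustrative choice, and uses `statistics.stdev`, which divides by $n-1$; the text does not say whether $\sigma_{X,n}$ uses $n$ or $n-1$, and the difference vanishes as $n$ grows. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def studentized_mean(sample, mu):
    """sqrt(n) * (M - mu) / sigma_hat, with sigma_hat the sample standard deviation."""
    n = len(sample)
    return math.sqrt(n) * (fmean(sample) - mu) / stdev(sample)


def interval_probability(a, b, n, trials=2000, seed=0):
    """Monte Carlo estimate of P(studentized mean in [a, b]) for Uniform(0, 1) data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 1.0) for _ in range(n)]  # assumed distribution, mu = 0.5
        if a <= studentized_mean(sample, mu=0.5) <= b:
            hits += 1
    return hits / trials


def normal_probability(a, b):
    """P(N(0,1) in [a, b]) = Phi(b) - Phi(a)."""
    z = NormalDist()
    return z.cdf(b) - z.cdf(a)


if __name__ == "__main__":
    print(interval_probability(-1.0, 2.0, n=100), normal_probability(-1.0, 2.0))
```

The two printed numbers should roughly agree, even though the true $\sigma$ is never used.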

Usage 1: where could $\mu$ be?

The Central Limit Theorem is used to estimate the whereabouts of $\mu$ once $M_{X,n}$ and $ \sigma_{X,n} $ are computed. The whereabouts are presented in the form of a so-called confidence interval.

Suppose that we already have the values of $X_1, \cdots, X_n$. Then we can compute $M_{X,n}$ and $\sigma_{X,n}$. First, from the Z table (the table of values of the standard normal CDF $\Phi$),

\begin{align*}\mathbb{P}(|\mathcal{N}(0,1)|\leq 1.96) &= \mathbb{P}(\mathcal{N}(0,1)\leq 1.96)-\mathbb{P}(\mathcal{N}(0,1)\leq -1.96)\\ &=\Phi(1.96)-\Phi(-1.96)\\&\approx 0.975-0.025\\&=0.95.\end{align*}
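If no Z table is at hand, the same numbers can be obtained from Python's standard library. This is just a convenience sketch; `two_sided_probability` and `critical_value` are hypothetical helper names.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_probability(z):
    """P(|N(0,1)| <= z) = Phi(z) - Phi(-z)."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)


def critical_value(level):
    """The z with P(|N(0,1)| <= z) = level, e.g. about 1.96 for level = 0.95."""
    return STANDARD_NORMAL.inv_cdf(0.5 + level / 2.0)


if __name__ == "__main__":
    print(round(two_sided_probability(1.96), 4))  # 0.95
    print(round(critical_value(0.95), 2))         # 1.96
```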

Then the practical version of the Central Limit Theorem \eqref{eq:CLTv3} implies that

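\begin{equation}

\mathbb{P}\left(\left|\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\right|\leq 1.96\right)\approx 0.95.

\end{equation}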
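Solving the inequality $\left|\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\right|\leq 1.96$ for $\mu$ gives $M_{X,n} - 1.96\,\frac{\sigma_{X,n}}{\sqrt{n}} \leq \mu \leq M_{X,n} + 1.96\,\frac{\sigma_{X,n}}{\sqrt{n}}$, which is the usual approximate 95% confidence interval. A minimal sketch of that computation follows; `confidence_interval` is a hypothetical helper, the data are made up for illustration, and `statistics.stdev` (which divides by $n-1$) is assumed as the sample standard deviation.

```python
import math
from statistics import NormalDist, fmean, stdev


def confidence_interval(sample, level=0.95):
    """Approximate CLT-based confidence interval for the population mean.

    Returns (lower, upper) = M -/+ z * sigma_hat / sqrt(n), where z is the
    standard normal critical value for the requested level.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the spread")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    mean = fmean(sample)
    half_width = z * stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Made-up observations, purely for illustration.
    data = [1, 2, 3, 4, 5] * 20
    lower, upper = confidence_interval(data)
    print(lower, upper)
```

The interval is only as trustworthy as the normal approximation itself, so it should be read with care when $n$ is small or the data are heavily skewed.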