This is one of the most fundamental theorems concerning random variables. However, many people misunderstand and misuse it.

Theorems

Let \(X_1, X_2, \ldots\) be a sequence of i.i.d. random variables with mean \(\mu\) and finite variance \(\sigma^2 > 0\). Let \( M_{X, n}:=\frac{X_1+\cdots +X_n}{n}\) be the sample mean. Then

\begin{equation}\label{eq:CTLv1}

\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma}\text{ converges in distribution to } \mathcal{N}(0,1).

\end{equation}
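To see the statement in action, here is a minimal simulation sketch. The Exponential(1) distribution (for which \(\mu = 1\) and \(\sigma = 1\)), the sample size, the number of repetitions, and the helper name `standardized_mean` are all illustrative assumptions, not part of the theorem.

```python
import math
import random
from statistics import NormalDist


def standardized_mean(n, rng):
    """Return sqrt(n) * (M - mu) / sigma for n Exponential(1) draws."""
    # Assumption for illustration: Exponential(1) has mu = 1 and sigma = 1.
    mu, sigma = 1.0, 1.0
    sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (sample_mean - mu) / sigma


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [standardized_mean(200, rng) for _ in range(5000)]

# Fraction of standardized means below 1, next to the N(0,1) value Phi(1).
frac_below = sum(z <= 1.0 for z in draws) / len(draws)
phi_one = NormalDist().cdf(1.0)
print(frac_below, phi_one)
```

The two printed numbers should be close, even though the exponential distribution itself is far from normal.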

In a big-data context (e.g. e-commerce), we usually use the Central Limit Theorem to estimate the theoretical mean \(\mu\) (a.k.a. the population mean) from the samples \(X_1, \cdots, X_n\). The formula above has one impractical ingredient: \(\sigma\), a theoretical quantity that we do not know. Hence, we use a more practical version.

A practical version of the Central Limit Theorem

\begin{equation}\label{eq:CLTv2}\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\to \mathcal{N}(0,1)\end{equation}

where \(\sigma_{X,n}\) is the sample standard deviation of \(X_1,\cdots, X_n\) (the square root of the sample variance) and the convergence is in distribution. This version of the CLT is harder to prove (a hint can be found here; it is a great exercise for math students).

Essentially, the theorem says that, for large \(n\), we can use the standard normal random variable \(\mathcal{N}(0,1)\) to approximate probabilities concerning \( \frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\). That means, for any interval \(I\):

\begin{equation}\label{eq:CLTv3}

\mathbb{P}\left(\frac{\sqrt{n}(M_{X, n}-\mu)}{\sigma_{X,n}}\in I\right)\approx \mathbb{P}(\mathcal{N}(0,1)\in I)

\end{equation}
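The approximation can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are Uniform(0, 1) draws (so \(\mu = 0.5\)), the interval is \(I = [-1, 2]\), and `studentized_mean` is a hypothetical helper name. Note that \(\sigma\) is never used; only the sample standard deviation is.

```python
import math
import random
from statistics import NormalDist


def studentized_mean(n, mu, rng):
    """Return sqrt(n) * (M - mu) / s, where s is the sample standard deviation."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    # Sample standard deviation with the usual n - 1 denominator.
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return math.sqrt(n) * (m - mu) / s


rng = random.Random(1)  # fixed seed so the run is reproducible
a, b = -1.0, 2.0        # the interval I = [a, b], chosen for illustration
stats = [studentized_mean(100, 0.5, rng) for _ in range(4000)]

# Left-hand side: empirical probability that the statistic lands in I.
empirical = sum(a <= t <= b for t in stats) / len(stats)
# Right-hand side: P(N(0,1) in I) = Phi(b) - Phi(a).
normal = NormalDist()
theoretical = normal.cdf(b) - normal.cdf(a)
print(empirical, theoretical)
```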

Usage 1: where could \(\mu\) be?

The Central Limit Theorem is used to estimate the whereabouts of \(\mu\) once \(M_{X,n}\) and \( \sigma_{X,n} \) are computed. The whereabouts are presented in the form of a so-called confidence interval.

Suppose that we have already observed the values of \(X_1, \cdots, X_n\). Then we can compute \(M_{X,n}\) and \(\sigma_{X,n}\). First, from the Z table,

\begin{align*}\mathbb{P}(|\mathcal{N}(0,1)|\leq 1.96) &= \mathbb{P}(\mathcal{N}(0,1)\leq 1.96)-\mathbb{P}(\mathcal{N}(0,1)\leq -1.96)\\ &=\Phi(1.96)-\Phi(-1.96)\\ &\approx 0.975-0.025\\ &=0.95.\end{align*}
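If no Z table is at hand, the same numbers can be recovered in code. This is a small sketch that assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# P(|N(0,1)| <= 1.96) = Phi(1.96) - Phi(-1.96)
prob = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the quantile z with Phi(z) = 0.975.
z = standard_normal.inv_cdf(0.975)

print(prob, z)
```

The inverse lookup is handy when a confidence level other than 95% is needed: replace 0.975 with \(1 - \alpha/2\) for a level of \(1-\alpha\).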

Then the practical version of the Central Limit Theorem, in the form \eqref{eq:CLTv3}, implies that

\begin{equation}