mineti.dev

The Feynman experience · Part 1

The law of large numbers

evergreen · Jan 4, 2019

Introduction

Everyone who studies probability runs into at least two limit theorems: the law of large numbers and the central limit theorem. The law of large numbers describes the result of performing the same experiment a large number of times. It is pervasive in probability and shows up everywhere from gambling to risk management.

Examples

Let’s simulate the two usual suspects of any probability example: a coin and a die.

Coin toss

Let’s track the frequency of heads over 1,000 coin tosses. As throughout this series, we work from scratch in plain Python with NumPy:

import numpy as np

np.random.seed(51)

# 1,000 coin tosses, each 0 (tails) or 1 (heads)
coin = np.random.randint(0, 2, 1000)

# Running frequency of heads after each toss
heads_mean = [np.mean(coin[:i + 1]) for i in range(len(coin))]
Running frequency of heads over 1,000 coin tosses: it swings widely at first, then settles onto the expected 0.5.
Over 1,000 tosses the observed frequency of heads settles onto the expected 0.5 (dashed). The early swings are large; they shrink as trials accumulate.

The frequency lurches around early on, then closes in on the expected 0.5 as the tosses pile up.

Throwing dice

The same idea with a die — the frequency of sixes should approach 1/61/6:

np.random.seed(51)

# 1,000 dice throws, each 1 to 6
dice = np.random.randint(1, 7, 1000)

# 1 when the throw is a six, 0 otherwise
six = [1 if x == 6 else 0 for x in dice]

# Running frequency of sixes
six_mean = [np.mean(six[:i + 1]) for i in range(len(six))]
Running frequency of sixes over 1,000 dice throws, settling onto the expected one sixth.
The observed frequency of sixes over 1,000 throws settles onto the expected 1/6 (dashed).

A common confusion

Sometimes you will hear that the law of large numbers means the counts of heads and tails grow closer together as the number of trials increases. That is not what it says — and we can show it. Here is the running difference between the number of heads and the number of tails:

np.random.seed(51)
coin = np.random.randint(0, 2, 1000)

# Cumulative (number of heads) minus (number of tails)
diff = [2 * np.sum(coin[:i + 1]) - (i + 1) for i in range(len(coin))]
The running difference between the number of heads and tails over 1,000 tosses, wandering up and down without settling.
The difference between the number of heads and tails wanders rather than converging. The average converges; the raw counts need not.

As far as we can tell, this series is not converging to any particular value — it just wanders. A related mistake is to imagine the coin has some kind of memory, so that a long run of tails makes heads more likely; that is the gambler’s fallacy.

Mathematical treatment

Convergence in probability

A sequence of random variables X1,X2,X_1, X_2, \dots converges in probability to bb if, for every ϵ>0\epsilon > 0,1

limnP(Xnb<ϵ)=1\lim_{n \to \infty} P(|X_n - b| < \epsilon) = 1

also written

XnpbX_n \overset{p}{\to} b

The law of large numbers states that the sample average converges in probability to the expected value:2

Xˉnpμ\bar{X}_n \overset{p}{\to} \mu

as nn \to \infty. This is the weak law of large numbers. Convergence in probability is just one of the ways to define the convergence of a sequence of random variables.3 The strong law of large numbers makes a stronger claim — the sample average converges almost surely to the expected value2 — and its proof is beyond our scope here; Terence Tao has a nice exposition on his blog.4

Proof of the weak law

Assume every XiX_i has variance V(Xi)=σ2V(X_i) = \sigma^2. Since the samples are i.i.d.,

V(Xˉn)=V ⁣(1n(X1+X2++Xn))=1n2V(X1+X2++Xn)=nσ2n2=σ2n\begin{aligned} V(\bar{X}_n) &= V\!\left(\tfrac{1}{n}(X_1 + X_2 + \dots + X_n)\right) \\ &= \frac{1}{n^2} V(X_1 + X_2 + \dots + X_n) \\ &= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \end{aligned}

and we know that

E(Xˉn)=μE(\bar{X}_n) = \mu

Applying Chebyshev’s inequality to Xˉn\bar{X}_n,

P(Xˉnμϵ)σ2nϵ2P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}

Taking the complement,

P(Xˉnμ<ϵ)=1P(Xˉnμϵ)=1σ2nϵ2\begin{aligned} P(|\bar{X}_n - \mu| < \epsilon) &= 1 - P(|\bar{X}_n - \mu| \geq \epsilon) \\ &= 1 - \frac{\sigma^2}{n\epsilon^2} \end{aligned}

and letting nn \to \infty drives the right-hand side to 1, which gives

Xˉnpμ\bar{X}_n \overset{p}{\to} \mu

References

  1. DeGroot, M. H. & Schervish, M. J. Probability and Statistics. (Pearson Education, 2012).

  2. Loève, M. Probability Theory. (Springer, 1977). 2

  3. Convergence of random variables. Wikipedia. https://en.wikipedia.org/wiki/Convergence_of_random_variables

  4. Tao, T. The strong law of large numbers. What’s new (2008). https://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/

← All articles