The Feynman experience · Part 1

The law of large numbers

evergreen · Jan 4, 2019

Introduction

Everyone who studies probability runs into at least two limit theorems: the law of large numbers and the central limit theorem. The law of large numbers describes the result of performing the same experiment a large number of times. It is pervasive in probability and shows up everywhere from gambling to risk management.

Examples

Let’s simulate the two usual suspects of any probability example: a coin and a die.

Coin toss

Let’s track the frequency of heads over 1,000 coin tosses. As throughout this series, we work from scratch in plain Python with NumPy:

import numpy as np

np.random.seed(51)

# 1,000 coin tosses, each 0 (tails) or 1 (heads)
coin = np.random.randint(0, 2, 1000)

# Running frequency of heads after each toss
heads_mean = [np.mean(coin[:i + 1]) for i in range(len(coin))]

Running frequency of heads over 1,000 coin tosses: it swings widely at first, then settles onto the expected 0.5. — Over 1,000 tosses the observed frequency of heads settles onto the expected 0.5 (dashed). The early swings are large; they shrink as trials accumulate.

The frequency lurches around early on, then closes in on the expected 0.5 as the tosses pile up.

Throwing dice

The same idea with a die — the frequency of sixes should approach $1/6$ :

np.random.seed(51)

# 1,000 dice throws, each 1 to 6
dice = np.random.randint(1, 7, 1000)

# 1 when the throw is a six, 0 otherwise
six = [1 if x == 6 else 0 for x in dice]

# Running frequency of sixes
six_mean = [np.mean(six[:i + 1]) for i in range(len(six))]

Running frequency of sixes over 1,000 dice throws, settling onto the expected one sixth. — The observed frequency of sixes over 1,000 throws settles onto the expected 1/6 (dashed).

A common confusion

Sometimes you will hear that the law of large numbers means the counts of heads and tails grow closer together as the number of trials increases. That is not what it says — and we can show it. Here is the running difference between the number of heads and the number of tails:

np.random.seed(51)
coin = np.random.randint(0, 2, 1000)

# Cumulative (number of heads) minus (number of tails)
diff = [2 * np.sum(coin[:i + 1]) - (i + 1) for i in range(len(coin))]

The running difference between the number of heads and tails over 1,000 tosses, wandering up and down without settling. — The difference between the number of heads and tails wanders rather than converging. The *average* converges; the raw counts need not.

As far as we can tell, this series is not converging to any particular value — it just wanders. A related mistake is to imagine the coin has some kind of memory, so that a long run of tails makes heads more likely; that is the gambler’s fallacy.

Mathematical treatment

Convergence in probability

A sequence of random variables $X_1, X_2, \dots$ converges in probability to $b$ if, for every $\epsilon > 0$ ,¹

\lim_{n \to \infty} P(|X_n - b| < \epsilon) = 1

also written

X_n \overset{p}{\to} b

The law of large numbers states that the sample average converges in probability to the expected value:²

\bar{X}_n \overset{p}{\to} \mu

as $n \to \infty$ . This is the weak law of large numbers. Convergence in probability is just one of the ways to define the convergence of a sequence of random variables.³ The strong law of large numbers makes a stronger claim — the sample average converges almost surely to the expected value² — and its proof is beyond our scope here; Terence Tao has a nice exposition on his blog.⁴

Proof of the weak law

Assume every $X_i$ has variance $V(X_i) = \sigma^2$ . Since the samples are i.i.d.,

\begin{aligned} V(\bar{X}_n) &= V\!\left(\tfrac{1}{n}(X_1 + X_2 + \dots + X_n)\right) \\ &= \frac{1}{n^2} V(X_1 + X_2 + \dots + X_n) \\ &= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \end{aligned}

and we know that

E(\bar{X}_n) = \mu

Applying Chebyshev’s inequality to $\bar{X}_n$ ,

P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}

Taking the complement,

\begin{aligned} P(|\bar{X}_n - \mu| < \epsilon) &= 1 - P(|\bar{X}_n - \mu| \geq \epsilon) \\ &= 1 - \frac{\sigma^2}{n\epsilon^2} \end{aligned}

and letting $n \to \infty$ drives the right-hand side to 1, which gives

\bar{X}_n \overset{p}{\to} \mu

DeGroot, M. H. & Schervish, M. J. Probability and Statistics. (Pearson Education, 2012). ↩
Loève, M. Probability Theory. (Springer, 1977). ↩ ↩²
Convergence of random variables. Wikipedia. https://en.wikipedia.org/wiki/Convergence_of_random_variables ↩
Tao, T. The strong law of large numbers. What’s new (2008). https://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/ ↩