The Feynman experience · Part 1
The law of large numbers
Introduction
Everyone who studies probability runs into at least two limit theorems: the law of large numbers and the central limit theorem. The law of large numbers describes the result of performing the same experiment a large number of times. It is pervasive in probability and shows up everywhere from gambling to risk management.
Examples
Let’s simulate the two usual suspects of any probability example: a coin and a die.
Coin toss
Let’s track the frequency of heads over 1,000 coin tosses. As throughout this series, we work from scratch in plain Python with NumPy:
import numpy as np
np.random.seed(51)
# 1,000 coin tosses, each 0 (tails) or 1 (heads)
coin = np.random.randint(0, 2, 1000)
# Running frequency of heads after each toss
heads_mean = [np.mean(coin[:i + 1]) for i in range(len(coin))]
The frequency lurches around early on, then closes in on the expected 0.5 as the tosses pile up.
Throwing dice
The same idea with a die — the frequency of sixes should approach :
np.random.seed(51)
# 1,000 dice throws, each 1 to 6
dice = np.random.randint(1, 7, 1000)
# 1 when the throw is a six, 0 otherwise
six = [1 if x == 6 else 0 for x in dice]
# Running frequency of sixes
six_mean = [np.mean(six[:i + 1]) for i in range(len(six))]
A common confusion
Sometimes you will hear that the law of large numbers means the counts of heads and tails grow closer together as the number of trials increases. That is not what it says — and we can show it. Here is the running difference between the number of heads and the number of tails:
np.random.seed(51)
coin = np.random.randint(0, 2, 1000)
# Cumulative (number of heads) minus (number of tails)
diff = [2 * np.sum(coin[:i + 1]) - (i + 1) for i in range(len(coin))]
As far as we can tell, this series is not converging to any particular value — it just wanders. A related mistake is to imagine the coin has some kind of memory, so that a long run of tails makes heads more likely; that is the gambler’s fallacy.
Mathematical treatment
Convergence in probability
A sequence of random variables converges in probability to if, for every ,1
also written
The law of large numbers states that the sample average converges in probability to the expected value:2
as . This is the weak law of large numbers. Convergence in probability is just one of the ways to define the convergence of a sequence of random variables.3 The strong law of large numbers makes a stronger claim — the sample average converges almost surely to the expected value2 — and its proof is beyond our scope here; Terence Tao has a nice exposition on his blog.4
Proof of the weak law
Assume every has variance . Since the samples are i.i.d.,
and we know that
Applying Chebyshev’s inequality to ,
Taking the complement,
and letting drives the right-hand side to 1, which gives
References
-
DeGroot, M. H. & Schervish, M. J. Probability and Statistics. (Pearson Education, 2012). ↩
-
Convergence of random variables. Wikipedia. https://en.wikipedia.org/wiki/Convergence_of_random_variables ↩
-
Tao, T. The strong law of large numbers. What’s new (2008). https://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/ ↩