Suppose we have independent and identically distributed random variables
Yi∼Normal(μ,σ2), with observed values yi. Given a
sample of size n, {y1,y2,…,yn}, we want to infer the mean μ.
There are many ways to approach this; here we implement three estimators:
the sample mean estimator,
the least squares estimator,
the maximum likelihood estimator.
Let’s generate a sample to work with. As throughout this series, we work from
scratch in plain Python with NumPy and SciPy:
import numpy as npnp.random.seed(51)real_mu, real_sigma, n = 5, 1, 1000# One sample of size 1,000 from Normal(5, 1)sample_1 = real_sigma * np.random.randn(n) + real_mu
One sample of 1,000 draws from Normal(5, 1), with the population density overlaid.
Estimators
What is an estimator? Following Casella and Berger, a point estimator is any
function W(X1,…,Xn) of a sample.1 Let’s evaluate a few such
functions.
Sample mean
The sample mean is yˉ=n1∑i=1nyi. Its expected value
is
So we expect the sample mean to land near the real μ=5:
np.round(np.mean(sample_1), 5)# → 4.99545
Distribution
From the derivations above, the sample mean should follow
Yˉ∼Normal(μ,nσ2) — here
Normal(5,0.001). We can check this empirically by drawing a hundred
samples and taking the mean of each:
The hundred sample means cluster around 5, tracking the expected Normal(5, 0.001) sampling distribution.
Least squares
Let’s view the same distribution differently — as the simplest possible
regression,2Yi=β0+ϵ with ϵ∼Normal(0,σ2), where β0 is a constant to identify. Here
β0 and μ are the same thing, and the inference becomes an optimization
problem:
β^0=β0argmini=1∑n(yi−β0)2
Differentiating with respect to β0 and setting the result to zero,
All three estimators agree to five decimals — which is the point: different
principles, same estimator.
Inference
Reference range
If the population follows Normal(μ,σ2), the sample mean follows
Normal(μ,nσ2). Knowing μ and σ,
we could state a reference range for the plausible values of the sample mean:
RR(Yˉ,α):μ±Zα/2nσ
Confidence interval
In practice we don’t know μ. Instead we build an interval around the sample
mean Yˉ that probably contains it — a confidence interval:
CI(μ,α):Yˉ±Zα/2nσ
Using our hundred samples and Zα/2=2 (about two standard deviations,
so roughly a 95% interval),
we expect close to 95 of them to contain the real μ=5:
# A ~95% interval (Z ≈ 2) around each of the 100 sample meansci_100 = [(m - 2 / np.sqrt(n), m + 2 / np.sqrt(n)) for m in mean_100]# How many contain the true mu = 5?contains_mu = [lo < 5 < hi for lo, hi in ci_100]np.sum(contains_mu)# → 97
97 out of 100 — comfortably in line with the expected 95%.
References
Casella, G. & Berger, R. L. Statistical Inference (2nd ed.). (Duxbury, 2002). ↩
Costa, M. A. Tópicos em Ciência dos Dados: Introdução aos Modelos Paramétricos. (Bonecker, 2019). ↩