The Feynman experience · Part 2

Markov chains

evergreen · Dec 14, 2018

Introduction

A Markov chain is an important example of a stochastic process. A sequence of random variables $X_1, X_2, \dots$ is called a stochastic process with discrete time parameter; the first, $X_1$ , is the initial state, and for $n = 2, 3, \dots$ the variable $X_n$ is the state of the process at time $n$ .¹ These sequences differ from the ones met elsewhere in statistics — in the proof of the law of large numbers, say — because the random variables are not necessarily independent and identically distributed. In fact, the interesting question in a stochastic process is precisely how the variables depend on one another over time. Generally, we want the distribution of the next state $X_{n+1}$ given everything that has already happened:

Pr(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, \dots)

The Markov property

A Markov chain models a memoryless stochastic process: the probability of the next state depends only on the present one. This is the Markov property, and it simplifies the equation above to

Pr(X_{n+1} = j \mid X_n = i)

Past and future are conditionally independent given the present.² For a finite Markov chain with $k$ possible states, the conditional distributions $Pr(X_{n+1} = j \mid X_n = i)$ for $i, j = 1, \dots, k$ are called the transition distributions of the chain.¹ If they do not change over time, the chain is homogeneous, or said to have stationary transition distributions.

The transition matrix

Consider a finite Markov chain with stationary transition distributions $p_{ij} = Pr(X_{n+1} = j \mid X_n = i)$ . The transition matrix is the $k \times k$ matrix $\mathbf{P}$ with elements $p_{ij}$ :¹

\mathbf{P} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1k} \\ p_{21} & p_{22} & \cdots & p_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{kk} \end{pmatrix}

Let’s use one to define a Markov chain with five states. As throughout this series, we work from scratch in plain Python — here, with nothing but NumPy:

import numpy as np

# Each row i holds the probabilities of moving from state i to every state
transition_matrix = np.array([[0.4, 0.6, 0.0, 0.0, 0.0],
                              [0.0, 0.0, 0.9, 0.0, 0.1],
                              [0.4, 0.0, 0.1, 0.5, 0.0],
                              [0.2, 0.7, 0.0, 0.0, 0.1],
                              [0.5, 0.0, 0.0, 0.5, 0.0]])

We can picture the same chain as a directed graph — one node per state, an arrow for every possible transition, each labelled with its probability:

The five-state chain defined by the transition matrix. Each state has its own colour; the arrows leaving it — and their probability labels — share that colour, and those outgoing probabilities sum to one.

Because each row holds the probabilities of leaving a given state, every row must sum to one and all entries must be nonnegative. A square matrix with these properties is a stochastic matrix.

# Each row sums to one
np.sum(transition_matrix, axis=1)
# → array([1., 1., 1., 1., 1.])

The initial distribution

We can describe where the chain starts with a row vector $\mathbf{v}$ of length $k$ , where each element $v_i$ is the probability of being in state $i$ . This is the initial distribution of the chain:

\mathbf{v} = [\,v_1, v_2, \dots, v_k\,]

Given the initial distribution and the transition matrix, the distribution after one step is

\mathbf{v}_{n+1} = \mathbf{v}_n \mathbf{P}

which generalises to $s$ steps as

\mathbf{v}_{n+s} = \mathbf{v}_n \mathbf{P}^{s}

Let’s start from state 1 — that is, $\mathbf{v} = [1, 0, 0, 0, 0]$ — and compute the distribution after one step:

state = np.array([1, 0, 0, 0, 0])

state @ transition_matrix
# → array([0.4, 0.6, 0. , 0. , 0. ])

This matches the first row of the transition matrix: from state 1 there is a probability of 0.6 of moving to state 2 and 0.4 of staying put. Now suppose we want the distribution after ten steps:

state @ np.linalg.matrix_power(transition_matrix, 10)
# → array([0.27745538, 0.28189331, 0.23522178, 0.16504972, 0.04037981])

These are the probabilities of being in each state after ten steps, starting from $\mathbf{v} = [1, 0, 0, 0, 0]$ . What if we start somewhere else?

state_2 = np.array([0.7, 0, 0.1, 0, 0.2])

state_2 @ np.linalg.matrix_power(transition_matrix, 10)
# → array([0.27496154, 0.26573116, 0.24972629, 0.16938226, 0.04019874])

The results are remarkably similar — which brings us to the last concept, the stationary distribution.

The stationary distribution

Under some general conditions I won’t go into here, a probability vector $\mathbf{v}$ that satisfies $\mathbf{v}\mathbf{P} = \mathbf{v}$ is called a stationary distribution of the Markov chain.¹ That equation should ring a bell from linear algebra: a vector with that property is an eigenvector of the matrix. In other words, the stationary distribution has a closed form — it is the left eigenvector of $\mathbf{P}$ with eigenvalue 1:

# Take the left eigenvector
v = np.linalg.eig(transition_matrix.T)[1][:, 0]

# Normalise (l1) so the entries sum to one
v.real / np.linalg.norm(v.real, ord=1)
# → array([0.26588235, 0.26823529, 0.26823529, 0.15529412, 0.04235294])

Starting from any initial distribution, the chain converges to this vector as the number of steps $s \to \infty$ . That is why our two ten-step experiments landed in nearly the same place.

Conclusion

We now have some basic tools for exploring Markov chains. A natural next step is Markov chain Monte Carlo (MCMC), which is built on top of them and shows up everywhere in Bayesian inference. The details can be daunting; for a fun application of Markov chains to try your skills on, here’s a nice one: simulating Chutes and Ladders.³

DeGroot, M. H. & Schervish, M. J. Probability and Statistics. (Pearson Education, 2012). ↩ ↩² ↩³ ↩⁴
Blitzstein, J. Lectures 31–32: Markov Chains, Statistics 110. (Harvard University, 2013). https://www.youtube.com/watch?v=8AJPs3gvNlY ↩
VanderPlas, J. Simulating Chutes & Ladders in Python. Pythonic Perambulations (2017). https://jakevdp.github.io/blog/2017/12/18/simulating-chutes-and-ladders/ ↩