NEU Probabilistic Programming Lab | The pi-lambda theorem and induction

In our paper to appear at PLDI 2023, many proofs of key facts make use of a technical result from measure theory called Dynkin’s $\pi$-$\lambda$ theorem. In particular, this theorem underlies our main result that probability spaces form a Kripke resource monoid with respect to independent combination (Theorem 2.4).

Dynkin’s $\pi$-$\lambda$ theorem. Let $P$ be a $\pi$-system and $L$ a $\lambda$-system. If $P\subseteq L$ then $\sigma(P)\subseteq L.$

Taking this statement apart,

Both $P$ and $L$ are collections of subsets of some ambient set $X$ that has been left implicit. This $X$ ends up being the sample space in most probabilistic use cases, and $P$ and $L$ end up being collections of events.
A $\pi$-system is a collection of subsets that is nonempty and closed under intersections.
A $\lambda$-system is a collection of subsets that contains the empty set and is closed under complements and countable disjoint unions.
Finally, $\sigma(P)$ is the smallest $\sigma$-algebra that contains $P$.

This theorem is fairly mystifying on first encounter: when would you want to prove something of the form $\sigma(P)\subseteq L$, and why is it helpful to introduce these auxiliary objects ($\pi$-systems and $\lambda$-systems) in order to do it? This blog post describes an intuition I find useful: ultimately, the point of $\pi$-$\lambda$ theorem is to get a probability-friendly way of doing induction on $\sigma$-algebras.

$\pi$-$\lambda$ as a good induction principle

In PL theory, many objects are defined inductively: for example, the abstract syntax of a language is defined as the smallest set of terms closed under a grammar, and type systems are defined as the smallest relation closed under inference rules. The $\sigma$-algebra $\sigma(G)$ generated by a collection $G$, being the smallest $\sigma$-algebra containing $G$, is also inductive in flavor. It satisfies the following induction principle: to show a given statement holds for all $E$ in $\sigma(G),$

Show the statement for all $E$ in $G$ (base case number 1).
Show the statement for the empty set (base case number 2).
Given the induction hypothesis that the statement holds for some $E$ in $\sigma(G)$, show that it holds for the complement $E^{\rm c}$ (inductive case number 1).
Given a countable family of sets ${E_n\in\sigma(G)}_{n\in\mathbb N}$ and the induction hypothesis that the statement holds for each $E_n$, show that it holds for the union $\bigcup_n E_n.$

(To validate this proof principle, let $S$ be the collection of sets satisfying the statement. Points (1)-(4) above are equivalent to saying that $S$ is a $\sigma$-algebra containing $G$. Since $\sigma(G)$ is the smallest such $\sigma$-algebra, it must be that $\sigma(G)\subseteq S$. This is equivalent to saying every $E$ in $\sigma(G)$ satisfies the statement, as desired.)

It turns out that this induction principle isn’t strong enough for most use cases involving probability. The main issue is that (4) requires proving something about the union of an arbitrary countable family of sets, while probability measures really only satisfy a nice property (countable additivity) with respect to families that are pairwise disjoint.

There is, however, an object whose induction principle only talks about families of disjoint sets: the $\lambda$-system! Write $\lambda(G)$ for the smallest $\lambda$-system containing $G$. There is a corresponding induction principle for $\lambda$-systems: to show a given statement for all $E$ in $\lambda(G)$,

Show the statement for all $E$ in $G$ (base case number 1).
Show the statement for the empty set (base case number 2).
Given the induction hypothesis that the statement holds for some $E$ in $\lambda(G)$, show that it holds for the complement $E^{\rm c}$ (inductive case number 1).
Given a countable family of disjoint sets ${F_n \in\lambda(G)}_{n\in\mathbb N}$ and the induction hypothesis that the statement holds for each $F_n$, show that it holds for the union $\bigcup_n F_n.$

The only difference between this induction principle and the one for $\sigma$-algebras, shown in bold above, is that (4) only requires showing the statement for disjoint families.

To sum up the situation: we have $\sigma$-algebras, which are objects we care about that have a poor induction principle when it comes to proving probabilistic facts, and $\lambda$-systems, which are auxiliary objects that have a nice induction principle. The $\pi$-$\lambda$ theorem is a way of using the nicer $\lambda$-system induction principle to reason about $\sigma$-algebras that we care about, so long as those $\sigma$-algebras are nice enough:

Restatement of the $\pi$-$\lambda$ theorem. If $G$ is a $\pi$-system then $\sigma(G) = \lambda(G)$. In particular, any statement about elements of $\sigma(G)$ can be proved by $\lambda$-system-induction rather than $\sigma$-algebra-induction.

Example: uniqueness of independent combinations

To illustrate this induction principle in action, we’ll use it to give an inductive proof of Lemma 2.3 from our paper, which establishes the uniqueness of independent combinations:

Lemma 2.3. Two probability spaces $(\mathcal E,\mu)$ and $(\mathcal F,\nu)$ form an independent combination $(\mathcal G,\rho)$ if $\mathcal G = \sigma(\mathcal E\cup\mathcal F)$ and $\rho(E\cap F) = \mu(E)\nu(F)$ for all $E\in\mathcal E$ and $F\in\mathcal F$. For any two spaces $(\mathcal E,\mu)$ and $(\mathcal F,\nu)$, there is at most one independent combination $(\mathcal G,\rho)$.

Proof. The key part of the proof is showing that if there is any $\rho’$ with $\rho’(E\cap F) = \mu(E)\nu(F)$ for all $E$ in $\mathcal E$ and $F$ in $\mathcal F$ then $\rho = \rho’$. By extensionality it suffices to show $\rho(E) = \rho’(E)$ for all $E$ in $\sigma(\mathcal E\cup\mathcal F)$. Let $\mathcal D$ be the collection of intersections $E\cap F$ for all $E$ in $\mathcal E$ and $F$ in $\mathcal F$. It’s a fact that $\mathcal D$ is a $\pi$-system and $\sigma(\mathcal D) = \sigma(\mathcal E\cup \mathcal F)$, so we are done if we can show $\rho(E) =\rho’(E)$ for all $E$ in $\sigma(\mathcal D)$. At this point we use the $\pi$-$\lambda$ theorem: since $\mathcal D$ is a $\pi$-system, we can prove this statement by $\lambda$-system-induction:

Base case 1: suppose $E\in \mathcal D$, so $E = E’\cap F$ for some $E’$ in $\mathcal E$ and $F$ in $\mathcal F$. By assumption we have that $\rho(E’\cap F)$ and $\rho’(E’\cap F)$ are both equal to $\mu(E’)\nu(F)$, so $\rho(E’\cap F) = \rho’(E’\cap F)$ as desired.
Base case 2: $\rho(\emptyset) = \rho’(\emptyset)$ because $\rho,\rho’$ are both probability measures.
Inductive case 1: suppose $\rho(E) = \rho’(E)$ for some $E$ in $\sigma(\mathcal D)$ (the induction hypothesis). Then $\rho(E^{\rm c}) = \rho’(E^{\rm c})$ directly from the induction hypothesis plus the fact that $\mu(A^{\rm c}) = 1 - \mu(A)$ for any measure $\mu$.
Inductive case 2: let $E$ be a countable family of disjoint sets and suppose $\rho(E_n) = \rho’(E_n)$ for all $n$. Then $\rho(\bigcup_n E_n) = \sum_n \rho(E_n) = \sum_n \rho'(E_n) = \rho(\bigcup_n E_n)$ as desired.☐

Compare this proof to the one given in Appendix B.3 of our paper, where we use a more traditional formulation of the $\pi$-$\lambda$ theorem. There, the main body of the inductive argument above is tucked away into the claim that the set of events on which $\rho$ and $\rho’$ agree forms a $\lambda$-system. On the one hand, this packaging up of inductive arguments makes proofs a lot shorter and avoids duplication of work. On the other hand, I find that the explicit inductive argument makes the proof much more tangible. If you like thinking of inductive proofs as recursive procedures, you can almost imagine the above proof as taking some arbitrary $E\in\sigma(\mathcal D)$ and recursively “pattern matching against it” to reveal the truth of $\rho’(E)=\rho(E)$ in each case, with recursive calls corresponding to appeals to induction hypotheses. The inductive formulation also gives a concrete sense for why we need the $\pi$-$\lambda$ theorem in the first place: note that inductive case 2 only goes through because $E$ is a family of disjoint sets, for otherwise we would not have been able to use the countable additivity property of probability measures.

Mysterious $\lambda$-systems as sufficiently general induction hypotheses

In normal induction proofs, there is a very common strategy of making the induction hypothesis as strong as possible by quantifying over the most number of variables. Logical Foundations gives a simple example: to prove

For all natural numbers $n$ and $m$, if $2n = 2m$, then $n = m$.

by induction on $n$, you need

For all natural numbers $m$, if $2n = 2m$, then $n = m$.

as the induction hypothesis rather than just

If $2n = 2m$ then $n = m$.

for fixed but arbitrary $m$.

This applies just as well to uses of the $\pi$-$\lambda$ theorem: sometimes mystifying choices of $\lambda$-system can be explained away as instances of this principle. In fact, we use this trick in our proof that independent combination is associative (the main content of Theorem 2.4): if you check out Theorem B.4 in Appendix B.3, you’ll see that its proof makes use of a set $\mathcal G$, defined to be the set of events $E_{23}$ such that for all $E_1$ in $\mathcal F_1$, it holds that $\mu_{(12)3}(E_1\cap E_{23}) = \mu_1(E_1)\mu_{23}(E_{23})$. Where does this set come from? It is constructed in order to prove the statement

For all $E_1$ in $\mathcal F_1$ and $E_{23}$ in $\mathcal F_{23}$, it holds that $\mu_{(12)3}(E_1\cap E_{23}) = \mu_1(E_1)\mu_{23}(E_{23})$.

The proof in Appendix B.3 is written in traditional style, but can be recast as a $\lambda$-system-induction on $E_{23}$. From this point of view and keeping the above principle of making the induction hypothesis as strong as possible in mind, it’s natural to take as induction hypothesis the statement

For all $E_1$, it holds that $\mu_{(12)3}(E_1\cap E_{23}) = \mu_1(E_1)\mu_{23}(E_{23})$.

Correspondingly, $\mathcal G$ is precisely the set of events $E_{23}$ for which this statement holds. I especially like how boring this justification is: what seems chosen arbitrarily when the proof is presented in the traditional way falls naturally out of the logical structure of the statement to be proved.

The $\pi$-$\lambda$ theorem, by induction

In fact, the induction-principle-based point of view is even useful for proving the $\pi$-$\lambda$ theorem itself! The crux of the proof is that, if $G$ is a $\pi$-system, then $\lambda(G)$ is closed under finite intersections; that is,

For all $E$ and $F$ in $\lambda(G)$, it holds that $E\cap F$ is in $\lambda(G)$.

Proofs I found when studying this topic conclude from here by conjuring up sets that happen to be $\lambda$-systems and then chasing around inequalities among these sets and $\lambda(G)$. Maybe this kind of proof is natural and intuitive if you’re a trained mathematician, but I found it to be inscrutable. Luckily, we can muscle through our own proof using the intuitions described thus far: the above statement can be proved by double induction on $E$ and $F$. If you’re interested in how the proof goes, check out the heavily-commented lemma λπ_is_π_system of the accompanying Coq mechanization, where the tactic induction_on implements induction on $\lambda$-systems.

After doing the proof this way, one could translate it into a proof in the traditional style by looking at the motives of the two inductions involved and taking these to be the sets to conjure up as $\lambda$-systems. But why do that when the inductive argument is already rigorous enough to make a theorem prover happy? The translation process feels a bit like hand-compiling code to assembly: where the structure of the inductive proof fades into the background, automatically handled by the induction principle, the translation process introduces explicit set-theoretic machinery to run the induction manually.

Conclusion

If you would like to play with the mathematical objects described in this post, check out the accompanying Coq mechanization, where we prove (a Coq approximation of) Theorem 2.4 from our paper.

In particular,

π_λ is the $\pi$-$\lambda$ theorem.
independent_combinations_unique is Lemma 2.3 of the paper establishing uniqueness of independent combinations as discussed above.
independent_combination_associative establishes associativity of independent combination; a comment in the proof script indicates the location corresponding to the construction of the set $\mathcal G$ described above.