13 Probabilistic Circuits
Probabilistic Circuits (PCs) are a unifying framework for tractable probabilistic models. A PC encodes a probability distribution as a directed acyclic computational graph (DAG) built from three types of nodes:
- Leaf (input) nodes: simple distributions over variables, e.g. univariate densities or indicator functions \mathbb{1}[X = x].
- Sum nodes: compute a weighted mixture of their children: S = \sum_{i} w_i \, C_i, \qquad w_i \geq 0, \quad \sum_i w_i = 1
- Product nodes: compute the product of their children, encoding (conditional) independence: P = \prod_{j} C_j
The circuit computes a distribution p(\mathbf{X}) via a single bottom-up pass: leaves evaluate base distributions, products multiply, sums mix.
The key advantage of PCs is that exact inference (marginals, conditionals) can be computed in O(|C|) time (linear in circuit size) provided the circuit satisfies certain structural properties:
| Property | Definition | What it enables |
|---|---|---|
| Smoothness | Every child of a sum node is defined over the same set of variables: \text{scope}(C_i) = \text{scope}(C_j) for all children C_i, C_j of any sum node | Correct marginalization |
| Decomposability | Children of every product node have disjoint scopes: \text{scope}(C_i) \cap \text{scope}(C_j) = \emptyset, i \neq j | Efficient multiplication |
| Determinism | For any input, at most one child of each sum node evaluates to a non-zero value | Efficient MAP, sampling |
With smoothness and decomposability, computing any marginal p(\mathbf{E} = \mathbf{e}) reduces to a single bottom-up evaluation: at leaf nodes, set observed variables to their values and marginalized variables to 1.
13.1 LearnSPN
LearnSPN learns the structure and parameters of an SPN directly from data, top-down and recursively. The intuition is best seen by thinking of the data as a matrix (rows = instances, columns = variables):
X1 X2 X3 X4
inst 1 [ . . . . ]
inst 2 [ . . . . ]
inst 3 [ . . . . ]
inst 4 [ . . . . ]
At each step, the algorithm tries one of two splits on the current sub-matrix:
Column split → Product node. Test if groups of variables (columns) are approximately independent. If yes, split the matrix vertically into independent column groups and create a product node whose children recurse on each group. This encodes a factorization.
[X1 X2 | X3 X4] → ×
/ \
(X1,X2) (X3,X4)
Row split → Sum node. If the variables are not separable, cluster the instances (rows) into groups and create a sum node whose children recurse on each cluster, weighted by cluster size. This encodes a mixture.
[inst 1,2,3] → +
[inst 4 ] / \
w=0.6 w=0.4
Base case. When only one variable remains (single column), fit a univariate leaf distribution.
The resulting tree of sum/product nodes is smooth and decomposable by construction, so inference is guaranteed to be tractable.