13 Probabilistic Circuits

Probabilistic Circuits (PCs) are a unifying framework for tractable probabilistic models. A PC encodes a probability distribution as a directed acyclic computational graph (DAG) built from three types of nodes:

Leaf (input) nodes: simple distributions over variables, e.g. univariate densities or indicator functions \mathbb{1}[X = x].
Sum nodes: compute a weighted mixture of their children: S = \sum_{i} w_i \, C_i, \qquad w_i \geq 0, \quad \sum_i w_i = 1
Product nodes: compute the product of their children, encoding (conditional) independence: P = \prod_{j} C_j

The circuit computes a distribution p(\mathbf{X}) via a single bottom-up pass: leaves evaluate base distributions, products multiply, sums mix.

The key advantage of PCs is that exact inference (marginals, conditionals) can be computed in O(|C|) time (linear in circuit size) provided the circuit satisfies certain structural properties:

Property	Definition	What it enables
Smoothness	Every child of a sum node is defined over the same set of variables: \text{scope}(C_i) = \text{scope}(C_j) for all children C_i, C_j of any sum node	Correct marginalization
Decomposability	Children of every product node have disjoint scopes: \text{scope}(C_i) \cap \text{scope}(C_j) = \emptyset, i \neq j	Efficient multiplication
Determinism	For any input, at most one child of each sum node evaluates to a non-zero value	Efficient MAP, sampling

With smoothness and decomposability, computing any marginal p(\mathbf{E} = \mathbf{e}) reduces to a single bottom-up evaluation: at leaf nodes, set observed variables to their values and marginalized variables to 1.

13.1 LearnSPN

LearnSPN learns the structure and parameters of an SPN directly from data, top-down and recursively. The intuition is best seen by thinking of the data as a matrix (rows = instances, columns = variables):

        X1  X2  X3  X4
inst 1 [ .   .   .   . ]
inst 2 [ .   .   .   . ]
inst 3 [ .   .   .   . ]
inst 4 [ .   .   .   . ]

At each step, the algorithm tries one of two splits on the current sub-matrix:

Column split → Product node. Test if groups of variables (columns) are approximately independent. If yes, split the matrix vertically into independent column groups and create a product node whose children recurse on each group. This encodes a factorization.

  [X1 X2 | X3 X4]  →   ×
                      / \
              (X1,X2)   (X3,X4)

Row split → Sum node. If the variables are not separable, cluster the instances (rows) into groups and create a sum node whose children recurse on each cluster, weighted by cluster size. This encodes a mixture.

  [inst 1,2,3]     →  +
  [inst 4  ]         / \
                  w=0.6   w=0.4

Base case. When only one variable remains (single column), fit a univariate leaf distribution.

The resulting tree of sum/product nodes is smooth and decomposable by construction, so inference is guaranteed to be tractable.