plus an additive constant. The EM algorithm can be adapted to maximize the log posterior for any fixed number of mixture components m [Neal and Hinton, 1999]. Indeed, by comparing with equation (5) we see that the quantity to be maximized is now:
\[ \operatorname{E}[\,l_c\,] \;+\; \log P(Q) \tag{9} \]
The prior term does not influence the E step of the EM algorithm, which proceeds exactly as before (cf. equation (6)). To be able to successfully maximize the right-hand side of (9) in the M step, we require that the log of the prior decompose into a sum of independent terms matching the decomposition of the expected complete log-likelihood in (7). A prior over mixtures of trees that is amenable to this decomposition is called a decomposable prior:
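To make the division of labor concrete, the following sketch runs MAP EM on a deliberately simpler model than the mixture of trees discussed here: a mixture of independent Bernoullis with Dirichlet/Beta priors (the model, the function name, and the single pseudo-count `alpha` are all illustrative assumptions, not the paper's choices). It shows that the E step is identical to maximum-likelihood EM, while the M step maximizes the expected complete log-likelihood plus the log prior, which for these conjugate priors amounts to adding pseudo-counts:

```python
import numpy as np

def map_em_bernoulli_mixture(X, m, alpha=2.0, n_iter=50, seed=0):
    """MAP EM for a mixture of independent Bernoullis (an illustrative
    stand-in for the mixture model in the text).

    E step: posterior responsibilities, identical to ML EM -- the prior
    plays no role here.
    M step: maximizes expected complete log-likelihood PLUS the log prior;
    with Dirichlet(alpha) / Beta(alpha, alpha) priors this just adds
    (alpha - 1) pseudo-counts to each sufficient statistic.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    lam = np.full(m, 1.0 / m)                 # mixture proportions
    theta = rng.uniform(0.25, 0.75, (m, d))   # per-component Bernoulli params

    for _ in range(n_iter):
        # E step: responsibilities r[i, k] proportional to lam[k] * P(x_i | theta_k)
        log_p = (X[:, None, :] * np.log(theta)[None] +
                 (1 - X[:, None, :]) * np.log1p(-theta)[None]).sum(-1)
        log_r = np.log(lam)[None] + log_p
        log_r -= log_r.max(1, keepdims=True)  # stabilize before exponentiating
        r = np.exp(log_r)
        r /= r.sum(1, keepdims=True)

        # M step with prior terms: MAP updates instead of ML updates
        Nk = r.sum(0)
        lam = (Nk + alpha - 1) / (N + m * (alpha - 1))
        theta = (r.T @ X + alpha - 1) / (Nk[:, None] + 2 * (alpha - 1))
    return lam, theta
```

With `alpha = 1` the prior terms vanish and the updates reduce to ordinary maximum-likelihood EM, which mirrors the point made above: only the M step sees the prior.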
\[ P(Q) \;=\; P(\lambda_1, \ldots, \lambda_m)\,\prod_{k=1}^{m} \bigl(P(E_k)\bigr)\,\bigl(P(\theta_k \mid E_k)\bigr) \]
The first parenthesized factor in this equation represents the prior over the tree structure, while the second factor is the prior over the tree's parameters. Requiring that the prior be decomposable is equivalent to making several independence assumptions: in particular, the parameters of each tree in the mixture are a priori independent of the parameters of all the other trees, as well as of the distribution of the mixture variable. In the following section we show that these assumptions are not overly restrictive, by constructing decomposable priors for tree structures and parameters and showing that this class is rich enough to contain members of practical importance.
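The practical payoff of decomposability is that the log prior is a sum of independent terms, one per tree plus one for the mixture proportions, so the M step can optimize each tree separately. A minimal sketch, under assumed forms for each factor (a per-edge penalty exp(-beta) for the structure prior and a Dirichlet-style term for the parameters; neither is claimed to be the paper's exact choice):

```python
import math

def log_decomposable_prior(trees, beta=1.0, alpha=1.0):
    """Log of an assumed decomposable prior over a mixture of trees.

    Because the prior factors as P(proportions) * prod_k P(E_k) P(theta_k | E_k),
    the log is a plain sum of per-tree terms. Here:
      - a uniform Dirichlet over the mixture proportions contributes a
        constant, which we drop;
      - the structure prior multiplies in exp(-beta) per edge;
      - the parameter prior contributes (alpha - 1) * log p per probability
        entry, Dirichlet-style.
    Each tree is given as (edges, theta_tables): a list of edges and a list
    of probability tables (each table a list of probabilities).
    """
    log_p = 0.0  # log P(proportions) under a uniform Dirichlet (constant dropped)
    for edges, theta_tables in trees:
        log_p += -beta * len(edges)                 # structure term for this tree
        for table in theta_tables:                  # independent parameter terms
            log_p += sum((alpha - 1) * math.log(p) for p in table)
    return log_p
```

Setting `alpha = 1` makes the parameter prior uniform, leaving only the edge penalties; the sum-of-terms shape is what lets each tree's structure and parameters be maximized independently in the M step.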