
Decomposable priors for tree parameters

The decomposable prior for parameters that we introduce is a Dirichlet prior [Heckerman, Geiger, Chickering 1995]. The Dirichlet distribution is defined over the domain of a discrete probability table $(\theta_1,\dots,\theta_r)$ and has the form

$$D(\theta_1,\dots,\theta_r;\,N'_1,\dots,N'_r)\;\propto\;\prod_{j=1}^{r}\theta_j^{\,N'_j-1}.$$

The numbers $N'_j$ that parametrize $D$ can be interpreted as the sufficient statistics of a fictitious "data set" of size $N'=\sum_j N'_j$. Therefore the $N'_j$ are called fictitious counts; $N'$ represents the strength of the prior. To specify a prior for tree parameters, one must specify a Dirichlet distribution for each of the probability tables $T_{v|u}(\cdot\,|\,x_u)$, for each possible tree structure $E$. This is achieved by means of a set of parameters $N'_{uv}(x_u,x_v)$ satisfying, for all pairs $u,v$,

$$\sum_{x_v} N'_{uv}(x_u,x_v)\;=\;N'_u(x_u), \qquad \sum_{x_u} N'_u(x_u)\;=\;N'.$$
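As a numerical illustration (a sketch, not the paper's code; variable names are ours), a consistent set of fictitious counts can be generated from any joint distribution $P'$ over a pair of variables by setting $N'_{uv}(x_u,x_v)=N'\,P'(x_u,x_v)$, after which the marginalization constraints above hold by construction:

```python
import numpy as np

# Sketch: fictitious counts for two discrete variables u, v with
# (assumed) ranges r_u = 2, r_v = 3, derived from an arbitrary joint
# distribution P'(x_u, x_v) and a prior strength N'.
rng = np.random.default_rng(0)
N_prime = 10.0                       # prior strength N'
P_joint = rng.random((2, 3))
P_joint /= P_joint.sum()             # a valid joint P'(x_u, x_v)

N_uv = N_prime * P_joint             # fictitious counts N'_{uv}(x_u, x_v)
N_u = N_uv.sum(axis=1)               # marginal fictitious counts N'_u(x_u)

# Consistency constraints: summing over x_v recovers N'_u(x_u),
# and summing N'_u over x_u recovers the total strength N'.
assert np.allclose(N_uv.sum(axis=1), N_u)
assert np.isclose(N_u.sum(), N_prime)
```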
With these settings, the prior for the parameters of any tree that contains the directed edge $(u \to v)$ is defined by $D\bigl(\theta_{v|x_u};\,N'_{uv}(x_u,\cdot)\bigr)$. This representation of the prior is not only compact (of order $n^2 r^2$ parameters, for $n$ variables with ranges at most $r$) but also consistent: two different directed parametrizations of the same tree distribution receive the same prior. The assumptions allowing us to define this prior are explicated by Meilă and Jaakkola (2000) and parallel the reasoning of Heckerman et al. (1995) for general Bayes nets. Denote by $\hat P$ the empirical distribution obtained from a data set of size $N$ and by $P'$ the distribution defined by the fictitious counts. Then, by a property of the Dirichlet distribution [Heckerman, Geiger, Chickering 1995], it follows that learning a MAP tree is equivalent to learning an ML tree for the weighted combination of the two "data sets"

$$\tilde P(x)\;=\;\frac{N\,\hat P(x)\,+\,N'\,P'(x)}{N+N'}. \qquad (10)$$

Consequently, the parameters of the optimal tree will be the conditionals of $\tilde P$, that is $T_{v|u}=\tilde P_{v|u}$. For a mixture of trees, maximizing the posterior translates into replacing $N$ by $\Gamma_k$, the total posterior responsibility of mixture component $k$, and $\hat P$ by $P_k$ in equation (10) above. This implies that the M step of the EM algorithm, as well as the E step, is exact and tractable in the case of MAP estimation with decomposable priors. Finally, note that the posteriors for models with different numbers of mixture components $m$ are defined only up to a constant that depends on $m$. Hence, one cannot compare posteriors of MTs with different numbers of mixture components. In the experiments that we present, we chose $m$ via other performance criteria: validation-set likelihood in the density estimation experiments and validation-set classification accuracy in the classification tasks.
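The M-step computation described by equation (10) amounts to mixing the empirical and fictitious pairwise distributions and reading off the conditionals. A minimal sketch under assumed names and toy table sizes (not the paper's implementation):

```python
import numpy as np

# Sketch: MAP estimate of the table theta_{v|u} for one directed edge
# u -> v, obtained as the conditional of the weighted combination
# of empirical and fictitious distributions in eq. (10).
rng = np.random.default_rng(1)

N = 100.0                            # real data set size N
N_prime = 10.0                       # prior strength N'

P_hat = rng.random((2, 3))
P_hat /= P_hat.sum()                 # empirical pairwise marginal (from data)
P_prior = rng.random((2, 3))
P_prior /= P_prior.sum()             # fictitious distribution P'(x_u, x_v)

# Weighted combination of the two "data sets" (eq. 10).
P_tilde = (N * P_hat + N_prime * P_prior) / (N + N_prime)

# Optimal tree parameters: the conditionals of P_tilde.
theta_v_given_u = P_tilde / P_tilde.sum(axis=1, keepdims=True)

# Each row of theta_v_given_u is a valid distribution over x_v.
assert np.allclose(theta_v_given_u.sum(axis=1), 1.0)
```

For a mixture of trees, the same computation is run per component with $N$ replaced by the component's total responsibility and $\hat P$ by the responsibility-weighted empirical distribution.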

Journal of Machine Learning Research 2000-10-19