Probabilistic Decision Trees for Local Distributions

Next: Probabilistic Inference With Real Up: General Dependency Networks Previous: Definition and Basic Properties

Probabilistic Decision Trees for Local Distributions

When learning a dependency network from data, a variety of classification/regression techniques may be used to estimate the local distributions. We have used methods based on probabilistic decision trees (e.g., Buntine, 1991) and probabilistic support vector machines (e.g., Platt, 1999). For simplicity, we limit our discussion in this paper to the use of probabilistic decision trees. In this approach, for each variable

in ${\bf X}$ , we learn a probabilistic decision tree where

is the target variable and ${\bf X}\setminus X_i$ are the input variables. Each leaf is modeled as a multinomial distribution. To learn the decision-tree structure, we use a simple hill-climbing approach in conjunction with a Bayesian score (posterior probability of model structure) as described by Friedman and Goldszmdit (1996) and Chickering, Heckerman, and Meek (1997). To learn a decision-tree structure for

, we initialize the search algorithm with a singleton root node having no children. Then, we replace each leaf node in the tree with a binary split on some variable

in ${\bf X}\setminus X_i$ , until no such replacement increases the score of the tree. Our binary split on

is a decision-tree node with two children: one of the children corresponds to a particular value of

, and the other child corresponds to all other values of

. Our Bayesian scoring function uses a uniform prior distribution for the parameters of all multinomial distributions, and a structure prior proportional to $\kappa^{f}$ , where $\kappa > 0$ is a tunable parameter and

is the number of free parameters in the decision tree. In studies that predated those described in this paper, we have found that the setting $\kappa = 0.1$ yields accurate predictions over a wide variety of datasets. We use this same setting in the experiments described in this paper.

Next: Probabilistic Inference With Real Up: General Dependency Networks Previous: Definition and Basic Properties

Journal of Machine Learning Research, 2000-10-19