

Related work

The MT model can be used both in the classification setting and in the density estimation setting, and in these two guises it makes contact with different strands of the previous literature. In the classification setting, the MT model builds on the seminal work on tree-based classifiers by Chow and Liu [Chow, Liu 1968], and on recent extensions due to Friedman, Geiger, and Goldszmidt (1997) and Friedman et al. (1998). Chow and Liu proposed to solve $M$-way classification problems by fitting a separate tree to the observed variables in each of the $M$ classes, and classifying a new data point by choosing the class having maximum class-conditional probability under the corresponding tree model. Friedman et al. took as their point of departure the Naive Bayes model, which can be viewed as a graphical model in which an explicit class node has directed edges to an otherwise disconnected set of nodes representing the input variables (i.e., attributes). Introducing additional edges between the input variables yields the Tree Augmented Naive Bayes (TANB) classifier [Friedman, Geiger, Goldszmidt 1997, Geiger 1992]. These authors also considered a less constrained model in which a different pattern of edges is allowed for each value of the class node; this model is formally identical to the Chow and Liu proposal. If the choice variable of the MT model is identified with the class label, then the MT model is identical to the Chow and Liu approach (in the classification setting). However, we do not necessarily wish to identify the choice variable with the class label; indeed, in our classification experiments we treat the class label as simply another input variable. This yields a more discriminative approach to classification in which all of the training data are pooled for the purpose of training the model (Section 5; Meilă and Jordan 1998). The choice variable remains hidden, yielding a mixture model for each class. This is similar in spirit to the ``mixture discriminant analysis'' model of Hastie and Tibshirani (1996), in which a mixture of Gaussians is used for each class in a multiway classification problem.

In the setting of density estimation, clustering, and compression problems, the MT model makes contact with the large and active literature on mixture modeling. Let us briefly review some of the most salient connections. The AutoClass model [Cheeseman, Stutz 1995] is a mixture of factorial distributions (MF), and its excellent cost/performance ratio motivates the MT model in much the same way that the Naive Bayes model motivates the TANB model in the classification setting. (A factorial distribution is a product of factors, each of which depends on exactly one variable.) Tirri et al. (1997) study an MF in which a hidden variable is used for classification; this approach was extended by Monti and Cooper (1998). The idea of learning tractable but simple belief networks and superimposing a mixture to account for the remaining dependencies was developed independently of our work by Thiesson et al. (1997), who studied mixtures of Gaussian belief networks. Their work interleaves EM parameter search with Bayesian model search in a heuristic but general algorithm.
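To make the classification rules above concrete, the following display gives a minimal sketch in notation introduced here purely for illustration (the tree distributions $T^{k}$, the data point $x$, the class node $c$, and the per-variable factors $P_v$ are not defined elsewhere in this section). The Chow and Liu classifier assigns $x$ to the class whose tree gives it the highest class-conditional probability, while the factorial (Naive Bayes) family underlying the MF and TANB models is a product of single-variable factors conditioned on the class:

\[
\hat{k}(x) \;=\; \arg\max_{k \in \{1,\dots,M\}} T^{k}(x),
\qquad\qquad
P(x_1,\dots,x_n \mid c) \;=\; \prod_{v=1}^{n} P_v(x_v \mid c).
\]

In this notation, the TANB classifier augments the right-hand product with tree-structured edges among the input variables, and identifying the MT choice variable with the class label recovers exactly the decision rule on the left.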