
The ALARM network

Our second set of density estimation experiments features the ALARM network as the data-generating mechanism [Heckerman, Geiger, Chickering 1995; Cheng, Bell, Liu 1997]. This Bayesian network was constructed from expert knowledge as a medical diagnostic alarm message system for patient monitoring. The domain has $n=37$ discrete variables taking between 2 and 4 values, connected by 46 directed arcs. The network is neither a tree nor a mixture of trees, but its graph topology is sparse, suggesting that the dependency structure can be approximated by a mixture of trees with a small number of components $m$. We generated a training set of $N_{train}=10,000$ data points and a separate test set of $N_{test}=2,000$ data points. On these sets we trained and compared the following methods: mixtures of trees (MT), mixtures of factorial distributions (MF), the true model, and ``gzip.'' For MT and MF, the model order $m$ and the degree of smoothing were selected by cross-validation on randomly chosen subsets of the training set.
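As a concrete reference point for the numbers below, the base rate model appearing in the tables scores each variable by its marginal frequency on the training set, ignoring all dependencies. The following sketch shows how such a baseline, measured in bits per data point, can be computed; the helper name base_rate_bits and the Laplace smoothing are illustrative choices, not necessarily the exact scheme used in the experiments.

    import numpy as np

    def base_rate_bits(train, test, alpha=1.0):
        # Negative log2-likelihood per data point of the base rate model:
        # every variable is treated as independent, with (Laplace-smoothed)
        # marginal frequencies estimated on the training set.
        n_train, n_vars = train.shape
        bits = 0.0
        for j in range(n_vars):
            k = int(max(train[:, j].max(), test[:, j].max())) + 1  # cardinality
            counts = np.bincount(train[:, j], minlength=k).astype(float)
            probs = (counts + alpha) / (n_train + alpha * k)
            bits -= np.log2(probs[test[:, j]]).sum()
        return bits / test.shape[0]

    # Synthetic stand-in for ALARM samples: 37 variables, 3 values each.
    rng = np.random.default_rng(0)
    data = rng.integers(0, 3, size=(12000, 37))
    train, test = data[:10000], data[10000:]
    print(base_rate_bits(train, test))  # close to 37 * log2(3) = 58.6 bits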

Table 3: Density estimation results for the mixtures of trees and other models on the ALARM data set. Training set size $N_{train}=10,000$. Average and standard deviation over 20 trials.

  Model                           Train likelihood     Test likelihood
                                  [bits/data point]    [bits/data point]
  ALARM net                       13.148               13.264
  Mixture of trees $m=18$         13.51 $\pm$ 0.04     14.55 $\pm$ 0.06
  Mixture of factorials $m=28$    17.11 $\pm$ 0.12     17.64 $\pm$ 0.09
  Base rate                       30.99                31.17
  gzip                            40.345               41.260

The results are presented in Table 3, where lower values indicate a better fit. The MT model clearly outperforms the MF model, as well as gzip and the base rate model, coming within about 1.3 bits per data point of the true model on the test set.
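The gzip figure serves as a compression-based yardstick: a density estimate with negative log-likelihood of $b$ bits per data point corresponds to a code of that length, so the compressed size of the data set per point is directly comparable. A minimal version of such a baseline, assuming each discrete value is encoded as one byte (the encoding actually used in the experiments may differ):

    import gzip
    import numpy as np

    def gzip_bits_per_point(data):
        # Compress the whole data set (one byte per discrete value, row by
        # row) and report the compressed length in bits per data point.
        raw = np.asarray(data, dtype=np.uint8).tobytes()
        return 8.0 * len(gzip.compress(raw, compresslevel=9)) / data.shape[0]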

Table 4: Density estimation results for the mixtures of trees and other models on a data set of size 1,000 generated from the ALARM network. Average and standard deviation over 20 trials. Recall that $\alpha$ is a smoothing coefficient.

  Model                                         Train likelihood     Test likelihood
                                                [bits/data point]    [bits/data point]
  ALARM net                                     13.167               13.264
  Mixture of trees $m=2$, $\alpha=50$           14.56 $\pm$ 0.16     15.51 $\pm$ 0.11
  Mixture of factorials $m=12$, $\alpha=100$    18.20 $\pm$ 0.37     19.99 $\pm$ 0.49
  Base rate                                     31.23                31.18
  gzip                                          45.960               46.072

To examine the sensitivity of the algorithms to the size of the data set, we repeated the experiment with a training set of size 1,000. The results are presented in Table 4. Again, the MT model comes closest to the true model. Notice that the degradation in its test performance is relatively mild (about 1 bit per data point), even though the selected model complexity drops sharply (from $m=18$ to $m=2$). This indicates the important role played by the tree structures in fitting the data and underscores the advantage of the mixture of trees over the mixture of factorials on this data set.
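The structural adaptivity behind this robustness comes from the Chow-Liu step that refits each component's tree as a maximum-weight spanning tree over pairwise mutual informations. The sketch below shows an unweighted, unsmoothed version of this step (in the mixture, the counts would be weighted by the posterior component responsibilities, as in the M-step of the EM algorithm):

    import numpy as np

    def mutual_information(x, y):
        # Empirical mutual information (in bits) between two discrete columns.
        kx, ky = x.max() + 1, y.max() + 1
        joint = np.zeros((kx, ky))
        np.add.at(joint, (x, y), 1.0)
        joint /= joint.sum()
        px = joint.sum(1, keepdims=True)
        py = joint.sum(0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

    def chow_liu_edges(data):
        # Maximum-weight spanning tree over pairwise mutual information
        # (Prim's algorithm); returns the n-1 edges of the fitted tree.
        n_vars = data.shape[1]
        mi = np.zeros((n_vars, n_vars))
        for i in range(n_vars):
            for j in range(i + 1, n_vars):
                mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
        in_tree, edges = {0}, []
        while len(in_tree) < n_vars:
            i, j = max(((i, j) for i in in_tree for j in range(n_vars)
                        if j not in in_tree), key=lambda e: mi[e])
            edges.append((i, j))
            in_tree.add(j)
        return edges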