Next: The FACES dataset
Up: Density estimation experiments
Previous: Digits and digit pairs
Our second set of density estimation experiments features the ALARM
network as the data-generating mechanism [Heckerman, Geiger, and Chickering
1995; Cheng, Bell, and Liu 1997].
This Bayesian network was constructed from expert knowledge as a medical
diagnostic alarm message system for patient monitoring. The domain
has 37 discrete variables, each taking between 2 and 4 values, connected
by 46 directed arcs. Note that this network is neither a tree nor
a mixture of trees, but the topology
of the graph is sparse, suggesting the possibility of approximating
the dependency structure by a mixture of trees with a small number
of components.
We generated a training set and a separate test set. On these sets we
trained and compared the following methods: mixtures of trees (MT),
mixtures of factorial distributions (MF), the true model, and
``gzip.'' For MT and MF, the model order and the degree of
smoothing were selected by cross-validation on randomly selected
subsets of the training set.
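The model order and smoothing degree were picked by validation on held-out subsets of the training data. The paper does not give the procedure in detail here, so the following is a minimal sketch: the `MixtureOfFactorials` stand-in (a crude random-partition fit rather than the paper's EM training), the add-alpha smoothing, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

class MixtureOfFactorials:
    """Crude stand-in for the paper's learners: the data are partitioned
    at random into m clusters and a smoothed factorial (fully factored
    multinomial) model is fit per cluster. The paper trains MT/MF by EM;
    this keeps the sketch short while exercising both hyperparameters."""

    def __init__(self, data, m, alpha, rng):
        groups = rng.integers(0, m, size=len(data))
        n_vals = data.max(axis=0) + 1          # cardinality of each variable
        self.log_w, self.log_probs = [], []
        for k in range(m):
            chunk = data[groups == k]
            # Smoothed mixture weight for component k.
            self.log_w.append(np.log((len(chunk) + 1) / (len(data) + m)))
            # Add-alpha smoothed marginals, one per variable.
            self.log_probs.append([
                np.log((np.bincount(chunk[:, j], minlength=n_vals[j]) + alpha)
                       / (len(chunk) + alpha * n_vals[j]))
                for j in range(data.shape[1])
            ])

    def logpdf(self, data):
        # log p(x) = logsumexp_k [ log w_k + sum_j log p_k(x_j) ]
        comp = np.stack([
            w + sum(lp[j][data[:, j]] for j in range(data.shape[1]))
            for w, lp in zip(self.log_w, self.log_probs)
        ])
        mx = comp.max(axis=0)
        return mx + np.log(np.exp(comp - mx).sum(axis=0))

def select_by_validation(data, orders, alphas, frac=0.2, seed=0):
    """Hold out a random subset of the training data and pick the
    (model order, smoothing) pair with the best held-out likelihood."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_val = int(frac * len(data))
    val, train = data[idx[:n_val]], data[idx[n_val:]]
    scores = {(m, a): MixtureOfFactorials(train, m, a, rng).logpdf(val).mean()
              for m in orders for a in alphas}
    return max(scores, key=scores.get)
```

The same loop applies unchanged to any learner that exposes a per-point log-density, which is why one validation routine can tune both the MT and MF models.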
Table 3:
Density estimation results for the mixture of trees and other models
on the ALARM data set. Average and
standard deviation over 20 trials.

Model                   Train likelihood     Test likelihood
                        [bits/data point]    [bits/data point]
ALARM net               13.148               13.264
Mixture of trees        13.51 ± 0.04         14.55 ± 0.06
Mixture of factorials   17.11 ± 0.12         17.64 ± 0.09
Base rate               30.99                31.17
gzip                    40.345               41.260
The results are presented in Table 3, where
we see that the MT model outperforms the MF model as well as
gzip and the base rate model.
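The gzip baseline turns a general-purpose compressor into a density-estimation yardstick: the compressed size of the data set, in bits per data point, is directly comparable to the negative log-likelihood scores in Table 3. The paper does not specify the exact encoding fed to gzip, so the one-byte-per-variable layout below is an assumption.

```python
import gzip
import numpy as np

def gzip_bits_per_point(data: np.ndarray) -> float:
    """Compress the data set (assumed encoding: one byte per discrete
    variable) and report the compressed size in bits per data point."""
    raw = data.astype(np.uint8).tobytes()
    return 8 * len(gzip.compress(raw)) / data.shape[0]

# Toy stand-in for an ALARM sample: 37 discrete variables, up to 4 values.
rng = np.random.default_rng(0)
sample = rng.integers(0, 4, size=(1000, 37))
print(f"{gzip_bits_per_point(sample):.1f} bits/data point")
```

Because gzip must also spend bits describing its own dictionary and cannot exploit the network's conditional-independence structure, it scores far worse than any of the fitted models, as the table shows.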
Table 4:
Density estimation results for the mixture of trees and other models
on a data set of size 1000 generated from the ALARM network.
Average and standard deviation over 20 trials.

Model                   Train likelihood     Test likelihood
                        [bits/data point]    [bits/data point]
ALARM net               13.167               13.264
Mixture of trees        14.56 ± 0.16         15.51 ± 0.11
Mixture of factorials   18.20 ± 0.37         19.99 ± 0.49
Base rate               31.23                31.18
gzip                    45.960               46.072
To examine the sensitivity of the algorithms to the size of the data
set, we ran the same experiment with a training set of size
1,000. The results are presented in Table 4.
Again, the MT model comes closest to the true model.
Notice that the degradation in performance for the
mixture of trees is relatively mild (about 1 bit), whereas the model
complexity is reduced significantly. This indicates the important role
played by the tree structures in fitting the data and accounts for the
advantage of the mixture of trees over the mixture of factorials on
this data set.
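The scores in both tables are compression-style: average negative log-likelihood in base 2. Read this way, a gap of d bits/data point corresponds to a factor of 2^d in average per-point likelihood, which is how the "about 1 bit" degradation above should be interpreted. A quick sketch of the metric (the function name is ours):

```python
import math

def bits_per_point(log_likelihoods):
    """Average negative log-likelihood in bits per data point, given
    natural-log densities, one entry per test point."""
    n = len(log_likelihoods)
    return -sum(log_likelihoods) / (n * math.log(2))

# Four points, each assigned probability 1/2, score exactly 1 bit/point.
# The ~1-bit MT degradation between Tables 3 and 4 therefore roughly
# halves the model's average per-point likelihood (a factor of 2**1).
print(bits_per_point([math.log(0.5)] * 4))  # → 1.0
```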
Journal of Machine Learning Research, 2000-10-19