Random trees, large data set

For the first structure identification experiment, we generated a mixture of $m = 5$ trees over 30 variables, each vertex taking $r = 4$ values. The distribution of the choice variable $\lambda$, as well as each tree's structure and parameters, were sampled at random. The mixture was used to generate 30,000 data points that served as the training set for the MIXTREE algorithm. The initial model had $m = 5$ components but was otherwise random. We compared the structure of the learned model with that of the generative model, and computed the likelihoods of both the learned and the original model on a test set of 1,000 points. The algorithm was quite successful at identifying the original trees: out of 10 trials, it failed to identify only 1 tree correctly, in 1 trial. Moreover, this failure can be accounted for by sampling noise: the tree that was not identified had a mixture coefficient $\lambda$ of only $0.02$. The difference between the log likelihood of the samples under the generating model and under the approximating model was 0.41 bits per example.
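The generative setup above can be sketched as follows. This is an illustrative sketch only: the particular tree-sampling scheme (attaching each vertex to a random earlier vertex) and the Dirichlet draws for the conditional probability tables are assumptions for the demonstration, not the paper's exact sampling procedure, and the sample size is reduced from 30,000 for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N_VARS, R = 5, 30, 4        # mixture components, variables, values per variable

def random_tree(n, rng):
    """Random tree as a parent array; vertex 0 is the root (parent -1)."""
    # Attaching each vertex i > 0 to a uniformly chosen earlier vertex
    # yields a valid (if not uniformly distributed) random tree.
    return [-1] + [int(rng.integers(0, i)) for i in range(1, n)]

def random_params(parents, r, rng):
    """One Dirichlet(1)-sampled table P(x_v | x_pa(v)) per vertex.

    The root's table has a single row (its marginal distribution)."""
    return [rng.dirichlet(np.ones(r), size=(r if p >= 0 else 1))
            for p in parents]

def sample(parents, tables, rng):
    """Ancestral sampling along the tree (parents[v] < v by construction)."""
    x = np.empty(len(parents), dtype=int)
    for v, p in enumerate(parents):
        probs = tables[v][0] if p < 0 else tables[v][x[p]]
        x[v] = rng.choice(len(probs), p=probs)
    return x

# Random mixture: choice-variable distribution lambda and m random trees.
lam = rng.dirichlet(np.ones(M))
trees = [random_tree(N_VARS, rng) for _ in range(M)]
params = [random_params(t, R, rng) for t in trees]

# Draw the training set: pick a tree via lambda, then sample from it.
# (The experiment in the text uses 30,000 points; 3,000 here for speed.)
ks = rng.choice(M, size=3_000, p=lam)
data = np.array([sample(trees[k], params[k], rng) for k in ks])
```

The resulting `data` array (one discrete configuration per row) is what a structure learner such as MIXTREE would be trained on; held-out data generated the same way serves as the test set for the likelihood comparison.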
Journal of Machine Learning Research 2000-10-19