

Discrete variables of arbitrary arity

We briefly describe the extension of the ACCL algorithm to discrete domains in which the variables can take more than two values. First, we extend the definition of data sparseness: we assume that for each variable there exists a special value that appears with higher frequency than all the other values. Without loss of generality, this value will be denoted by 0. For example, in a medical domain, the value 0 for a variable would represent the ``normal'' value, whereas the abnormal values of each variable would be designated by non-zero values. An ``occurrence'' for variable $v$ is the event $v\neq 0$, and a ``co-occurrence'' of $u$ and $v$ means that $u$ and $v$ are both non-zero for the same data point. We define $\vert x\vert$ as the number of non-zero values in observation $x$. The sparseness $s$ is, as before, the maximum of $\vert x\vert$ over the data set. To exploit the high frequency of the zero values, we represent only the occurrences explicitly, thereby creating a compact and efficient data structure. We obtain performance gains by presorting mutual information values for non-co-occurring variables.
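The definitions above can be illustrated with a minimal sketch. It is not the ACCL implementation; the function names (`to_sparse`, `sparseness`, `count_occurrences`) and the toy data are hypothetical, chosen only to show one way of storing occurrences explicitly and of computing $\vert x\vert$, $s$, and the occurrence and co-occurrence counts. For simplicity it counts only the events $v\neq 0$ and does not tabulate which non-zero values co-occur.

```python
from collections import Counter
from itertools import combinations


def to_sparse(x):
    """Keep only the occurrences: (variable index, value) pairs with value != 0."""
    return [(v, val) for v, val in enumerate(x) if val != 0]


def sparseness(sparse_data):
    """Return s, the maximum of |x| (number of non-zero values) over the data set."""
    return max((len(obs) for obs in sparse_data), default=0)


def count_occurrences(sparse_data):
    """Count occurrences per variable and co-occurrences per pair (u, v) with u < v."""
    occ = Counter()
    cooc = Counter()
    for obs in sparse_data:
        # Variable indices are already in increasing order because
        # to_sparse enumerates the observation from left to right.
        variables = [v for v, _ in obs]
        occ.update(variables)
        cooc.update(combinations(variables, 2))
    return occ, cooc


# Hypothetical toy data set: 4 observations over 5 variables, 0 is the frequent value.
data = [
    [0, 2, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1],
]

sparse_data = [to_sparse(x) for x in data]
s = sparseness(sparse_data)
occ, cooc = count_occurrences(sparse_data)
```

In this sketch the work per observation depends on $\vert x\vert$ rather than on the total number of variables, and any pair of variables absent from `cooc` never co-occurs in the data.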

Journal of Machine Learning Research 2000-10-19