
Computing co-occurrences

As before, we avoid representing zero values explicitly by replacing each data point $x$ with the list $xlist$, where $xlist\;=\;{\rm list}\{(v,x_v), v\in V, x_v\neq 0\}$. A co-occurrence is represented by the quadruple $(u,x_u,v,x_v)$ with $x_u,x_v\neq 0$. Instead of a single co-occurrence count $N_{uv}$ per pair of variables, we now have a two-way contingency table $N_{uv}^{ij}$, where each entry $N_{uv}^{ij}$ is the number of data points in which $x_u=i$ and $x_v=j$, with $i,j\neq 0$. Counting and storing co-occurrences takes the same time as before and a factor of ${\cal O}(r_{MAX}^2)$ more memory, because the (non-zero) variable values must now be stored as well.
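The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: data points are assumed to be dense lists of non-negative integers with 0 as the "absent" value, and the names `to_xlist`, `count_cooccurrences`, and `tables` are hypothetical. Each table $N_{uv}^{ij}$ is stored sparsely as a dictionary keyed by $(i,j)$, so zero counts are never represented.

```python
from collections import defaultdict
from itertools import combinations


def to_xlist(x):
    """Return the sparse list [(v, x_v)] of the non-zero entries of data point x."""
    return [(v, x_v) for v, x_v in enumerate(x) if x_v != 0]


def count_cooccurrences(data):
    """Map each variable pair (u, v), u < v, to its sparse table {(i, j): N_uv^ij}.

    Assumption: a pair is only created when both variables are non-zero
    in at least one data point.
    """
    tables = defaultdict(lambda: defaultdict(int))
    for x in data:
        xlist = to_xlist(x)
        # Every pair of non-zero entries is one co-occurrence (u, x_u, v, x_v).
        for (u, i), (v, j) in combinations(xlist, 2):
            tables[(u, v)][(i, j)] += 1
    return tables


# Toy data set: 3 data points over 4 variables (hypothetical values).
data = [
    [0, 2, 1, 0],
    [1, 2, 0, 0],
    [0, 2, 1, 3],
]
tables = count_cooccurrences(data)
```

In this toy data set, variables 1 and 2 take the values 2 and 1 together in the first and third data points, so `tables[(1, 2)][(2, 1)]` is 2. The work per data point is quadratic in the length of its `xlist`, not in the total number of variables, which is the point of the sparse representation.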

Journal of Machine Learning Research 2000-10-19