
Computing co-occurrences in a list representation

Let ${\cal D} = \{x^1, \ldots, x^N\}$ be a set of observations over $n$ binary variables. If $s \ll n$, it is efficient to represent each observation in ${\cal D}$ as the list of variables that are on in that observation. Thus, data point $x^i$, $i=1, \ldots, N$, is represented by the list $xlist^i\;=\;{\rm list}\{v\in V \mid x^i_v=1\}$. The space required by the lists is at most $sN$, which is much smaller than the space $nN$ required by the binary vector representation of the same data. Note, moreover, that the total number $N_C$ of co-occurrences in the dataset ${\cal D}$ is

\begin{displaymath}
N_C\;=\;\sum_{v\succ u} N_{uv} \;\leq \; \frac{1}{2}s^2N.
\end{displaymath}
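As a concrete illustration, the following minimal Python sketch (the names `to_lists` and `total_cooccurrences` are ours, not from the paper) converts binary vectors into the sparse list representation and counts $N_C$; each data point with $k$ variables on contributes $k(k-1)/2$ co-occurrences:

```python
def to_lists(data):
    """Convert binary vectors into the sparse list representation:
    each data point becomes the sorted list of indices of its 'on' variables."""
    return [[v for v, bit in enumerate(x) if bit] for x in data]

def total_cooccurrences(xlists):
    """N_C: each data point with k on-variables contributes k*(k-1)/2 pairs."""
    return sum(len(xl) * (len(xl) - 1) // 2 for xl in xlists)

# Example: 3 observations over n = 4 variables.
xlists = to_lists([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 0, 0, 1]])
# xlists == [[0, 1], [0, 2], [3]]; N_C == 1 + 1 + 0 == 2
```

With at most $s$ variables on per observation, the sum is bounded by $N\,s(s-1)/2 \leq \frac{1}{2}s^2N$, matching the displayed inequality.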

For each variable $u$, a co-occurrence list $C(u)$ is created for the variables that co-occur with $u$. Each $C(u)$ contains records $(v,I_{uv})$ for all $v\prec u$ with $N_{uv}>0$, sorted by decreasing $I_{uv}$. To represent the lists $V_0(u)$ (which contain many elements) we use their ``complements'' $\overline{V}_0(u)\;=\;\{v\in V \mid v\prec u,\;N_{uv}>0\}$. It can be shown [Meila-Predoviciu 1999] that computing the co-occurrence counts and constructing the lists $C(u)$ and $\overline{V}_0(u)$, for all $u\in V$, takes an amount of time proportional to the number of co-occurrences $N_C$, up to a logarithmic factor:

\begin{displaymath}
{\cal O}(s^2N\log (s^2N/n)).
\end{displaymath}
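The construction above can be sketched in Python as follows. This is an illustrative reimplementation, not the paper's code: `cooccurrence_lists` accumulates the counts $N_{uv}$ directly from the list representation (so the work is proportional to $N_C$), and `build_C` assembles the lists $C(u)$ using a plug-in estimate of the pairwise mutual information $I_{uv}$ computed from the counts:

```python
from collections import defaultdict
from math import log

def cooccurrence_lists(xlists, n):
    """Accumulate N_u and N_uv from list-represented data.
    xlists: one sorted list of 'on' variable indices per data point."""
    N_u = [0] * n
    N_uv = defaultdict(int)           # keys (u, v) with v < u
    for xlist in xlists:
        for i, u in enumerate(xlist):
            N_u[u] += 1
            for v in xlist[:i]:       # each co-occurring pair counted once
                N_uv[(u, v)] += 1
    return N_u, N_uv

def mutual_information(N, Nu, Nv, Nuv):
    """Plug-in mutual information of two binary variables from counts."""
    mi = 0.0
    for cuv, cu, cv in [(Nuv, Nu, Nv),                    # u=1, v=1
                        (Nu - Nuv, Nu, N - Nv),           # u=1, v=0
                        (Nv - Nuv, N - Nu, Nv),           # u=0, v=1
                        (N - Nu - Nv + Nuv, N - Nu, N - Nv)]:  # u=0, v=0
        if cuv > 0:
            mi += (cuv / N) * log(cuv * N / (cu * cv))
    return mi

def build_C(xlists, n):
    """For each u, build C(u): records (v, I_uv) with N_uv > 0,
    sorted by decreasing I_uv."""
    N = len(xlists)
    N_u, N_uv = cooccurrence_lists(xlists, n)
    C = defaultdict(list)
    for (u, v), cnt in N_uv.items():
        C[u].append((v, mutual_information(N, N_u[u], N_u[v], cnt)))
    for u in C:
        C[u].sort(key=lambda rec: -rec[1])
    return C
```

The sort of each $C(u)$ contributes the logarithmic factor in the bound; everything else is linear in $N_C$. A production version would also emit the complement lists $\overline{V}_0(u)$, which are simply the keys of $C(u)$.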

Comparing this value with ${\cal O}(n^2N)$, the time to compute all $P_{uv}$ in the CHOWLIU algorithm, we see that the present method replaces the domain dimension $n$ by the sparsity $s$. The memory required by the lists is also at most proportional to $N_C$, hence ${\cal O}(s^2N)$.
Journal of Machine Learning Research 2000-10-19