next up previous
Next: Memory requirements Up: The ACCL algorithm Previous: The algorithm and its

Running time

The algorithm requires: ${\cal O}(sn)$ for computing $N_v, v\in V$, ${\cal O}(n\log n)$ for sorting the variables, ${\cal O}(s^2N\log(s^2N/n))$ for step 2, ${\cal O}(n)$ for step 3, ${\cal O}(n_K\log n +N_C)$ for the Kruskal algorithm ($n_K$ is the number of edges examined by the algorithm, $\log n$ is the time for an extraction from $vheap$, $N_C$ is an upper bound for the number of elements in the lists $\overline{V}_0$ whose elements need to be skipped occasionally when we extract variables from the virtual lists $V_0$), and ${\cal O}(n)$ for creating the the $n-1$ probability tables in step 5. Summing these terms, we obtain the following upper bound for the running time of the acCL algorithm:

\begin{displaymath}
{\cal O}( n\log n + sn + s^2N \log\frac{s^2N}{n} + n_K\log n).
\end{displaymath}

If we ignore the logarithmic factors this simplifies to

\begin{displaymath}
\tilde{{\cal O}}( sn + s^2N + n_K).
\end{displaymath}

For constant $s$, this bound is a polynomial of degree 1 in the three variables $n,\;N$ and $n_K$. Because $n_K$ has the range $n-1\;\leq\; n_K \;\leq\; {n(n-1)}/{2}$, the worst case complexity of the ACCL algorithm is quadratic in $n$. Empirically, however, as we show below, we find that the dependence of $n_K$ on $n$ is generally subquadratic. Moreover, random graph theory implies that if the distribution of the weight values is the same for all edges, then Kruskal's algorithm should take time proportional to $n\log n$ [West 1996]. To verify this latter result, we conducted a set of Monte Carlo experiments, in which we ran the Kruskal algorithm on sets of random weights over domains of dimension up to $n=3000$. For each $n$, 1000 runs were performed. Figure 22 plots the average and maximum $n_K$ versus $n\log n$ for these experiments. The curve for the average displays an essentially linear dependence.

Figure 22: The mean (full line), standard deviation and maximum (dotted line) of the number of steps $n_K$ in the Kruskal algorithm over 1000 runs plotted against $n\log n$. $n$ ranges from 5 to 3000. The edge weights were sampled from a uniform distribution.
\begin{figure}
\centerline{\epsfig{file=figures/figkruskal.ps,width=2.4in}}
\end{figure}


next up previous
Next: Memory requirements Up: The ACCL algorithm Previous: The algorithm and its
Journal of Machine Learning Research 2000-10-19