

Learning mixtures of trees with shared structure

It is possible to modify the MIXTREE algorithm to constrain the $m$ trees to share the same structure, and thereby to estimate MTSS models. The E step remains unchanged. The only novelty is the reestimation of the tree distributions $T^k$ in the M step, since they are now constrained to have the same structure. The maximization therefore cannot be decoupled into $m$ separate tree estimations, but, remarkably enough, it can still be performed efficiently. It can be readily verified that, for any given structure, the optimal parameters of each tree edge $T^k_{uv}$ are equal to the parameters of the corresponding marginal distribution $P^k_{uv}$. It remains only to find the optimal structure. The expression to be optimized is the second sum on the right-hand side of equation (7). Replacing $T^k_{uv}$ with $P^k_{uv}$ and denoting by $I^k_{uv}$ the mutual information between $u$ and $v$ under $P^k$, this sum can be reexpressed as follows:
\begin{eqnarray*}
\sum_{k=1}^m \Gamma_k \sum_{i=1}^N P^k(x^i)\log T^k(x^i)
 & = & N\sum_{k=1}^m \lambda_k \left[ \sum_{(u,v)\in E} I^k_{uv} \;-\; \sum_{v \in V} H(P^k_v) \right] \\
 & = & N\sum_{(u,v)\in E} I_{uv\vert z} \;-\; N\sum_{v \in V} H(v\vert z). \qquad\qquad \mbox{(8)}
\end{eqnarray*}
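The second equality holds because, with $\lambda_k$ read as $P(z\!=\!k)$ and $P^k$ as the distribution of the visible variables given $z=k$,

\begin{displaymath}
\sum_{k=1}^m \lambda_k I^k_{uv} \;=\; I_{uv\vert z}
\qquad\mbox{and}\qquad
\sum_{k=1}^m \lambda_k H(P^k_v) \;=\; H(v\vert z).
\end{displaymath}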

The new quantity $I_{uv\vert z}$ appearing above represents the mutual information of $u$ and $v$ conditioned on the hidden variable $z$. Its general definition for three discrete variables $u,v,z$ distributed according to $P_{uvz}\equiv P$ is

\begin{displaymath}
I_{uv\vert z} \;=\; \sum_{x_z} P_z(x_z) \sum_{x_u,x_v} P_{uv\vert z}(x_u,x_v\vert x_z)
\,\log \frac{P_{uv\vert z}(x_u,x_v\vert x_z)}{P_{u\vert z}(x_u\vert x_z)\,P_{v\vert z}(x_v\vert x_z)}.
\end{displaymath}
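For concreteness, the following is a minimal numerical sketch of this quantity, assuming the joint distribution of $(z,x_u,x_v)$ is available as a probability table; the function name, the NumPy dependency, and the tensor layout are illustrative choices, not part of the paper.

\begin{verbatim}
import numpy as np

def conditional_mutual_information(p_zuv):
    """I(u;v|z) for a joint table p_zuv[k, a, b] = P(z=k, x_u=a, x_v=b).

    The table is assumed to be a proper distribution (non-negative,
    summing to one, with P(z=k) > 0 for every k); names and layout
    are illustrative only.
    """
    p_z = p_zuv.sum(axis=(1, 2))                   # P(z)
    p_uv_z = p_zuv / p_z[:, None, None]            # P(x_u, x_v | z)
    p_u_z = p_uv_z.sum(axis=2)                     # P(x_u | z)
    p_v_z = p_uv_z.sum(axis=1)                     # P(x_v | z)

    denom = p_u_z[:, :, None] * p_v_z[:, None, :]  # P(x_u|z) P(x_v|z)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(p_uv_z > 0.0, np.log(p_uv_z / denom), 0.0)

    # I_{uv|z} = sum_z P(z) sum_{x_u,x_v} P(x_u,x_v|z) log ratio
    return float(np.sum(p_z[:, None, None] * p_uv_z * log_ratio))
\end{verbatim}

In the M step, the table for a candidate edge $(u,v)$ would be assembled from the mixture coefficients and the pairwise marginals, $P(z\!=\!k,x_u,x_v)=\lambda_k P^k_{uv}(x_u,x_v)$.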

The second term in (8), $N\sum_{v \in V} H(v\vert z)$, is the sum of the conditional entropies of the variables given $z$ and does not depend on the tree structure. Hence, the optimal structure $E$ can be found by running a MWST algorithm with edge weights $I_{uv\vert z}$. We summarize the algorithm in Figure 7.
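The structure search itself is a standard maximum-weight spanning tree computation. The sketch below is a hypothetical helper (Kruskal's algorithm with a union-find forest, names not from the paper): it takes a symmetric matrix of edge weights, here the conditional mutual informations $I_{uv\vert z}$, and returns the edge set $E$ of the shared structure. Once $E$ is fixed, each component's edge parameters are set to the corresponding marginals, $T^k_{uv}=P^k_{uv}$.

\begin{verbatim}
import numpy as np

def shared_structure_mwst(weights):
    """Kruskal's maximum-weight spanning tree over a dense weight matrix.

    weights[u, v] holds I_{uv|z}; only the upper triangle is read.
    Returns the shared edge set E as a list of (u, v) pairs.
    (Illustrative sketch; names are not from the paper.)
    """
    n = weights.shape[0]
    # Sort candidate edges by weight, heaviest first.
    edges = sorted(((weights[u, v], u, v)
                    for u in range(n) for v in range(u + 1, n)),
                   reverse=True)

    parent = list(range(n))               # union-find forest over the vertices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    E = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            E.append((u, v))
            if len(E) == n - 1:            # spanning tree complete
                break
    return E
\end{verbatim}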

Figure 7: The MIXTREESS algorithm for learning MTSS models.
\begin{figure}
[Pseudocode listing of Algorithm {\sc MixTreeSS}; its outputs are $\{T^k,\;\lambda^k,\;k=1,\ldots m\}$.]
\end{figure}

