

Learning mixtures of trees with shared structure

It is possible to modify the MIXTREE algorithm to constrain the $m$ trees to share the same structure, and thereby to estimate MTSS models. The E step remains unchanged. The only novelty is the reestimation of the tree distributions $T^k$ in the M step, since they are now constrained to have the same structure. The maximization therefore cannot be decoupled into $m$ separate tree estimations, but, remarkably enough, it can still be performed efficiently. It can be readily verified that, for any given structure, the optimal parameters of each tree edge $T^k_{uv}$ are equal to the parameters of the corresponding marginal distribution $P^k_{uv}$. It remains only to find the optimal structure. The expression to be optimized is the second sum on the right-hand side of equation (7). Replacing $T^k_{uv}$ with $P^k_{uv}$ and denoting by $I^k_{uv}$ the mutual information between $u$ and $v$ under $P^k$, this sum can be reexpressed as follows:
\begin{eqnarray*}
\sum_{k=1}^m \Gamma_k \sum_{i=1}^N P^k(x^i)\log T^k(x^i)
 & = & N\sum_{k=1}^m \lambda_k \left[ \sum_{(u,v)\in E} I^k_{uv} \;-\; \sum_{v \in V} H(P^k_v) \right] \\
 & = & N\sum_{(u,v)\in E} I_{uv\vert z} \;-\; N\sum_{v \in V} H(v\vert z). \qquad\qquad \mbox{(8)}
\end{eqnarray*}
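The second equality holds because, with $\lambda_k$ read as $P(z\!=\!k)$ and $P^k$ as the distribution of the visible variables given $z=k$,

\begin{displaymath}
\sum_{k=1}^m \lambda_k I^k_{uv} \;=\; I_{uv\vert z}
\qquad\mbox{and}\qquad
\sum_{k=1}^m \lambda_k H(P^k_v) \;=\; H(v\vert z).
\end{displaymath}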

The new quantity $I_{uv\vert z}$ appearing above represents the mutual information of $u$ and $v$ conditioned on the hidden variable $z$. Its general definition for three discrete variables $u,v,z$ distributed according to $P_{uvz}\equiv P$ is

\begin{displaymath}
I_{uv\vert z} \;=\; \sum_{x_z} P_z(x_z) \sum_{x_u,x_v} P_{uv\vert z}(x_u,x_v\vert x_z)
\,\log \frac{P_{uv\vert z}(x_u,x_v\vert x_z)}{P_{u\vert z}(x_u\vert x_z)\,P_{v\vert z}(x_v\vert x_z)}.
\end{displaymath}
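For concreteness, the following is a minimal numerical sketch of this quantity, assuming the joint distribution of $(z,x_u,x_v)$ is available as a probability table; the function name, the NumPy dependency, and the tensor layout are illustrative choices, not part of the paper.

\begin{verbatim}
import numpy as np

def conditional_mutual_information(p_zuv):
    """I(u;v|z) for a joint table p_zuv[k, a, b] = P(z=k, x_u=a, x_v=b).

    The table is assumed to be a proper distribution (non-negative,
    summing to one, with P(z=k) > 0 for every k); names and layout
    are illustrative only.
    """
    p_z = p_zuv.sum(axis=(1, 2))                   # P(z)
    p_uv_z = p_zuv / p_z[:, None, None]            # P(x_u, x_v | z)
    p_u_z = p_uv_z.sum(axis=2)                     # P(x_u | z)
    p_v_z = p_uv_z.sum(axis=1)                     # P(x_v | z)

    denom = p_u_z[:, :, None] * p_v_z[:, None, :]  # P(x_u|z) P(x_v|z)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(p_uv_z > 0.0, np.log(p_uv_z / denom), 0.0)

    # I_{uv|z} = sum_z P(z) sum_{x_u,x_v} P(x_u,x_v|z) log ratio
    return float(np.sum(p_z[:, None, None] * p_uv_z * log_ratio))
\end{verbatim}

In the M step, the table for a candidate edge $(u,v)$ would be assembled from the mixture coefficients and the pairwise marginals, $P(z\!=\!k,x_u,x_v)=\lambda_k P^k_{uv}(x_u,x_v)$.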

The second term in (8), $N\sum_{v \in V} H(v\vert z)$, is the sum of the conditional entropies of the variables given $z$ and does not depend on the tree structure. Hence, the optimal structure $E$ can be found by running a MWST algorithm with edge weights $I_{uv\vert z}$. We summarize the algorithm in Figure 7.
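The structure search itself is a standard maximum-weight spanning tree computation. The sketch below is a hypothetical helper (Kruskal's algorithm with a union-find forest, names not from the paper): it takes a symmetric matrix of edge weights, here the conditional mutual informations $I_{uv\vert z}$, and returns the edge set $E$ of the shared structure. Once $E$ is fixed, each component's edge parameters are set to the corresponding marginals, $T^k_{uv}=P^k_{uv}$.

\begin{verbatim}
import numpy as np

def shared_structure_mwst(weights):
    """Kruskal's maximum-weight spanning tree over a dense weight matrix.

    weights[u, v] holds I_{uv|z}; only the upper triangle is read.
    Returns the shared edge set E as a list of (u, v) pairs.
    (Illustrative sketch; names are not from the paper.)
    """
    n = weights.shape[0]
    # Sort candidate edges by weight, heaviest first.
    edges = sorted(((weights[u, v], u, v)
                    for u in range(n) for v in range(u + 1, n)),
                   reverse=True)

    parent = list(range(n))               # union-find forest over the vertices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    E = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            E.append((u, v))
            if len(E) == n - 1:            # spanning tree complete
                break
    return E
\end{verbatim}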

Figure 7: The MIXTREESS algorithm for learning MTSS models.
\begin{figure}
[Pseudocode listing of Algorithm {\sc MixTreeSS}; its outputs are $\{T^k,\;\lambda^k,\;k=1,\ldots m\}$.]
\end{figure}

