With this prior, the expression to be maximized in the M step of the EM algorithm becomes

$$\sum_{k=1}^{m}\;\sum_{(u,v)\in E^k}\left(\Gamma_k I^k_{uv}-\beta_{uv}\right).$$
Consequently, each edge weight in tree $k$ is adjusted by the corresponding $\beta_{uv}$ value divided by $\Gamma_k$, the total number of points that tree $k$ is responsible for:

$$W^k_{uv} \;=\; I^k_{uv} \;-\; \frac{\beta_{uv}}{\Gamma_k}.$$
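To make the effect of the penalty concrete, here is a minimal sketch (with hypothetical names and toy data) of the adjusted edge weights combined with a Kruskal-style MWST step that refuses non-positive weights, so the result can be a forest with fewer than $n-1$ edges:

```python
# Sketch of the penalized M step for one tree component (toy example).
# W_uv = I_uv - beta_uv / Gamma_k, as in the text; edges whose adjusted
# weight is non-positive are dropped, possibly leaving a disconnected forest.

def penalized_forest(I, beta, Gamma_k, n):
    """I, beta: dicts mapping an edge (u, v), u < v, to its mutual
    information and its prior penalty; Gamma_k: total posterior weight
    (number of points) assigned to component k; n: number of variables."""
    W = {e: I[e] - beta[e] / Gamma_k for e in I}
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for (u, v) in sorted(W, key=W.get, reverse=True):
        if W[(u, v)] <= 0:            # prior outweighs the data: stop adding
            break
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components: accept it
            parent[ru] = rv
            forest.append((u, v))
    return forest
```

Shrinking `Gamma_k` while holding the penalties fixed makes more adjusted weights go negative, which mirrors the observation below that components responsible for fewer points end up with sparser trees.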
A negative $\beta_{uv}$ increases the probability of edge $(u,v)$ being present in the final solution, whereas a positive value of $\beta_{uv}$ acts like a penalty on the presence of edge $(u,v)$ in the tree. If the MWST procedure is modified so as to not add negative-weight edges, one can obtain (disconnected) trees having fewer than $n-1$ edges. Note that the strength of the prior is inversely proportional to $\Gamma_k$, the total number of data points assigned to mixture component $k$. Thus, with equal priors for all trees $k$, trees accounting for fewer data points will be penalized more strongly and therefore will be likely to have fewer edges. If one chooses the edge penalties to be proportional to the increase in the number of parameters caused by the addition of edge $(u,v)$ to the tree (with $r_u$ denoting the number of values of variable $u$),

$$\beta_{uv} \;=\; \beta\,(r_u-1)(r_v-1),$$
then a Minimum Description Length (MDL) type prior is obtained. Another possibility is a prior of the form

$$P(E) \;\propto\; \exp\!\left[-\beta\,\Delta(E,E_0)\right],$$

where $\Delta$ is a distance metric between Bayes net structures and $E_0$ is the prior network structure. Thus, this prior penalizes deviations from the prior network. Taking $\Delta$ to be the size of the symmetric difference between edge sets, this prior is decomposable, entailing

$$\beta_{uv} \;=\; \begin{cases} -\beta, & (u,v)\in E_0,\\ +\beta, & (u,v)\notin E_0.\end{cases}$$
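A quick numerical check (illustrative, with a hypothetical prior edge set) that a symmetric-difference prior factors over the edges of $E$, up to a constant $e^{-\beta|E_0|}$ that is independent of $E$:

```python
# Verify that exp(-beta * |E symmetric-difference E0|) equals
# exp(-beta * |E0|) * prod over (u,v) in E of exp(-beta_uv), where
# beta_uv = -beta if (u,v) is in E0 and +beta otherwise.
import math

beta = 0.7
E0 = {(0, 1), (1, 2)}                      # hypothetical prior tree over 4 nodes

def beta_uv(e):
    return -beta if e in E0 else beta

for E in [{(0, 1), (0, 2), (0, 3)}, {(0, 3), (1, 3), (2, 3)}]:
    delta = len(E ^ E0)                    # symmetric-difference distance
    lhs = math.exp(-beta * delta)
    rhs = math.exp(-beta * len(E0)) * math.prod(math.exp(-beta_uv(e)) for e in E)
    assert math.isclose(lhs, rhs)
```

Because the constant does not depend on $E$, it is absorbed into the normalization, leaving a per-edge penalty of exactly the decomposable form used above.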
Decomposable priors on structure can also be used when the structure is common to all trees (MTSS). In this case the effect of the prior is to penalize the edge weight in (8) by $\beta_{uv}/N$. A decomposable prior has the remarkable property that its normalization constant can be computed exactly in closed form. This makes it possible not only to completely define the prior, but also to compute averages under this prior (e.g., to compute a model's evidence). Given that the number of all undirected tree structures over $n$ variables is $n^{n-2}$ by Cayley's formula, this result [Meila, Jaakkola 2000] is quite surprising.
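The $n^{n-2}$ count can be checked by brute force for small $n$: enumerate all $(n-1)$-edge subsets of the complete graph and keep those that are acyclic (an acyclic set of $n-1$ edges on $n$ vertices is necessarily a spanning tree).

```python
# Brute-force check of Cayley's formula: the number of labeled
# spanning trees on n vertices is n**(n-2).
from itertools import combinations

def count_spanning_trees(n):
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):                  # union-find to detect cycles
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for (u, v) in subset:
            ru, rv = find(u), find(v)
            if ru == rv:              # edge closes a cycle: reject subset
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:                   # n-1 acyclic edges form a spanning tree
            count += 1
    return count
```

For instance, `count_spanning_trees(4)` agrees with $4^{2} = 16$, which gives a feel for how fast the space of structures grows, and hence why a closed-form normalization constant over it is striking.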