Next: Bibliography Up: Dependency Networks for Inference, Previous: Acknowledgments

# Appendix: Proofs of Theorems

Theorem 1: The set of positive distributions that can be encoded by consistent dependency networks with graph is equal to the set of positive distributions that can be encoded by Markov networks whose structure have the same adjacencies as those in .

Proof: Let be a positive distribution defined by a Markov network where are the adjacencies of , . Construct a consistent dependency network from by extracting the conditional distributions . Because these probabilities came from the Markov network, we know that so that the adjacencies in the dependency-network structure are the same as those in the Markov network. Now let be a positive distribution encoded by a consistent dependency network. By definition, we have that is independent of given , . Because is positive and these independencies comprise the global Markov property of a Markov network with , , the Hammersley-Clifford theorem (Besag, 1974; Lauritzen, Dawid, Larsen, and Leimer, 1990) implies that can be represented by this Markov network.

Theorem 2: An ordered Gibbs sampler applied to a consistent dependency network for , where each is finite (and hence discrete) and each local distribution is positive, defines a Markov chain with a unique stationary joint distribution for equal to that can be reached from any initial state of the chain.

Proof: In the body of the paper, we showed that the Markov chain can be described by the transition matrix , where is the local'' transition matrix describing the resampling of according to the local distribution . We also showed that this Markov chain has a unique joint distribution that can be reached from any starting point. Here, we show that is that stationary distribution--that is, , where . To do so, we show that for each , .

 (11) (12) (13) (14) (15) (16)

Equation 12 follows from Equation 11 using the definition of . Equation 13 follows from Equation 12 using the definition of a consistent dependency network. Equation 14 follows from Equation 13 by observing that the value of and can differ only in variable . The other steps are straightforward.

Theorem 4: A minimal consistent dependency network for a positive distribution must be bi-directional.

Proof: We use the graphoid axioms of Pearl (1988). Suppose the theorem is false. Then, there exists nodes and such that is a parent of and is not a parent of . Let , , and . From minimality, we know that does not hold. By decomposition, does not hold. Given positivity, the intersection property holds. By the intersection property, at least one of the following conditions does not hold: (1) , (2) . (If , then condition 2 holds vacuously.) Now, from Theorem 2, we know that . Using weak union, we have that --that is, condition 1 holds. Also, from Theorem 2, we know that --that is, condition 2 holds, yielding a contradiction.

Next: Bibliography Up: Dependency Networks for Inference, Previous: Acknowledgments
Journal of Machine Learning Research, 2000-10-19