Next: Probabilistic Decision Trees for Up: General Dependency Networks Previous: General Dependency Networks

## Definition and Basic Properties

Given a domain of interest having a set of finite variables , let be a set of conditional distributions, one for each variable in . In normal use, these distributions are intended to correspond to classifications/regressions learned from a single data set but, formally, they can be any set of conditional distributions. We do not require that these distributions be consistent--that is, we do not require that they can be obtained via inference from a single joint distribution . A dependency network for and is a pair where is a (usually cyclic) directed graph and is a set of conditional probability distributions satisfying

for every in . Again, we call the set of conditional distributions the local probability distributions for the dependency network. In most graphical modeling research, a graphical model is used to define a joint distribution for its variables. We do the same using a dependency network. In particular, the procedure of the ordered Gibbs sampler described in the previous section, when applied to a dependency network, yields a joint distribution for the domain. The result holds whether or not the local distributions in the dependency network are consistent. More formally, we have the following theorem, which is established by tracing the proof of Theorem 2 given in the previous section and noting that it does not rely on the consistency of the distributions.

Theorem 3: The procedure of an ordered Gibbs sampler applied to a dependency network for and , where each is finite (and hence discrete) and each local distribution in is positive, defines a Markov chain with a unique stationary joint distribution for that can be reached from any initial state of the chain.

Technically speaking, the procedure of Gibbs sampling applied to an inconsistent dependency network is not itself a Gibbs sampler, because there is no joint distribution consistent with all the local distributions. Consequently, we call this procedure an ordered pseudo-Gibbs sampler. One rather disturbing property of this procedure is that it produces a joint distribution that is likely to be inconsistent with many of the conditional distributions used to produce it. Nonetheless, as we have suggested and shall examine further, when each of the local distributions of a dependency network are learned from the same data, these distributions should be almost consistent with the joint distribution. Another disturbing observation is that the joint distribution obtained will depend on the order in which the pseudo-Gibbs sampler visits the variables. For example, consider the dependency network with the structure , saying that depends on but does not depend on . If we draw sample-pairs --that is, and then --then the resulting stationary distribution will have and dependent. In contrast, if we draw sample-pairs , then the resulting stationary distribution will have and independent. Due to the near consistency of the local distributions, however, the joint distributions obtained from different orderings will be close. If we indeed discover the dependency-network structure in practice, then and must be almost'' independent. Given a graphical model for a domain and an ordered Gibbs sampler that extracts a joint distribution from that model, we can apply the rules of probability to this joint distribution to answer probabilistic queries. Alternatively, we can apply Algorithm 1 directly to a given dependency network. Such an application may yield different answers than those computed from the joint distribution obtained from the dependency network, but can be justified heuristically due to near consistency. In Sections 3.3 and 3.4, we examine the issue of near consistency more carefully. In the remainder of this section, we mention two basic properties of dependency networks. First, let us consider the distributions that can be represented by a general dependency-network structure. Unlike the situation for consistent dependency networks, a general dependency-network structure and a Markov network structure with the same adjacencies do not represent the same distributions. In fact, a dependency network with a given structure defines a larger set of distributions than a Markov network with the same adjacencies. As an example, consider the dependency-network structure . In a simple experiment, we sampled local distributions for this structure from a uniform distribution. We then computed the stationary distribution of the Markov chain defined by a pseudo-Gibbs sampler with variable order . In all runs, we found that and were conditionally dependent given in the stationary distribution. In a Markov network with the same adjacencies, and must be conditionally independent given . Second, let us consider a simple necessary condition for consistency. We say that a dependency network for is bi-directional if is a parent of if and only if is a parent of , for all and in . In addition, let be the parent of node . We say that a consistent dependency network is minimal if and only if, for every node and for every parent , is not independent of given the remaining parents of . With these definitions, we have the following theorem, proved in the Appendix.

Theorem 4: A minimal consistent dependency network for a positive distribution must be bi-directional.

Next: Probabilistic Decision Trees for Up: General Dependency Networks Previous: General Dependency Networks
Journal of Machine Learning Research, 2000-10-19