**Theorem 2:** An ordered Gibbs sampler applied to a
consistent dependency network for $\mathbf{X} = (X_1, \ldots, X_n)$, where each $X_i$ is finite
(and hence discrete) and each local distribution
$p(x_i \mid \mathbf{pa}_i)$ is
positive, defines a Markov chain with a unique stationary joint
distribution for $\mathbf{X}$ equal to $p(\mathbf{x})$ that can be reached from any
initial state of the chain.

**Proof:** Let $\mathbf{x}^t$ be the sample of $\mathbf{x}$ after the $t$th
iteration of the ordered Gibbs sampler. The sequence $\mathbf{x}^1, \mathbf{x}^2, \ldots$
can be viewed as samples drawn from a homogeneous
Markov chain with transition matrix $P$ having elements
$P_{\mathbf{x} \mathbf{x}'} = p(\mathbf{x}^{t+1} = \mathbf{x}' \mid \mathbf{x}^t = \mathbf{x})$. The matrix $P$ is the product
$P_1 \cdot P_2 \cdots P_n$, where $P_k$ is the "local"
transition matrix describing the resampling of $X_k$ according to the
local distribution
$p(x_k \mid \mathbf{pa}_k)$. The positivity of the local
distributions guarantees the positivity of $P$, which in turn
guarantees the irreducibility of the Markov chain. Consequently,
there exists a unique joint distribution that is stationary with
respect to $P$. The positivity of $P$ also guarantees that this
stationary distribution can be reached from any starting point. That
this stationary distribution is equal to $p(\mathbf{x})$ is proved in the
Appendix.
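
The construction in this proof can be checked numerically on a small example. The sketch below uses a hypothetical positive joint over two binary variables, builds each local transition matrix $P_k$ from the exact conditionals of that joint (so the dependency network is consistent by construction), forms the sweep matrix $P = P_1 P_2$, and verifies that $P$ is positive and that the joint is its stationary distribution.

```python
import itertools
import numpy as np

# Hypothetical positive joint p(x1, x2) over two binary variables.
states = list(itertools.product((0, 1), repeat=2))   # (x1, x2)
p = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def local_matrix(k):
    """Transition matrix that resamples X_k from p(x_k | x_other)."""
    other = 1 - k
    P = np.zeros((4, 4))
    for a, s in enumerate(states):
        # Marginalize the joint over X_k with X_other fixed at s[other].
        marg = sum(p[t] for t in states if t[other] == s[other])
        for b, t in enumerate(states):
            if t[other] == s[other]:          # only X_k may change
                P[a, b] = p[t] / marg
    return P

P = local_matrix(0) @ local_matrix(1)         # one ordered Gibbs sweep
pi = np.full(4, 0.25)                         # arbitrary initial distribution
for _ in range(200):                          # power iteration to stationarity
    pi = pi @ P
```

After the iteration, `pi` agrees with the original joint, and every entry of `P` is positive, as the proof requires.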

After a sufficient number of iterations, the samples in the
chain will be drawn from the stationary distribution for $\mathbf{X}$. We
use these samples to estimate $p(\mathbf{x})$. There is much written about
how long to run the chain before keeping samples (the "burn-in") and
how many samples to use to obtain a good estimate (e.g., Gilks,
Richardson, and Spiegelhalter, 1996). We do not discuss
the details here.
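
To make the procedure concrete, here is a minimal sketch of an ordered Gibbs sampler for a two-variable network. The joint distribution and the burn-in and iteration counts are hypothetical choices; the local distributions are the exact conditionals of that joint, so the network is consistent by construction, and the post-burn-in samples estimate the joint.

```python
import random
from collections import Counter

# Hypothetical positive joint over two binary variables X1, X2; the local
# distributions below are its exact conditionals, so the dependency
# network is consistent by construction.
JOINT = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def local(i, other_value):
    """p(X_i = 1 | the other variable = other_value), read off the joint."""
    if i == 0:
        p1, p0 = JOINT[(1, other_value)], JOINT[(0, other_value)]
    else:
        p1, p0 = JOINT[(other_value, 1)], JOINT[(other_value, 0)]
    return p1 / (p0 + p1)

def ordered_gibbs(n_iter=200_000, burn_in=1_000, seed=0):
    """Run one chain; return the empirical joint over post-burn-in samples."""
    rng = random.Random(seed)
    x = [0, 0]                         # arbitrary initial state
    counts = Counter()
    for t in range(n_iter):
        for i in (0, 1):               # resample X1, then X2, in a fixed order
            x[i] = 1 if rng.random() < local(i, x[1 - i]) else 0
        if t >= burn_in:               # keep only post-burn-in samples
            counts[tuple(x)] += 1
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

estimate = ordered_gibbs()             # approximates JOINT
```

With enough iterations the empirical frequencies approach the true joint; the particular burn-in and sample counts here are arbitrary.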

Next, let us consider the use of Gibbs sampling to compute
$p(\mathbf{y} \mid \mathbf{z})$ for particular instances $\mathbf{y}$ and $\mathbf{z}$, where $\mathbf{Y}$ and $\mathbf{Z}$
are arbitrary disjoint subsets of $\mathbf{X}$. A naive approach uses
an ordered Gibbs sampler directly. During the iterations, only
samples of $\mathbf{Y}$ where $\mathbf{Z} = \mathbf{z}$
are used to compute the conditional
probability. An important difficulty with this approach is that if
either $p(\mathbf{y} \mid \mathbf{z})$ or $p(\mathbf{z})$ is small (a common occurrence when
$\mathbf{Y}$ or $\mathbf{Z}$ contains many variables), many iterations are required
for an accurate probability estimate.
A well-known approach for estimating
$p(\mathbf{y} \mid \mathbf{z})$ when $p(\mathbf{z})$ is
small is to fix
$\mathbf{Z} = \mathbf{z}$ during ordered Gibbs sampling. It is not
difficult to generalize the proof of Theorem 2 to show that this
*modified ordered Gibbs sampler* has a stationary distribution and
yields the correct conditional probability given a consistent
dependency network.
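
A minimal sketch of such a modified sampler follows. The three-variable positive joint is a hypothetical example; its exact conditionals serve as the local distributions (so the network is consistent by construction), the evidence $\mathbf{Z} = \mathbf{z}$ is clamped, and only the remaining variables are resampled.

```python
import itertools
import random
from collections import Counter

# Hypothetical positive joint over three binary variables (X1, X2, X3);
# the local distributions are its exact conditionals, so the dependency
# network is consistent by construction.
WEIGHTS = {x: 1 + x[0] + 2 * x[1] * x[2]
           for x in itertools.product((0, 1), repeat=3)}
TOTAL = sum(WEIGHTS.values())
JOINT = {x: w / TOTAL for x, w in WEIGHTS.items()}

def local(i, x):
    """p(X_i = 1 | all other variables), read off the joint."""
    x1, x0 = list(x), list(x)
    x1[i], x0[i] = 1, 0
    p1, p0 = JOINT[tuple(x1)], JOINT[tuple(x0)]
    return p1 / (p0 + p1)

def modified_gibbs(fixed, n_iter=200_000, burn_in=1_000, seed=1):
    """Estimate p(unobserved | fixed) by clamping Z = z during sampling."""
    rng = random.Random(seed)
    x = [0, 0, 0]
    for i, v in fixed.items():
        x[i] = v                       # clamp the evidence Z = z
    free = [i for i in range(3) if i not in fixed]
    counts = Counter()
    for t in range(n_iter):
        for i in free:                 # resample only unobserved variables
            x[i] = 1 if rng.random() < local(i, x) else 0
        if t >= burn_in:
            counts[tuple(x[i] for i in free)] += 1
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Estimate p(x1, x2 | x3 = 1); the exact conditional for this joint is
# {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.4}.
estimate = modified_gibbs({2: 1})
```

Because the evidence never has to occur by chance during sampling, the estimate does not degrade when $p(\mathbf{z})$ is small.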
When $\mathbf{z}$ is rare because $\mathbf{Z}$ contains many variables, we can use
the independencies encoded in a dependency network along with
the law of total probability to decompose the inference task into a
set of inference tasks on single variables. For example, consider a
dependency network for $\mathbf{X} = (X_1, X_2, X_3)$ with the chain structure
$X_1 \leftrightarrow X_2 \leftrightarrow X_3$, so that $\mathbf{pa}_1 = \{X_2\}$, $\mathbf{pa}_2 = \{X_1, X_3\}$, and $\mathbf{pa}_3 = \{X_2\}$. Given the independencies specified by this network,
we have

$$p(x_1, x_2, x_3) = p(x_1 \mid x_2, x_3) \, p(x_2 \mid x_3) \, p(x_3) = p(x_1 \mid x_2) \, p(x_2 \mid x_3) \, p(x_3)$$

and can obtain $p(x_1, x_2, x_3)$ by computing each term separately. The determination of the first term requires no Gibbs sampling: the distribution can be read directly from the local distribution for $X_1$ in the dependency network. The second two terms can be determined using modified ordered Gibbs samplers, each with a singleton target ($X_2$ and $X_3$, respectively). Note that this approach has the additional advantage that some terms may be obtained by direct lookup, thereby avoiding some Gibbs sampling.

In general, given a consistent dependency network for $\mathbf{X}$ and disjoint sets of variables $\mathbf{Y}$ and $\mathbf{Z}$, we can obtain $p(\mathbf{y} \mid \mathbf{z})$ for a particular instance $\mathbf{y}$ of $\mathbf{Y}$ and $\mathbf{z}$ of $\mathbf{Z}$ as follows:

Algorithm 1:

1 $\mathbf{U} = \mathbf{Y}$ (* the unprocessed variables *)

2 $\mathbf{C} = \mathbf{Z}$ (* the processed and conditioning variables *)

3 $\mathbf{c} = \mathbf{z}$ (* the values for $\mathbf{C}$ *)

4 While $\mathbf{U} \neq \emptyset$

5 Choose $X_i \in \mathbf{U}$ such that $X_i$ has no more parents in $\mathbf{U}$ than any variable in $\mathbf{U}$

6 If all the parents of $X_i$ are in $\mathbf{C}$

7 $p(x_i \mid \mathbf{c}) = p(x_i \mid \mathbf{pa}_i)$

8 Else

9 Use a modified ordered Gibbs sampler to determine $p(x_i \mid \mathbf{c})$

10 $\mathbf{U} = \mathbf{U} \setminus \{X_i\}$

11 $\mathbf{C} = \mathbf{C} \cup \{X_i\}$

12 $\mathbf{c} = \mathbf{c} \cup \{x_i\}$

13 Return the product of the conditionals $p(x_i \mid \mathbf{c})$

The key step in this algorithm is step 7, which bypasses
Gibbs sampling when it is not needed. This step is justified by
Equation 1.
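
The control flow of Algorithm 1 can be sketched as follows for three binary variables. To keep the sketch short and deterministic, exact enumeration of a small hypothetical joint stands in for the modified ordered Gibbs sampler of step 9; the bookkeeping over U, C, and c, the variable-selection rule of step 5, and the direct-lookup test of steps 6-7 follow the algorithm above. The joint is factored as f(x1, x2) * g(x2, x3), so a chain-structured consistent dependency network with parent sets pa(X1) = {X2}, pa(X2) = {X1, X3}, pa(X3) = {X2} describes it.

```python
import itertools

# Hypothetical positive joint factored so the chain independencies hold;
# variables are 0-indexed (0 -> X1, 1 -> X2, 2 -> X3).
WEIGHTS = {(x1, x2, x3): (1 + 2 * x1 * x2) * (1 + x2 + x3)
           for x1, x2, x3 in itertools.product((0, 1), repeat=3)}
TOTAL = sum(WEIGHTS.values())
JOINT = {x: w / TOTAL for x, w in WEIGHTS.items()}
PARENTS = {0: {1}, 1: {0, 2}, 2: {1}}   # chain X1 <-> X2 <-> X3

def prob(assign):
    """Sum the joint over all completions of a partial assignment."""
    free = [j for j in range(3) if j not in assign]
    total = 0.0
    for vals in itertools.product((0, 1), repeat=len(free)):
        full = dict(assign)
        full.update(zip(free, vals))
        total += JOINT[(full[0], full[1], full[2])]
    return total

def conditional(i, value, cond):
    """p(X_i = value | cond), computed exactly by enumeration."""
    extended = dict(cond)
    extended[i] = value
    return prob(extended) / prob(cond)

def algorithm1(y, z):
    """Compute p(y | z); y and z map variable index -> observed value."""
    U = set(y)                        # step 1: unprocessed variables
    C = set(z)                        # step 2: processed/conditioning variables
    c = dict(z)                       # step 3: values for C
    product = 1.0
    while U:                          # step 4
        # Step 5: a variable with as few unprocessed parents as possible.
        i = min(sorted(U), key=lambda j: len(PARENTS[j] & U))
        if PARENTS[i] <= C:
            # Step 7: direct lookup -- condition only on the parents.
            term = conditional(i, y[i], {j: c[j] for j in PARENTS[i]})
        else:
            # Step 9: a modified ordered Gibbs sampler would run here;
            # exact enumeration stands in for it in this sketch.
            term = conditional(i, y[i], c)
        product *= term
        U.remove(i)                   # step 10
        C.add(i)                      # step 11
        c[i] = y[i]                   # step 12
    return product                    # step 13: product of the conditionals

# With Y = {X1, X2, X3} and Z empty, the product recovers the joint.
result = algorithm1({0: 1, 1: 1, 2: 1}, {})
```

Note how the lookup branch fires for $X_3$ once $X_2$ has been processed, exactly the saving that step 7 provides.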