
# Summary and Future Work

We have described a graphical representation for probabilistic
dependencies, similar to the Bayesian network, called a dependency
network. Like a Bayesian network, a dependency network has a graph
and a probability component. In its consistent form, the graph
component is a potentially cyclic directed graph such that a node's
parents render that node independent of all other nodes in the
network. As in a
Bayesian network, the probability component consists of the
probability of a node given its parents for each node--the local
distributions.
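
In symbols (a restatement of the definition just given, writing x_1, ..., x_n
for the variables in the domain and pa_i for the parents of x_i), consistency
requires that each local distribution satisfy

  p(x_i | pa_i) = p(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)   for every i,

with all of the conditionals on the right-hand side derived from a single
joint distribution p(x_1, ..., x_n).
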
In practice, for computational reasons, we learn the structure and
parameters of a dependency network for a given domain by independently
performing a classification/regression for each variable in the domain
with inputs consisting of all variables except the target variable.
The parameterized model for each variable is the local distribution
for that variable; and the structure of the network reflects any
independencies discovered in the classification/regression process
(via feature selection). As a result of this learning procedure, the
dependency network is usually inconsistent--that is, it is not the
case that the local distributions can be obtained via inference from a
single joint distribution for the domain. Nonetheless, because each
local distribution is learned from the same data, the local
distributions are ``almost'' consistent when there is adequate data.
Consequently, as a useful heuristic, we can apply the machinery of
Gibbs sampling to this network (a procedure we refer to as
pseudo-Gibbs sampling) to extract a joint distribution for the domain
and to answer probabilistic queries. Experiments on real data show
that this approach yields accurate predictions.
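
As a concrete illustration of this procedure (a minimal sketch only, assuming
binary 0/1 variables and using scikit-learn decision trees as the per-variable
classifiers; the function names and parameter choices are illustrative, not
the implementation used in our experiments):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_dependency_network(data, min_samples_leaf=20):
    """Fit one probabilistic classifier per variable, using all other
    variables as inputs; the tree's own feature selection determines
    which variables act as parents."""
    n_vars = data.shape[1]
    models = []
    for i in range(n_vars):
        X = np.delete(data, i, axis=1)   # inputs: every variable except x_i
        y = data[:, i]                   # target: x_i
        models.append(
            DecisionTreeClassifier(min_samples_leaf=min_samples_leaf).fit(X, y))
    return models

def pseudo_gibbs_sample(models, n_vars, n_samples=5000, burn_in=500, seed=0):
    """Resample each variable in a fixed order from its learned local
    distribution; after burn-in, the visited states approximate draws
    from the (heuristically defined) joint distribution."""
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, size=n_vars)
    draws = []
    for t in range(burn_in + n_samples):
        for i in range(n_vars):
            others = np.delete(state, i).reshape(1, -1)
            p_one = models[i].predict_proba(others)[0, 1]  # assumes classes_ == [0, 1]
            state[i] = int(rng.random() < p_one)
        if t >= burn_in:
            draws.append(state.copy())
    return np.array(draws)
```

A marginal or conditional query can then be estimated by averaging indicator
functions over the retained draws (for conditional queries, over the draws
consistent with the evidence).
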
In addition to their application to probabilistic inference, we have
shown that dependency networks are useful for collaborative filtering
(the task of predicting preferences) and for the visualization of
acausal predictive relationships. In fact, Microsoft has included
dependency networks in two of its products--SQL Server 2000 and
Commerce Server 2000--for both the collaborative filtering and data
visualization tasks.

The intent of our paper has been to introduce the basic concepts and
applications of dependency networks. Consequently, there is
significant additional work to be done. For example, many of the
results described in this paper can be extended to domains that
include continuous variables. In addition, more work is needed to
characterize those situations in which the joint distribution defined
by an (inconsistent) dependency network is insensitive to errors in
the learned local distributions. As another example, experimental
work is needed to examine the predictive accuracy of dependency
networks across a variety of domains using alternative methods for
classification and regression. It may also be useful to consider
pseudo-Gibbs sampling methods that resample variables in random rather
than fixed order.

Finally, we note that the representation itself can be generalized.
Recall that a dependency network is useful for collaborative filtering
primarily because the network stores in its local distributions
precisely the probabilistic quantities needed by the ranking
algorithm. In general, we can construct a ``query network'' that
directly learns probabilities corresponding to a set of given queries.
As an illustration, suppose we have a domain consisting of variables
X, Y, and Z, and we know we will be answering the query p(y, z|x). We
can learn this distribution directly from data by performing a series
of (independent) classifications/regressions. We can construct the
classifications/regressions p(y|x) and p(z|x, y) and use
multiplication to answer the query. Alternatively, we may construct
the classifications/regressions p(y|x, z) and p(z|x, y) and use a
pseudo-Gibbs sampler to answer the query. In
either case, with sufficient data, the conditional probabilities
learned will be ``almost'' consistent with the true distribution, and
are likely to produce accurate answers to the query.
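
To make the multiplication route concrete, here is a small sketch under the
same illustrative assumptions as before (binary variables, scikit-learn
decision trees; the function names below are ours for this example only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_query_network(x, y, z, min_samples_leaf=20):
    """x, y, z: 1-D arrays of 0/1 observations from the same samples."""
    m_y = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf)
    m_y.fit(x.reshape(-1, 1), y)               # learns p(y | x)
    m_z = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf)
    m_z.fit(np.column_stack([x, y]), z)        # learns p(z | x, y)
    return m_y, m_z

def query(m_y, m_z, x_val, y_val, z_val):
    """Chain rule: p(y, z | x) = p(y | x) * p(z | x, y)."""
    p_y = m_y.predict_proba([[x_val]])[0, y_val]         # assumes classes_ == [0, 1]
    p_z = m_z.predict_proba([[x_val, y_val]])[0, z_val]
    return p_y * p_z
```

The pseudo-Gibbs route would instead learn p(y|x, z) and p(z|x, y) and reuse a
sampler like the one sketched earlier, holding x fixed at its observed value.
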
