Bayesian Network Repository

About This Page
Networks
- Network formats
- Available networks
Related Sites

This page is a preliminary version of a planned UAI repository. The intention is to construct a repository that will allow us to conduct empirical research within our community by facilitating (1) better reproducibility of results and (2) better comparisons among competing approaches. Both are required to measure progress on commonly agreed-upon problems, such as inference and learning.

A motivation for this repository is outlined in "Challenge: Where is the impact of Bayesian networks in learning?" by N. Friedman, M. Goldszmidt, D. Heckerman, and S. Russell (IJCAI-97).

This will be achieved by several progressive steps:

Sharing domains. This would allow results to be reproduced, and would also allow researchers in the community to run large-scale empirical tests.

Sharing task specification. Sharing domains is not enough to compare algorithms: even if two papers examine inference in a particular network, they might be answering different queries or assuming different evidence sets. The intent here is to store specific tasks. For example, in inference this might be a specific series of observations and queries. In learning, this might be a particular collection of training sets that have a particular pattern of missing data.

Sharing task evaluation. Even if two researchers examine the same task, they might use different measures to evaluate their algorithms. By sharing evaluation methods, we hope to allow for an objective comparison. In some cases such evaluation methods can be shared programs, such as a program that evaluates the quality of a learned model by computing the KL divergence to the "real" distribution. In other cases, such an evaluation method might be an agreed-upon measure of performance, such as space requirements or the number of floating-point operations.

Organized competitions. One of the dangers of empirical research is that the methods examined become overly tuned to specific evaluation domains. To avoid that danger, it is necessary to use "fresh" problems. The intention is to organize competitions that would address specific problems, such as causal discovery, on unseen domains.
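
As an illustration of what a shared task specification and a shared evaluation program might look like, here is a minimal Python sketch. It is not part of the repository: the InferenceTask record, its field names, the placeholder variable names "X" and "Y", and the example probabilities are all assumptions made for the sake of the example. The evaluation function computes the KL divergence from a "real" discrete distribution to a learned one, as mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class InferenceTask:
    """Hypothetical record for one shared inference task."""
    network: str    # name of a repository network, e.g. "Alarm"
    evidence: dict  # observed variable -> observed value
    query: str      # variable whose posterior is requested


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    p is the "real" distribution and q the learned one; each maps an
    outcome to its probability.
    """
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # a term with p(x) = 0 contributes nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # q rules out an outcome that p allows
        total += p_x * math.log(p_x / q_x)
    return total


# Illustrative values only; "X" and "Y" are placeholder variable names.
task = InferenceTask(network="Alarm", evidence={"X": "true"}, query="Y")
real = {"low": 0.5, "high": 0.5}
learned = {"low": 0.25, "high": 0.75}
score = kl_divergence(real, learned)
```

Two researchers who exchange the same task record and the same scoring function would be comparing like with like; a score of zero means the learned distribution matches the real one exactly on the queried variable.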

Plans for the future

Currently, this site contains several domains. The plan is to gradually add other components discussed above.

Please send suggestions and contributions to galel@cs.huji.ac.il.

Acknowledgements

Thanks to Fabio Cozman, Bruce D'Ambrosio, Moises Goldszmidt, David Heckerman, Othar Hansson, Daphne Koller, and Stuart Russell for discussions about the organization of this site. Thanks to John Binder, Jack Breese, David Heckerman, Uffe Kjaerulff, and Mark Peot for contributing networks.

Networks

Network formats

The networks in the repository come in several different formats. I am trying to construct translators among the different formats, but translations are incomplete. The available formats are:

Name	Suffix	Description
Bayesian Interchange Format	.bif	The proposed interchange format. I am following Fabio Cozman's version of the format, which is similar to the original proposal.
MSBN	.dsc	Microsoft's BN tool format. See the MSBN page.
Hugin	.hugin	File format used by the HUGIN BN tool.
Ideal	.ideal	A format that is based on the one used in the IDEAL toolkit.
Ergo file format	.ergo	File format used by the ERGO BN tool.
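
For readers who script against a local copy of the networks, the suffix column above can be turned into a simple lookup. The following Python sketch is only an illustration: the function name identify_format and the file names in the usage lines are hypothetical, and the code inspects the file suffix only, not the file contents.

```python
from pathlib import Path

# Suffix -> format name, copied from the table above.
FORMATS = {
    ".bif": "Bayesian Interchange Format",
    ".dsc": "MSBN",
    ".hugin": "Hugin",
    ".ideal": "Ideal",
    ".ergo": "Ergo file format",
}


def identify_format(filename):
    """Return the format name for a file suffix, or None if it is unknown."""
    return FORMATS.get(Path(filename).suffix.lower())


# Hypothetical file names, used only to show the call.
known = identify_format("alarm.bif")
unknown = identify_format("alarm.txt")
```

Returning None for an unknown suffix leaves it to the caller to decide whether to skip the file or report an error.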

Available networks

The following table summarizes the networks in the repository.

Name	Description
Alarm	Monitoring of emergency care patients
Barley	Model of barley crop yield.
Carpo	Diagnosis of Carpal Tunnel Syndrome
Diabetes	A model for insulin dose adjustment (DBN).
HailFinder	Predicting hail in northern Colorado.
Insurance	Evaluating insurance applications
Link	Pedigree for linkage analysis.
Mildew	A model for deciding on the amount of fungicide to use against mildew attacks in wheat.
Munin	An expert electromyography assistant.
Pigs	Pedigree of breeding pigs.
PathFinder	Analysis of lymph cell pathologies.
Water	A model of the biological processes of a water purification plant.
Win95pts	A model for printer troubleshooting in Microsoft Windows 95.

Related Sites

Bayesian Networks and related issues:

The homepage of the Association for Uncertainty in Artificial Intelligence.

Russell Almond's directories of software for manipulating Belief Networks and learning Belief Networks from data.

Data and Machine Learning Repositories:

The Decision Support Systems Group at Aalborg University has pointers to several large networks.
The Reinforcement learning repository at Michigan State University.
The UC Irvine repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
The StatLib site, a system for distributing statistical software, datasets, and information by electronic mail, FTP and WWW.
The DELVE project, a standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data.
The StatLog project, which is concerned with comparative studies of different machine learning, neural, and statistical classification algorithms.
The Abbadingo competition for grammar induction.

Last modified: Thursday, February 12, 1998