A motivation for this repository is outlined in "Challenge: Where is the impact of Bayesian networks in learning?" by N. Friedman, M. Goldszmidt, D. Heckerman, and S. Russell (IJCAI-97).
We plan to pursue this goal in several progressive steps:
Sharing domains. This allows results to be reproduced and lets researchers in the community run large-scale empirical tests.
Sharing task specification. Sharing domains alone is not enough to compare algorithms: even if two papers examine inference in the same network, they might be answering different queries or assuming different evidence sets. The intent here is to store specific tasks. For example, in inference a task might be a specific series of observations and queries; in learning, it might be a particular collection of training sets with a particular pattern of missing data.
Sharing task evaluation. Even if two researchers examine the same task, they might use different measures to evaluate their algorithms. By sharing evaluation methods, we hope to allow for an objective comparison. In some cases, such evaluation methods can be shared programs, such as a program that evaluates the quality of a learned model by computing its KL divergence from the "real" distribution. In other cases, an evaluation method might be an agreed-upon measure of performance, such as space requirements, number of floating-point operations, etc.
Organized competitions. One of the dangers of empirical research is that the methods examined become overly tuned to specific evaluation domains. To avoid that danger, it is necessary to use "fresh" problems. The intention is to organize competitions that address specific problems, such as causal discovery, on unseen domains.
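As an illustration of the kind of shared evaluation program mentioned under task evaluation, the sketch below computes the KL divergence D(P || Q) = sum over x of P(x) log(P(x) / Q(x)) between a "real" network P and a learned network Q. It is a minimal sketch, not the repository's own evaluation program: the two-node network Rain -> WetGrass, its parameter values, and the function names joint and kl_divergence are all hypothetical. It enumerates the full joint distribution, which is only feasible for very small networks; evaluating learned versions of networks such as Alarm would presumably need to exploit the network's factorization instead.

```python
import itertools
import math


def joint(p_rain, p_wet_given_rain):
    """Joint distribution {(rain, wet): prob} of a hypothetical two-node
    network Rain -> WetGrass with binary (0/1) variables.

    p_rain           -- P(Rain = 1)
    p_wet_given_rain -- dict mapping a rain value to P(WetGrass = 1 | Rain)
    """
    dist = {}
    for rain, wet in itertools.product((0, 1), repeat=2):
        pr = p_rain if rain == 1 else 1.0 - p_rain
        pw1 = p_wet_given_rain[rain]
        pw = pw1 if wet == 1 else 1.0 - pw1
        # Chain rule for this network: P(rain, wet) = P(rain) * P(wet | rain).
        dist[(rain, wet)] = pr * pw
    return dist


def kl_divergence(p, q):
    """KL(p || q) in nats; states with p(x) == 0 contribute nothing."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            # q assigns zero probability to a state p considers possible.
            return math.inf
        total += px * math.log(px / qx)
    return total


if __name__ == "__main__":
    # Hypothetical "real" parameters and slightly different learned ones.
    real = joint(0.2, {0: 0.1, 1: 0.9})
    learned = joint(0.25, {0: 0.15, 1: 0.85})
    print(kl_divergence(real, learned))
```

The divergence is zero only when the learned joint matches the real one, and it grows as the learned parameters drift away, which is what makes it a convenient single number for comparing learning algorithms on the same task.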
Currently, this site contains several domains. The plan is to gradually add other components discussed above.
Please send suggestions and contributions to galel@cs.huji.ac.il.
Thanks to Fabio Cozman, Bruce D'Ambrosio, Moises Goldszmidt, David Heckerman, Othar Hansson, Daphne Koller, and Stuart Russell for discussions about the organization of this site. Thanks to John Binder, Jack Breese, David Heckerman, Uffe Kjærulff, and Mark Peot for contributing networks.
The following table summarizes the file formats in which the networks are provided.
Name | Suffix | Description |
Bayesian Interchange Format | .bif | The proposed interchange format. I am following Fabio Cozman's version of the format, which is similar to the original proposal. |
MSBN | .dsc | Microsoft's BN tool format. See the MSBN page. |
Hugin | .hugin | File format used by the HUGIN BN tool. |
Ideal | .ideal | A format that is based on the one used in the IDEAL toolkit. |
Ergo file format | .ergo | File format used by the ERGO BN tool. |
The following table summarizes the networks in the repository.
Name | Description |
Alarm | Monitoring of emergency care patients |
Barley | Model of barley crop yield. |
Carpo | Diagnosis of Carpal Tunnel Syndrome |
Diabetes | A model for insulin dose adjustment (DBN). |
HailFinder | Predicting hail in northern Colorado. |
Insurance | Evaluating insurance applications |
Link | Pedigree for linkage analysis. |
Mildew | A model for deciding on the amount of fungicides to be used against attack of mildew in wheat. |
Munin | An expert electromyography assistant. |
Pigs | Pedigree of breeding pigs. |
PathFinder | Analysis of lymph-node pathologies. |
Water | A model of the biological processes of a water purification plant. |
Win95pts | A model for printer troubleshooting in Microsoft Windows 95. |
Bayesian Networks and related issues:
Data and Machine Learning Repositories:
The Decision Support Systems Group at Aalborg University has pointers to several large networks.
The Reinforcement learning repository at Michigan State University.
The UC Irvine repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
The StatLib site, a system for distributing statistical software, datasets, and information by electronic mail, FTP and WWW.
The DELVE project, a standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data.
The StatLog project, concerned with comparative studies of different machine learning, neural, and statistical classification algorithms.
The Abbadingo competition for grammar induction.
Last modified: Thursday, February 12, 1998