EBayes Version 0.1

Embedded Bayesian Networks

Fabio Gagliardi Cozman
fgcozman@usp.br
http://www.cs.cmu.edu/~fgcozman/home.html
Escola Politécnica, University of São Paulo
© Fabio Cozman, 1998-1999


Introduction

EBayes is a full engine for manipulation of Bayesian networks, aimed at applications that can afford only minimal storage for executable code. The engine can handle arbitrary networks and can produce inferences, expected values and explanations (maximum a posteriori configurations). The engine is based on the JavaBayes system, a complete system for creation and manipulation of Bayesian networks. The EBayes engine has all the basic capabilities of the engine in JavaBayes, but has been optimized in a variety of ways.

The goal of the EBayes project is to create an infrastructure for general manipulation of Bayesian networks in small devices, such as the little embedded computers that control our printers, televisions, refrigerators, cellular phones and cars. Endowing small devices with some ability to reason about uncertainty creates an enormous number of potential applications for Bayesian networks. To a certain extent, the goal of generating embedded Bayesian networks can already be attained with existing tools, as several commercial products offer libraries containing engines for manipulation of Bayesian networks. But most of these libraries are relatively large compared to the resources of embedded computers. The idea of the EBayes project is that Bayesian networks can be used effectively on embedded computers of any size, from small to large, as long as an appropriate engine is produced.

EBayes is in development and is distributed as a jar file (an archive of Java bytecodes). At the moment, you can download:

Note that EBayes is distributed as bytecodes that are executable in the standard Java Virtual Machine as specified by Sun Microsystems Inc. Modification of the code to generate other types of executables or non-standard bytecode is not allowed. This is emphasized so that EBayes is always distributed as a portable, architecture-neutral system.

Playing with EBayes

To run the system, put the ebayes.jar archive and the EBayes.class file in your classpath and run:

java EBayes

Alternatively, you can run the same command from the directory that contains ebayes.jar and EBayes.class.

You can also run

java EBayes batch.txt

where batch.txt is a text file containing a sequence of EBayes commands. For example, a batch file can be as follows:

l DogProblem
o family-out true
i dog-out
x
q

To use EBayes, the first thing you have to do is to load a network into the system with the command:

>> l network-class-name

Note that every network is written as a Java class and compiled into bytecodes. See the example DogProblem.java and DogProblem.class. To load the DogProblem network into EBayes, type:

>> l DogProblem

You can then use a number of commands. You can set variables as observed (and set them back to not observed), ask for posterior marginals and posterior expected values for variables in the network, and ask for explanations.

If you want to play with the Alarm network, just compile Alarm.java into a class file using a Java compiler:

javac Alarm.java

After compiling the network, you can load it into the system:

>> l Alarm

Understanding the EBayes engine

To get the information you need to interact with the EBayes engine, take a look at the JavaDoc documentation for the relevant packages. The best way to understand the system now is to study the EBayes program and the JavaDoc documentation. Every class in the JavaDoc documentation is available in the ebayes.jar archive. A whole Bayesian network system can be built using these classes if needed; the EBayes.class program is a simple demonstration of what can be achieved with the EBayes engine.

Take a look at the EBayes program to understand how to generate posterior marginals, posterior expected values and explanations with the EBayes engine.

When you ask for an expected value, you get a result as follows:

If the values of the variable are strings representing numbers, say "1", "2", etc, then you get the expected value of the variable assuming the numbers are the values of the variable.
If the values of the variable are non-numeric strings, then the first value has value 0, the second has value 1, and so on.
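
The two rules above can be sketched in a few lines of Java. This is only an illustration and not code from the EBayes engine: the class ExpectedValueSketch and its methods are hypothetical names. The sketch also assumes that a variable mixing numeric and non-numeric values falls back to positions for all of its values, a case the rules above do not cover.

```java
// Hypothetical helper, NOT part of the EBayes API: it only mirrors the
// numbering rules described in the text.
final class ExpectedValueSketch {

    // Maps each value label to a number: the parsed number when every label
    // is numeric, otherwise the position of the label (0, 1, 2, ...).
    // Assumption: one non-numeric label makes all labels positional.
    static double[] numericValues(String[] labels) {
        double[] numbers = new double[labels.length];
        try {
            for (int i = 0; i < labels.length; i++) {
                numbers[i] = Double.parseDouble(labels[i].trim());
            }
        } catch (NumberFormatException e) {
            for (int i = 0; i < labels.length; i++) {
                numbers[i] = i;
            }
        }
        return numbers;
    }

    // Expected value of a variable given one probability per value label.
    static double expectedValue(String[] labels, double[] probabilities) {
        double[] numbers = numericValues(labels);
        double sum = 0.0;
        for (int i = 0; i < numbers.length; i++) {
            sum += numbers[i] * probabilities[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Numeric labels: the numbers themselves are used.
        System.out.println(expectedValue(
                new String[] {"1", "2", "3"}, new double[] {0.2, 0.5, 0.3}));
        // Non-numeric labels: "true" counts as 0 and "false" as 1.
        System.out.println(expectedValue(
                new String[] {"true", "false"}, new double[] {0.3, 0.7}));
    }
}
```

Note that for a variable with values "true" and "false" declared in that order, the expected value is simply the probability of "false".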

When you ask for an explanation, you get the configuration of variables that produces the highest probability value of the evidence (all observed variables). There are two possible situations:

If you ask for a full explanation, then you obtain the configuration of all variables that maximizes the probability of the evidence (regardless of which variables are "explanatory").
If you set a number of variables as "explanatory" and ask for an explanation, then you get the configuration of explanatory variables that maximizes the probability of the evidence; all non-explanatory non-observed variables are averaged out.
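
The difference between the two situations can be illustrated with a brute-force sketch over a small joint distribution. This is not the algorithm used by the EBayes engine, and the class ExplanationSketch, its methods and the probability values in main are all hypothetical, chosen only for illustration. The sketch picks the configuration with the highest joint probability together with the evidence, summing out every variable that is neither observed nor explanatory.

```java
import java.util.Arrays;

// Hypothetical brute-force illustration, NOT the EBayes implementation.
final class ExplanationSketch {

    // Decodes a flat index into one value index per variable
    // (the last variable changes fastest).
    static int[] decode(int flat, int[] card) {
        int[] values = new int[card.length];
        for (int k = card.length - 1; k >= 0; k--) {
            values[k] = flat % card[k];
            flat /= card[k];
        }
        return values;
    }

    // evidence[k] is the observed value index of variable k, or -1 if the
    // variable is not observed. Returns one value index per variable, with -1
    // for variables that were summed out.
    static int[] explain(double[] joint, int[] card, int[] evidence, boolean[] explanatory) {
        double[] score = new double[joint.length];
        Arrays.fill(score, -1.0); // -1 marks "no consistent configuration yet"
        for (int flat = 0; flat < joint.length; flat++) {
            int[] values = decode(flat, card);
            boolean consistent = true;
            int key = 0; // flat index with summed-out variables set to 0
            for (int k = 0; k < card.length; k++) {
                if (evidence[k] >= 0 && evidence[k] != values[k]) {
                    consistent = false;
                }
                boolean kept = evidence[k] >= 0 || explanatory[k];
                key = key * card[k] + (kept ? values[k] : 0);
            }
            if (consistent) {
                score[key] = Math.max(score[key], 0.0) + joint[flat];
            }
        }
        int best = 0;
        for (int key = 1; key < score.length; key++) {
            if (score[key] > score[best]) {
                best = key;
            }
        }
        int[] result = decode(best, card);
        for (int k = 0; k < card.length; k++) {
            if (evidence[k] < 0 && !explanatory[k]) {
                result[k] = -1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up joint over three binary variables X, Y, E (E changes fastest).
        double[] joint = {0.20, 0.07, 0.20, 0.07, 0.30, 0.07, 0.02, 0.07};
        int[] card = {2, 2, 2};
        int[] evidence = {-1, -1, 0}; // E is observed with value index 0
        // Full explanation: every non-observed variable is explanatory.
        System.out.println(Arrays.toString(
                explain(joint, card, evidence, new boolean[] {true, true, false})));
        // Only X is explanatory; Y is summed out.
        System.out.println(Arrays.toString(
                explain(joint, card, evidence, new boolean[] {true, false, false})));
    }
}
```

With these made-up numbers the full explanation selects value index 1 for X, while the explanation restricted to X selects value index 0, because summing out Y changes which value of X carries more probability.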

EBayes can produce marginal distributions and expectations using two different algorithms: variable elimination and bucket tree elimination. In the first case, inferences are generated from scratch for each query; in the second case, a data structure (the bucket tree) is generated once and several queries can then be answered directly from it. Variable elimination consumes less memory, but it may take longer if several queries are made to the same network with the same collection of observations. The various options regarding choice of algorithms are not documented at this point.

There are three types of objects that must be understood by an EBayes developer:

BayesNet objects, representing Bayesian networks. A BayesNet object is built using the add method; using add, you can input an array of DiscreteVariable objects and an array of DiscreteFunction objects.
DiscreteVariable objects, representing the categorical variables in a network (variables with a finite number of values). You define a variable by its name and its values (each value is a String; numeric variables also have values represented by Strings).
DiscreteFunction objects, representing the probability mass functions associated with a network. Each DiscreteVariable is associated with a DiscreteFunction, and each DiscreteFunction is associated with an array of values that represent the probability mass function.

The probability mass function in a DiscreteFunction is stored as a flat array of values, ordered as follows. First take the variable whose distribution is defined. Then place the conditioning variables to the right of the first variable, in the order they were input. The rightmost variable is the least significant; i.e., its value changes fastest as you move through the array of values.

Consider the following example, taken from the DogProblem network:

DiscreteVariable bowel_problem = new DiscreteVariable("bowel-problem", DiscreteVariable.CHANCE, new String[] {"true","false"});

DiscreteVariable dog_out = new DiscreteVariable("dog-out", DiscreteVariable.CHANCE, new String[] {"true","false"});

DiscreteVariable family_out = new DiscreteVariable("family-out", DiscreteVariable.CHANCE, new String[] {"true","false"});

DiscreteFunction p3 = new DiscreteFunction( new DiscreteVariable[] {dog_out}, new DiscreteVariable[] {bowel_problem, family_out}, new double[] {0.99, 0.97, 0.9, 0.3, 0.01, 0.03, 0.1, 0.7});

Three binary variables are first defined (called bowel-problem, dog-out and family-out). Then a probability mass function is defined for dog-out conditional on bowel-problem and family-out. The first probability value, 0.99, corresponds to p(dog-out=true|bowel-problem=true, family-out=true). The second probability value, 0.97, corresponds to p(dog-out=true|bowel-problem=true, family-out=false). And so on; the last probability value, 0.7, corresponds to p(dog-out=false|bowel-problem=false, family-out=false).
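
The layout of the array can be checked with a short sketch that maps a choice of values to a position in the array. The class TableLayoutSketch and the method flatIndex are hypothetical names and are not part of EBayes. Value index 0 stands for "true" and 1 for "false", following the order in which the values were declared above.

```java
// Hypothetical helper, NOT part of the EBayes API: it reproduces the array
// layout described in the text (rightmost variable changes fastest).
final class TableLayoutSketch {

    // valueIndex[k] and cardinality[k] refer to the k-th variable, counting
    // the defined variable first and then the conditioning variables in the
    // order they were input.
    static int flatIndex(int[] valueIndex, int[] cardinality) {
        int index = 0;
        for (int k = 0; k < valueIndex.length; k++) {
            index = index * cardinality[k] + valueIndex[k];
        }
        return index;
    }

    public static void main(String[] args) {
        // Values of p3 from the DogProblem example.
        double[] table = {0.99, 0.97, 0.9, 0.3, 0.01, 0.03, 0.1, 0.7};
        int[] card = {2, 2, 2}; // dog-out, bowel-problem, family-out

        // p(dog-out=true | bowel-problem=true, family-out=false)
        System.out.println(table[flatIndex(new int[] {0, 0, 1}, card)]);
        // p(dog-out=false | bowel-problem=false, family-out=false)
        System.out.println(table[flatIndex(new int[] {1, 1, 1}, card)]);

        // For each configuration of the conditioning variables, the
        // probabilities of dog-out=true and dog-out=false should add up to one.
        for (int b = 0; b < 2; b++) {
            for (int f = 0; f < 2; f++) {
                double sum = table[flatIndex(new int[] {0, b, f}, card)]
                        + table[flatIndex(new int[] {1, b, f}, card)];
                System.out.println(Math.abs(sum - 1.0) < 1e-9);
            }
        }
    }
}
```

The normalization check at the end is a convenient way to catch values entered in the wrong order when a network is written by hand.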

Conclusion

Bayesian networks have been used as a fundamental tool for the representation and manipulation of beliefs in Artificial Intelligence. EBayes is the first implementation of a Bayesian network engine that is explicitly focused on the characteristics of the embedded market. Being written in Java, EBayes can exploit some of the important characteristics of this language. It is truly portable to any processor that accepts a Java Virtual Machine, and it can work through the internet without modifications. The size of the system is quite small, and its dynamic linking behavior leads to important savings in memory.

An illustrative example of an embedded Bayesian network would be the self-diagnostic ability of a smart refrigerator. Acting autonomously, such a refrigerator could collect data from a variety of sensors and prepare itself for a technician's visit in the event of a failure. The phenomenal growth in popularity of embedded computers and the increase in computer power and memory indicate that such an idea could become reality soon. The piece that is lacking in this picture is the engine that can handle uncertain reasoning inside an embedded computer. This is the gap to be filled by EBayes.

Good luck with the system; I hope it proves useful for your needs.

Cheers,

Fabio G. Cozman