
Decision Trees
In environments that are characterized by a set of Boolean or discrete-valued variables, it is possible to learn compact decision trees for representing Q values. The G algorithm [21] works as follows. It starts by assuming that no partitioning is necessary and tries to learn Q values for the entire environment as if it were one state. In parallel with this process, it gathers statistics based on individual input bits: it asks whether there is some bit b in the state description such that the Q values for states in which b=1 are significantly different from the Q values for states in which b=0. If such a bit is found, it is used to split the decision tree, and the process is then repeated in each of the leaves. This method was able to learn very small representations of the Q function in the presence of an overwhelming number of irrelevant, noisy state attributes. It outperformed Q-learning with backpropagation in a simple video-game environment and was used by McCallum [74] (in conjunction with other techniques for dealing with partial observability) to learn behaviors in a complex driving simulator. It cannot, however, acquire partitions in which attributes are only significant in combination (such as those needed to solve parity problems).
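The per-bit split test described above can be sketched as follows. This is a hypothetical illustration, not the published implementation: the class and method names are mine, and a simple two-sample z test on the observed Q targets stands in for whatever significance test the original algorithm uses. Each leaf accumulates, for every input bit, the count, sum, and sum of squares of the Q targets seen under each value of that bit, and proposes a split when the two value-classes of some bit have significantly different means.

```python
import math

class Leaf:
    """One leaf of the decision tree over bit-vector states (illustrative sketch)."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        # For each (bit, value) pair: [count, sum of Q targets, sum of squares].
        self.stats = {(b, v): [0, 0.0, 0.0]
                      for b in range(n_bits) for v in (0, 1)}

    def record(self, state_bits, q_target):
        """Fold one observed Q target into the per-bit statistics."""
        for b in range(self.n_bits):
            s = self.stats[(b, state_bits[b])]
            s[0] += 1
            s[1] += q_target
            s[2] += q_target * q_target

    def split_bit(self, min_n=10, z_thresh=2.0):
        """Return the most significant bit to split on, or None.

        A bit qualifies when both value-classes have at least min_n samples
        and a two-sample z statistic on their mean Q targets exceeds z_thresh.
        """
        best = None
        for b in range(self.n_bits):
            n0, s0, ss0 = self.stats[(b, 0)]
            n1, s1, ss1 = self.stats[(b, 1)]
            if n0 < min_n or n1 < min_n:
                continue
            m0, m1 = s0 / n0, s1 / n1
            # Population variances, floored to avoid division by zero.
            v0 = max(ss0 / n0 - m0 * m0, 1e-12)
            v1 = max(ss1 / n1 - m1 * m1, 1e-12)
            z = abs(m0 - m1) / math.sqrt(v0 / n0 + v1 / n1)
            if z > z_thresh and (best is None or z > best[1]):
                best = (b, z)
        return None if best is None else best[0]
```

For example, if the Q target depends only on bit 0 of a three-bit state, only bit 0 separates the targets and the leaf proposes splitting on it; the other bits mix both target values evenly in each class, so their means coincide. Because each bit is tested in isolation, a parity-like target (where no single bit shifts the mean) never triggers a split, which is exactly the limitation noted above.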



Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996