Feudal Q-learning

Feudal Q-learning [31, 128] involves a hierarchy of learning modules. In the simplest case, there is a high-level master and a low-level slave. The master receives reinforcement from the external environment, and its actions are commands that it issues to the low-level learner. When the master issues a particular command, it must reward the slave for taking actions that satisfy that command, even if those actions do not result in external reinforcement. The master thus learns a mapping from states to commands, while the slave learns a mapping from commands and states to external actions. The set of "commands" and their associated reinforcement functions are established before learning begins.

This is an instance of the general "gated behaviors" approach, in which the slave can execute any of the behaviors, depending on the command it receives. The reinforcement functions for the individual behaviors (commands) are given, but learning takes place simultaneously at both the high and low levels.
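
The two-level arrangement can be illustrated with a minimal tabular sketch. Everything specific in it is an assumption made for illustration and is not taken from [31, 128]: the six-cell corridor environment, the two commands ("left" and "right"), the rule that each command lasts exactly one step (so the slave's update target is just its immediate command reward), and the learning parameters. The master is updated only from external reinforcement, the slave only from the predefined command reward, and both tables are updated on every step.

```python
import random

N_STATES = 6                   # hypothetical corridor: cells 0..5, external reward at cell 5
COMMANDS = ("left", "right")   # command set fixed before learning (illustrative)
ACTIONS = (-1, +1)             # external actions: step left or step right


def command_reward(command, state, next_state):
    """Predefined reinforcement function of a command: 1 if the move satisfied it."""
    moved = next_state - state
    if command == "right":
        return 1.0 if moved > 0 else 0.0
    return 1.0 if moved < 0 else 0.0


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random index with probability epsilon, else a best one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Master: state -> value of each command.
    master_q = [[0.0] * len(COMMANDS) for _ in range(N_STATES)]
    # Slave: (command, state) -> value of each external action.
    slave_q = [[[0.0] * len(ACTIONS) for _ in range(N_STATES)] for _ in COMMANDS]

    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step limit per episode (assumption)
            c = epsilon_greedy(master_q[state], epsilon, rng)
            a = epsilon_greedy(slave_q[c][state], epsilon, rng)
            next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            done = next_state == N_STATES - 1
            external = 1.0 if done else 0.0

            # Slave update: rewarded for satisfying the command, regardless of
            # external reinforcement. Commands last one step here, so the
            # target is the immediate command reward only.
            r_slave = command_reward(COMMANDS[c], state, next_state)
            slave_q[c][state][a] += alpha * (r_slave - slave_q[c][state][a])

            # Master update: ordinary Q-learning over commands, driven by
            # external reinforcement only.
            bootstrap = 0.0 if done else gamma * max(master_q[next_state])
            master_q[state][c] += alpha * (external + bootstrap - master_q[state][c])

            if done:
                break
            state = next_state
    return master_q, slave_q


if __name__ == "__main__":
    master_q, slave_q = train()
    for s in range(N_STATES - 1):
        preferred = COMMANDS[max(range(len(COMMANDS)), key=lambda i: master_q[s][i])]
        print(f"state {s}: master prefers command '{preferred}'")
```

In this sketch the slave's table depends only on the command reward, so it learns to obey either command even though obeying "left" never earns external reinforcement. The master treats the (still learning) slave as part of its environment, which is one simple way to realize simultaneous learning at both levels.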

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996