15-887*/16-830: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2008
- Monday, September 8: Introduction
- Lecture Notes
- Points to remember from the discussion in class:
- Planning is about states, goals, actions. Planning assumes a model of the world; Execution performs
the actions of a plan; Learning improves planning and execution through experience.
- Different planning problems require different assumptions
about state and actions, which can make a problem easier or harder to solve.
- No assigned readings besides lecture notes.
- Wednesday, September 10: Representation and Search
- Lecture Notes
- Points to remember from the discussion in class:
- State space grows exponentially in number of state variables
- Factoring / decomposing state space key to planning efficiency
- Strips-assumption (add/delete lists) deals with Frame Problem
- Writing a planning domain in terms of a set of predicates,
variables, and operators is a hard task.
- Planners characterized in terms of soundness, completeness,
optimality, and computational efficiency
- Many ways to find solutions to planning problems -- differ
in complexity and how well they scale with respect to size
of search space
- Search for solutions to planning problems can occur in
both state space and plan space
- No assigned readings besides lecture notes.
- Monday, September 15: Plan-Space Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Planning in place space: State space is space of partial
plans; search actions modify plans (refine them)
- Partial-Order planners work by adding causal links from effects
of one action to precondition of another, and then by eliminating
threats. Threats can be eliminated by promotion, demotion, or
separation. Backtracking occurs if a threat cannot be eliminated.
- Use of Least Commitment -- delaying choices of temporal
constraints and parameter bindings unless forced.
- Readings:
- Wednesday, September 17: State-space Non-linear Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Linear planning maintains state and a goal stack: it's efficient but can
generate unoptimal solutions, and it's incomplete.
- Nonlinear state-space planning uses a set of goals, backtracks over
plan orderings decisions. It's complete.
- State-space planning commits on plan step orderings
with the main goal of getting an updated state and hopefully
a "closer" state to the goal. Such commitments may lead to
backtracking, but may also lead to informed state-based choices
of actions.
- Plan-space planning does not commit on plan step orderings,
until it needs to solve threats. It
follows a least-commitment search
approach but only wrt step orderings.
- Plan-space planning commits to causal links
leading to linkability problems and to the need to backtrack
over linking commitments. In the linkability sense,
plan-space planning is a strong-commitment planner.
- Readings:
-
Integrating planning and learning: The Prodigy architecture, (Sections 1 and 2)
Manuela M. Veloso, Jaime Carbonell, M. Alicia P\'erez, Daniel Borrajo,
Eugene Fink, and Jim Blythe.
Journal of Experimental and Theoretical
Artificial Intelligence, 7(1):81--120, 1995.
-
Linkability: Examining causal link commitments in partial-order
planning,
Manuela M. Veloso and Jim Blythe.
In Proceedings of the Second International Conference on AI
Planning Systems, pages 170-175, June 1994.
- Monday, September 22: Graph-based Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, September 24: Heuristic Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Estimate cost of achieving conditions/goals using relaxed
form of problem
- Good heuristics make forward-progression state-space planners
very competitive
- Hard to find an informed, admissible heuristic for planning due
to positive and negative interactions
- Many heuristics for flaw selection -- no one is best
- Readings:
-
Planning as Heuristic Search, B. Bonet and H. Geffner. Artificial Intelligence, Special issue on Heuristic Search. Vol 129 (1-2) 2001.
-
Admissible Heuristics for Optimal Planning P. Haslum H. Geffner. Proc. 5th
Int. Conf. on AI Planning and Scheduling (AIPS 2000).
-
The FF Planning System: Fast Plan Generation Through Heuristic Search,
J. Hoffmann, B. Nebel. Journal of Artificial Intelligence Research,
vol 14, pp. 253-302, 2001
-
VHPOP: Versatile Heuristic Partial Order Planner,
H.L.S. Younes and R.G. Simmons. Journal of Artificial Intelligence
Research, vol 20, pp. 405-430, December 2003.
- Monday, September 29: Hierarchical Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Problem reduction search decomposes tasks into subtasks
- Take advantage of structure inherent in domain
- Domain definition -- "how" rather than "what"
- Works best when problem is complex, but few unintended
goal interactions
- Readings:
-
SHOP2: An HTN Planning System
Nau, D.S., Au, T.C., Ilghami, O., Kuter, U., Murdock, J.W., Wu, D. and
Yaman, F, Volume 20, pages 379-404, 2003.
-
O-Plan: the Open Planning Architecture,
Ken Currie and Austin Tate.
Artificial Intelligence Vol. 52, pp 49-86, 1991
-
HTN planning: Complexity and expressivity,
K Erol, J Hendler and D Nau. AAAI-94, 1994
- Wednesday, October 1: Classical Planning and Learning I
- Lecture Notes
- Points to remember from the discussion in class:
- Control learning "explains" a search episode and outputs
an explanation that can be used in future search to avoid failure.
- Generating "correct" explanations may be expensive,
an alternative is to acquire explanations incrementally.
- Readings:
- Monday, October 6: Classical Planning and Learning II
- Lecture Notes
- Points to remember from the discussion in class:
- Planning can be improved in terms of search efficiency.
- Planning by analogy replays past plans.
- Readings:
- Flexible strategy
learning: Analogical replay of problem solving episodes,
Manuela M. Veloso.
In Proceedings of AAAI-94, the Twelfth National Conference on
Artificial Intelligence, pages 595-600, Seattle, WA, August 1994. AAAI
Press.
-
Integrating planning and learning: The Prodigy architecture, (Sections 3 and 4)
Manuela M. Veloso, Jaime Carbonell, M. Alicia P\'erez, Daniel Borrajo,
Eugene Fink, and Jim Blythe.
Journal of Experimental and Theoretical
Artificial Intelligence, 7(1):81--120, 1995.
-
Learning-Assisted Automated Planning, Terry Zimmerman and SUbbarao Kambhampati,
AI Magazine, Volume 24, Summer 2003.
- Wednesday, October 8: Conditional and Universal Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Conditional planning produces branching plans that use sensing actions
during execution to determine which branch to take
- Classical planning can be extended to deal with disjunctive
uncertainty by using approaches based on "possible worlds"
- Sensing actions can be used to distinguish between different possible worlds
- Tree-based policies can be generated incrementally.
- BDD-based planning proceeds by computing "preimages"
backwards from the goal state. A preimage of a state is
the set of states that can transition to that state.
- There are three BDD-based planning algorithms to
know: strong, strong cyclic, and optimistic.
- Readings:
-
Conditional Nonlinear Planning,
Mark Peot and David Smith
Proceedings of the First International Conference on AI Planning
Systems, College Park, Maryland, 1992
-
Extending Graphplan to Handle Uncertainty and Sensing Actions,
D. Weld, C. Andersonand D. Smith, Proceedings of AAAI, July 1998.
-
Universal Plans for Reactive Robots in Unpredictable Environments,
MJ Schoppers. In Proceedings IJCAI 1987.
-
Strong Planning in Non-Deterministic Domains via Model Checking,
A. Cimatti, M. Rovero, P. Traverso.
In Proceedings of AIPS-98.
-
Automatic OBDD-based Generation of Universal Plans in
Non-Deterministic Domains,
A. Cimatti, M. Rovero, P. Traverso.
In Proceedings of AAAI-98.
-
OBBD-based Universal planning for synchronized agents in non-deterministic
domains, Rune Jensen and Manuela Veloso, Journal of Artificial Intelligence Research, 13,
pages 189-226, 2000.
- Monday, October 13: Markov Decision Processes
- Lecture Notes
- Points to remember from the discussion in class:
- MDPs for planning under uncertainty
- MDPs reason about states and actions and rewards.
- Rewards capture goal achievement.
- A policy maps states to actions. An MDP with a
specific policy is a Markov Chain.
- We have seen value iteration and policy iteration
to solve Marchov Chains and MDPs.
- Readings:
- Wednesday, October 15: Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Chapter 13 - Machine Learning, Tom Mitchell
- Monday, October 20: POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
-
POMDPs for Dummies.
-
Planning and Acting in Partially Observable Stochastic Domains.
L.P. Kaelbling, M.L. Littman, and A.R. Cassandra, Artificial Intelligence,
Vol 101, 1998.
-
Synthesis of Hierarchical Finite-State Controllers for POMDPs.
E. Hansen and R. Zhou. Thirteenth International Conference on Automated
Planning and Scheduling (ICAPS-03), Trento, Italy, June 2003.
-
Point-based value iteration: An anytime algorithm for POMDPs.
J. Pineau, G. Gordon, and S. Thrun. In Proceedings IJCAI-03,
Acapulco, Mexico. August 2003.
-
Heuristic Search Value Iteration for POMDPs
T. Smith and R. Simmons. In Proceedings of UAI, July 2004.
- Wednesday, October 22: POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Monday, October 27: Reactive Planning, Robot Execution
- Lecture Notes
- Points to remember from the discussion in class:
- Execution involves state assessment and signal processing
and understanding.
- Reactive planning can handle low-level planning, while
deliberative planning can work at a higher-level of abstraction.
- Readings:
- Wednesday, October 29: Path Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Replanning is needed when execution fails.
- Given a plan that failed on a step, two alternatives for replanning
are: to plan from scratch to the goal from the new state; and to plan
from the new state to try reach one of the steps of the past plan. Both
options are not ideal
- ERRT with a search bias to the past plan is the perfect solution
for replanning.
- Readings:
- Monday, November 3: Execution Architectures
- Lecture Notes
- Points to remember from the discussion in class:
- Importance of modularity and hierarchy (temporal, behavioral,
functional)
- Layered architectures very important - upper layers provide goals;
commands sent to lower layers
- Many modern hybrid architectures have three layers - planning,
executive, behavioral - each with different roles and using
different representations
- Readings:
-
Experiences with an Architecture for Intelligent, Reactive Agents,
R. P. Bonasso, R. J. Firby, E. Gat, D. Kortenkamp, D. Miller and M. Slack
Journal of Experimental and Theoretical Artificial Intelligence, 9(2),
1997.
-
Remote Agent: To Boldly Go Where No AI System Has Gone Before,
Nicola Muscettola, P. Pandurang Nayak, Barney Pell, and Brian Williams
Artificial Intelligence 103(1-2):5-48, August 1998.
-
A Task Description Language for Robot Control,
R. Simmons and D. Apfelbaum,
In Proceedings of Conference on Intelligent Robotics and Systems, Vancouver Canada, October 1998.
- Wednesday, November 5: POMDPs (III)
- Lecture Notes
- Points to remember from the discussion in class:
- Can improve POMDP solving performance by approximating value function,
belief space, or both
- Point-based POMDP solvers are (currently) state-of-the-art
- Greedy methods can sometimes give reasonable results, at very nominal
computational cost, but generally do not reason explicitly
about observations.
- Readings:
-
An Improved Grid-Based Approximation Algorithm for POMDPs
R. Zhou and E. Hanse. In Proceedings of IJCAI, Seattle Washington,
August 2001.
-
Point-based value iteration: An anytime algorithm for POMDPs.
J. Pineau, G. Gordon, and S. Thrun. In Proceedings IJCAI-03,
Acapulco, Mexico. August 2003.
-
Heuristic Search Value Iteration for POMDPs
T. Smith and R. Simmons. In Proceedings of UAI, July 2004.
-
Acting optimally in partially observable stochastic domains
A. Cassandra, L.P. Kaelbling, M. Littman. In Proceedings AAAI-94
Seattle WA, 1994.
- Monday, November 10: Multi-agent Planning I
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Social Potentials for Scalable Multirobot Formations,
T. Balch and M. Hybinette, in Proceedings IEEE International Conference
on Robotics and Automation, San Francisco CA, 2000.
- Coordination for Multi-Robot Exploration and Mapping
,
R. Simmons, D. Apfelbaum, W. Burgard, D. Fox, M. Moors, S. Thrun, H. Younes.
In Proceedings National Conference on Artificial Intelligence,
Austin TX, August 2000.
-
A Layered Architecture for Coordination of Mobile Robots,
R. Simmons, T. Smith, M. B. Dias, D. Goldberg, D. Hershberger,
A. Stentz, R. Zlot, in Multi-Robot Systems: From
Swarms to Intelligent Automata,
A. Schultz and L. Parker (eds.), Kluwer, 2002.
-
Hoplites: A Market-Based Framework for Planned Tight
Coordination in Multirobot Teams,
N. Kalra, D. Ferguson, and A. Stentz, in Proceedings of the International
Conference on Robotics and Automation, April, 2005.
- Wednesday, November 12: Multi-robot planning, execution, and some learning
- Lecture Notes
- Points to remember from the discussion in class:
- Multi-robot planning and execution can be done with centralized
perception and control.
- Plays can capture multi-robot plans, and a playbook may have multiple
plays. At execution time, plays can be selected from the playbook
using their applicability conditions and weights that can drive selection
based on the past success of plays. Weights can be adjusted with experience.
- Distributed multi-robot planning can be done by state sharing
through communication. Robots need to translate their own local perception
to global coordinates so that information can be shared among team members.
- Readings:
-
Dynamic multi-robot coordination,
Douglas Vail and Manuela Veloso,
In Multi-Robot Systems: From Swarms to Intelligent Automata,
A. Schultz, L. Parker, and F. Schneider (eds.),
Volume II, pages 87--100. Kluwer Academic Publishers, 2003.
-
Plays as team plans for coordination and adaptation,
Michael Bowling, Brett Browning, and Manuela Veloso.
In Proceedings of the 14th International Conference on Automated
Planning and Scheduling (ICAPS-04), Vancouver, June 2004.
-
Handling diverse information sources: Prioritized multi-hypothesis
world modeling,
Paul R. Rybski and Manuela Veloso.
To appear, EURASIP Journal on Advances in Signal Processing, 2008.
Previous Technical Report CMU-CS-06-182, Computer Science Department, CMU, 2006.
- Monday, November 17: Multi-agent Planning - DEC-POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
-
Taming Decentralized POMDPs: Towards Efficient Policy Computation for
Multiagent Settings.
Ranjit Nair, David Pynadath, Makoto Yokoo, Milind Tambe and Stacy Marsella.
In Proceedings of the International Joint Conference on Artificial
Intelligence (IJCAI-03), 2003.
-
Solving Transition Independent Decentralized Markov Decision Processes
Stephen Becker, Shlomo Zilberstein, Victor Lesser, Claudia Goldman.
Journal of Artificial Intelligence Research, vol. 22, pp. 423-455,
2004.
-
Communication for Improving Policy Computation in Distributed POMDPs
Ranjit Nair, Maayan Roth, Makoto Yokoo and Milind Tambe.
In Proceedings of Conference on Autonomous Agents and Multiagent Systems
(AAMAS-04), 2004.
-
Decentralized Communication Strategies for Coordinated Multi-Agent
Policies, M. Roth, R. Simmons and M. Veloso. In
Multi-Robot Systems: From Swarms to Intelligent Automata, A. Schultz, L.
Parker, and F. Schneider (eds.), Volume IV, Kluwer Avademic Publishers, 2005.
-
What to Communicate? Execution-Time Decision in Multi-Agent POMDPs,
M. Roth, R. Simmons and M. Veloso.
In Proceedings of the International
Symposium on Distributed Autonomous Robotic Systems (DARS),
Minneapolis MN, July 2006.
- Wednesday, November 19: Multi-agent Coordination
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Monday, November 24: Planning for Heterogenous Multi-Robot Systems
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, November 26: Thanksgiving, No Class
- Monday, December 1: Multi-Robot, Acting Under Uncertainty
- Lecture Notes
- Points to remember from the discussion in class:
- Three main ideas: Multi-fidelity behaviors, Oracular POMDPs, Behavior
recognition from observations.
- Readings:
-
Multi-fidelity behaviors: Acting with variable state information,
Elly Winner and Manuela Veloso.
In Proceedings of AAAI-2000, August 2000.
-
Oracular Partially Observable Markov Decision Processes: A Very Special Case,
Nicholas Armstrong-Crews and Manuela Veloso.
In Proceedings of ICRA'2007, Rome, Italy, April 2007.
-
An Approximate Algorithm for Solving Oracular POMDPs,
Nicholas Armstrong-Crews and Manuela Veloso.
In Proceedings of ICRA'2008, Pasadena, CA, 2008.
-
Automated robot behavior recognition applied to robotic soccer,
Kwun Han and Manuela Veloso.
In John Hollerbach and Dan Koditschek, editors, Robotics Research: the
Ninth International Symposium, pages 199--204. Springer-Verlag, London,
2000.
- Wednesday, December 3: Wrap Up
- Lecture Notes
- Points to remember from the discussion in class:
- Readings: