AI Seminar ------------------------------- Tuesday, March 22nd, 2005 4:00 pm - 5:30 pm 175 ATL (Large Conference Room) "Stochastic Policy Optimization with Discrete Combinatorial Constraints" Dmitri Dolgov Artificial Intelligence Laboratory University of Michigan =============================== Markov decision processes provide a simple and elegant framework for solving stochastic policy optimization problems. However, there are domains where the classical MDP model proves inadequate, because its scalar reward function might not be expressive enough to describe all the feedback from the environment, and because the agent might have various resource and architectural constraints that affect what policies it can implement. In this talk, I will discuss a class of MDPs with discrete combinatorial constraints and will describe a family of algorithms, based on mixed integer programming, for solving these NP-complete policy optimization problems. In particular, I will consider the problem of finding optimal stationary deterministic policies for MDPs with multiple rewards, costs, and discount factors (for which no implementable solution algorithms, aside from general non-convex optimization techniques, have been known heretofore), as well as a special case of the MDP with a single reward, single discount factor, and multiple cost constraints (for which no algorithms have existed for finding optimal deterministic policies). I will also describe how this family of integer-programming algorithms can be used as a computationally-efficient method for resource allocation among multiple MDP-based agents, where the value of a resource bundle to an agent is defined by the best policy that is implementable, given the resources. This approach can be used in a cooperative setting and can also be fruitfully employed to design a computationally efficient auction-based mechanism for resource allocation among self-interested agents. This is joint work with Ed Durfee.