AI Seminar
-------------------------------
Tuesday, February 17th, 2004
4:00 pm - 5:30 pm
175 ATL (Large Conference Room)

"Distributed Reinforcement Learning in a Network"

Benjamin Van Roy
Management Science and Engineering, Electrical Engineering, Computer Science
Stanford University
----------------------------------

I will discuss a distributed reinforcement learning protocol for optimizing the dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of communication switches. This protocol requires only local communication and simple computations which are distributed among devices. As a motivating example, I will discuss a problem involving optimization of power consumption, delay, and buffer overflow in a simplified model of a sensor network.

This approach builds on policy gradient methods for reinforcement learning. The protocol can be viewed as an extension of policy gradient methods to a context involving a team of agents optimizing aggregate performance through asynchronous distributed communication and computation. The dynamics of the protocol approximate the solution to an ordinary differential equation that follows the gradient of the performance objective. One shortcoming of customary policy gradient approaches -- for centralized as well as distributed reinforcement learning -- is that signal-to-noise ratios of gradient estimates diminish as the network grows. I will discuss variations of the distributed reinforcement learning protocol that may alleviate this shortcoming.
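
The signal-to-noise issue mentioned in the abstract can be seen in a toy setting. The sketch below is not the protocol from the talk. It is a plain score-function (REINFORCE-style) estimator under assumptions made up for illustration: each agent chooses a binary action from a Bernoulli policy with a shared parameter `theta`, and the team reward is the sum of all actions. The names `gradient_samples` and `signal_to_noise` are hypothetical. Agent 0's gradient estimate multiplies the whole team's reward by its own score term, so the other agents' randomness enters as noise.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_samples(num_agents, num_samples, theta=0.0, seed=0):
    """Score-function gradient samples for agent 0's policy parameter.

    Toy assumptions (not from the talk): agent i picks a_i in {0, 1}
    with P(a_i = 1) = sigmoid(theta), and the team reward is sum(a_i).
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    samples = []
    for _ in range(num_samples):
        actions = [1 if rng.random() < p else 0 for _ in range(num_agents)]
        team_reward = sum(actions)
        # For a Bernoulli(sigmoid(theta)) policy,
        # d/dtheta log P(a_0) = a_0 - p.
        score = actions[0] - p
        samples.append(team_reward * score)
    return samples


def signal_to_noise(num_agents, num_samples=20000, seed=0):
    """Ratio |mean| / standard deviation of the single-sample estimates."""
    samples = gradient_samples(num_agents, num_samples, seed=seed)
    return abs(statistics.mean(samples)) / statistics.stdev(samples)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, signal_to_noise(n))
```

In this toy model the expected gradient for agent 0 stays at p(1 - p) regardless of network size, while the variance of the estimate grows with the number of agents, so the ratio shrinks as agents are added.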