Journal of Machine Learning Research 3 (2002) 145-174       Submitted 10/01; Revised 1/02; Published 8/02


$ \varepsilon $-MDPs: Learning in Varying Environments

István Szita
Bálint Takács
András Lőrincz
Department of Information Systems, Eötvös Loránd University
Pázmány Péter sétány 1/C
Budapest, Hungary H-1117


Editor: Sridhar Mahadevan


In this paper, $ \varepsilon $-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an $ \varepsilon $-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.

Keywords: reinforcement learning, convergence, event-learning, SARSA, MDP, generalized MDP, $ \varepsilon $-MDP, SDS controller