
Convergence Theorems in Generalized MDPs

In this appendix we cite an important convergence theorem of [Szepesvári and Littman (1996)]. Its main contribution is that it reduces the convergence of the asynchronous value iteration process to the convergence of an approximation of a synchronous dynamic-programming operator, which is in general much easier to prove.
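To make the distinction between the two processes concrete, the following minimal sketch contrasts a synchronous sweep, which applies the dynamic-programming operator to every state-action pair at once, with an asynchronous process that updates one pair at a time. It is only an illustration and not the theorem itself: the toy MDP (`P`, `R`), the discount `GAMMA`, and all function names are hypothetical, and the asynchronous updates use exact expectations instead of the sampled approximations that the theorem covers.

```python
import random

GAMMA = 0.9  # assumed discount factor

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state),
# R[s][a] is the immediate reward.
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 0)], 1: [(1.0, 1)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def backup(Q, s, a):
    """One application of the dynamic-programming operator at (s, a)."""
    return R[s][a] + GAMMA * sum(p * max(Q[s2].values()) for p, s2 in P[s][a])


def synchronous_iteration(sweeps=500):
    """Update every (s, a) simultaneously from the previous iterate."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(sweeps):
        Q = {s: {a: backup(Q, s, a) for a in P[s]} for s in P}
    return Q


def asynchronous_iteration(updates=20000, seed=0):
    """Update a single randomly chosen (s, a) per step, in place."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    pairs = [(s, a) for s in P for a in P[s]]
    for _ in range(updates):
        s, a = rng.choice(pairs)
        Q[s][a] = backup(Q, s, a)
    return Q


Q_sync = synchronous_iteration()
Q_async = asynchronous_iteration()
```

On this toy problem both processes approach the same fixed point, since every pair is updated repeatedly and the operator is a contraction for a discount below one.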

Subsections

The Convergence of a General Value Iteration Process
The Convergence of the Generalized Q-learning Algorithm