Convergence Theorems in Generalized MDPs

In this appendix we cite an important convergence theorem of [Szepesvári and Littman(1996)]. Its main contribution is that it reduces the convergence of the asynchronous value iteration process to the convergence of a synchronous approximation of the dynamic-programming operator, which is in general much easier to prove.
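To make the distinction between the two update schemes concrete, the following minimal sketch runs synchronous and asynchronous value iteration on a small MDP. It is only an illustration under simplifying assumptions, not the theorem itself: the two-state MDP, the discount factor, and all names (`P`, `R`, `GAMMA`, `backup`, and so on) are hypothetical, and the exact Bellman optimality operator is used instead of an approximating sequence of operators.

```python
import random

# Hypothetical toy MDP (not from the source): 2 states, 2 actions.
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.6), (0, 0.4)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}
GAMMA = 0.9  # assumed discount factor


def backup(V, s):
    """Apply the Bellman optimality operator to V at the single state s."""
    return max(
        R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
        for a in P[s]
    )


def synchronous_vi(sweeps=500):
    """Every state is updated in each sweep, all from the previous estimate."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: backup(V, s) for s in P}
    return V


def asynchronous_vi(updates=5000, seed=0):
    """Only one randomly chosen state is updated per step, in place."""
    rng = random.Random(seed)
    states = list(P)
    V = {s: 0.0 for s in P}
    for _ in range(updates):
        s = rng.choice(states)
        V[s] = backup(V, s)
    return V


V_sync = synchronous_vi()
V_async = asynchronous_vi()
print(V_sync)
print(V_async)
```

In this sketch both schemes should settle on the same fixed point of the operator, since each state keeps being updated and the operator is a contraction for a discount factor below one. The theorem cited here covers the considerably more general setting in which the operator is only approximated.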

- The Convergence of a General Value Iteration Process
- The Convergence of the Generalized Q-learning Algorithm