next up previous
Next: Sampling from Near-identical Distributions Up: Convergence Theorems in Generalized Previous: The Convergence of a

The Convergence of the Generalized Q-learning Algorithm

It is an easy consequence of Theorem A.1 that under appropriate conditions, the Q-learning algorithm is convergent.

Theorem A.3   If Assumption A holds, and the learning rates satisfy $ \sum_{t=0}^{\infty} \chi(x_t\!=\!x, a_t\!=\!a) \alpha_t(x,a) =
\infty$ and $ \sum_{t=0}^{\infty} \chi(x_t\!=\!x, a_t\!=\!a)
\alpha_t(x,a)^2 < \infty$ uniformly w.p.1, then the generalized Q-learning algorithm described by iteration (Equation 3) converges to $ Q^*$ w.p.1 uniformly over $ X \times A$.

Proof. The theorem is an easy consequence of Theorem A.1 with the appropriate substitutions and assignments:

Consider the substitution $ X \leftarrow
X \times A$, $ V^*
\leftarrow Q^*$ and $ V_t \leftarrow Q_t$. Furthermore, let the randomized approximate dynamic programming operator sequence defined by

$\displaystyle T_t(Q',Q)(x,a) = \begin{cases}(1-\alpha_t(x,a))Q'(x,a) + \alpha_t...
...), & \text{if $x=x_t$\ and $a=a_t$}, \\ Q'(x,a) & \text{otherwise}, \end{cases}$    

which approximates $ K$ at $ Q^*$ w.p.1 over $ X \times A$ under the assumptions (2)-(4) by the well-known Robbins-Monro theorem [Robbins and Monro(1951)].

Finally, let

$\displaystyle G_t(x,a)= \begin{cases}1-\alpha_t(x,a), & \text{if $x=x_t$\ and $a=a_t$}, \\ 1, & \text{otherwise} \end{cases}$ (11)

and

$\displaystyle F_t(x,a)= \begin{cases}\gamma\alpha_t(x,a), & \text{if $x=x_t$\ and $a=a_t$}, \\ 0, & \text{otherwise.} \end{cases}$ (12)

Condition (4) of Theorem A.1 trivially holds, while conditions (1) and (2) are implied by the definition of $ T_t$ and the non-expansion property of $ {\textstyle\bigotimes}$. Finally, condition (3) is a consequence of assumption (3) and the definition of $ G_t$.

Applying Theorem A.1 proves the statement. $ \qedsymbol$


next up previous
Next: Sampling from Near-identical Distributions Up: Convergence Theorems in Generalized Previous: The Convergence of a