next up previous
Next: The Convergence of the Up: Convergence Theorems in Generalized Previous: Convergence Theorems in Generalized

The Convergence of a General Value Iteration Process

Let $ X$ be an arbitrary state-space and denote by $ \mathbf{B} (X)$ the set of value functions over $ X$ (i.e., the set of bounded $ X
\to \mathbb{R}$ functions), and let $ T: \mathbf{B}(X) \to
\mathbf{B}(X)$ be an arbitrary contraction mapping with (unique) fixed point $ V^*$.

Let $ T_t: \mathbf{B}(X) \times
\mathbf{B}(X) \to \mathbf{B}(X)$ be a sequence of stochastic operators. The second argument of $ T_t$ is intended to modify the first one, in order to get a better approximation of $ T$. Formally, let $ U_0$ be an arbitrary value function and let $ U_{t+1} = T_t (U_t,V)$. $ T_t$ is said to approximate $ T$ at $ V$ with probability one over $ X$, if $ \
lim_{t\to\infty} U_t = TV$ uniformly over $ X$.

Theorem A.1 (Szepesvári and Littman)   Let the sequence of random operators $ T_t$ approximate $ T$ at $ V^*$ with probability one uniformly over $ X$. Let $ V_0$ be an arbitrary value function, and define $ V_{t+1} = T_t(V_t, V_t)$. If there exist functions $ 0 \leq F_t(x) \leq 1$ and $ 0 \leq G_t(x)
\leq 1$ satisfying the conditions below with probability one, then $ V_t$ converges to $ V^*$ with probability one uniformly over $ X$:
  1. for all $ U_1, U_2 \in \mathbf{B}(X)$ and all $ x\in X$,

    $\displaystyle \big\vert T_t(U_1,V^*)(x)-T_t(U_2,V^*)(x) \big\vert \leq G_t(x) \big\vert U_1(x) - U_2(x) \big\vert
$

  2. for all $ U, V \in \mathbf{B}(X)$ and all $ x\in X$,

    $\displaystyle \big\vert T_t(U,V^*)(x)-T_t(U,V)(x) \big\vert \leq F_t(x) \sup_{x'} \big\vert V^*(x') - V(x') \big\vert
$

  3. for all $ k > 0$, $ \prod_{t=k}^n G_t(x)$ converges to zero uniformly in $ x$ as $ n$ increases; and,
  4. there exists $ 0 \leq \gamma < 1$ such that for all $ x\in X$ and large enough $ t$,

    $\displaystyle F_t(x)\leq \gamma(1-G_t(x)). $

The proof can be found in [Szepesvári and Littman(1996)]. We cite here the lemma, which is the base of the proof, since our generalization concerns this lemma.

Lemma A.2   Let $ X$ be an arbitrary set, $ 0 \le \gamma < 1$ and consider the sequence

$\displaystyle w_{t+1}(x) = G_t(x)w_t(x) + F_t(x)(\Vert w_t\Vert+h_t),$    

where $ x\in X$, $ \left\Vert w_1\right\Vert < C < \infty$ with probability one for some $ C>0$, $ 0 \le G_t(x) \le 1$ and $ 0 \le G_t(x) \le 1$ for all $ t$, and $ \limsup_{t\to\infty} h_t \to 0$ w.p.1. Assume that for all $ k$, $ \lim_{n \to \infty} \prod_{t=k}^{n} G_t(x) = 0$ uniformly in $ x$ w.p.1 and $ F_t \le \gamma(1-G_t(x))$ w.p.1. Then $ \left\Vert w_t\right\Vert $ converges to 0 w.p.1 as well.


next up previous
Next: The Convergence of the Up: Convergence Theorems in Generalized Previous: Convergence Theorems in Generalized