## Asymptotic Boundedness of Value Iteration

In this section we prove a generalized form of Szepesvári and Littman's convergence theorem (for the original theorem, see Appendix A). We do not require probability-one uniform convergence of the approximating operators, only a sufficiently close approximation. Therefore the theorem can be applied to prove results about algorithms in generalized $\varepsilon$-MDPs. Our definition of closeness, both for value functions and for dynamic-programming operators, is given below.

Let $\mathcal{X}$ be an arbitrary state space and denote by $B(\mathcal{X})$ the set of *value functions*, i.e., bounded real-valued functions over $\mathcal{X}$ equipped with the supremum norm $\|\cdot\|$. Let $T : B(\mathcal{X}) \to B(\mathcal{X})$ be an arbitrary contraction mapping with unique fixed point $v^*$, and let $T_t : B(\mathcal{X}) \times B(\mathcal{X}) \to B(\mathcal{X})$ be a sequence of random operators.

**Definition 3.1**
A sequence of value functions $v_t$ *$\kappa$-approximates* $v \in B(\mathcal{X})$ with $\kappa > 0$, if

$$\limsup_{t \to \infty} \| v_t - v \| \le \kappa$$

with probability one.
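
For intuition, note that a $\kappa$-approximating sequence need not converge at all. A minimal example (ours, not from the text) is a sequence oscillating within a $\kappa$-band around $v$:

```latex
% v_t alternates between v + kappa and v - kappa (componentwise),
% so it kappa-approximates v without converging:
v_t := v + (-1)^t \,\kappa\, \mathbf{1},
\qquad \| v_t - v \| = \kappa \ \text{for all } t,
\qquad \limsup_{t \to \infty} \| v_t - v \| = \kappa .
```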

**Definition 3.2**
We say that $T_t$ *$\kappa$-approximates* $T$ at $v \in B(\mathcal{X})$ over $\mathcal{X}$, if for any initial value function $U_0 \in B(\mathcal{X})$ and for the sequence $U_{t+1} = T_t(U_t, v)$, $U_t$ $\kappa$-approximates $Tv$ over $\mathcal{X}$ with probability one.
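
As a concrete illustration of such a two-argument random operator (our sketch, in the spirit of the Szepesvári–Littman framework; the step sizes $\alpha_t(x) \in [0, 1]$ are an assumption of the example, not part of the text), consider the relaxed update

```latex
% Mixes the old estimate U with the target obtained by
% applying T to the second argument V:
T_t(U, V)(x) := \bigl(1 - \alpha_t(x)\bigr)\, U(x) + \alpha_t(x)\, (T V)(x) .
```

If $T$ is a $\gamma$-contraction in the supremum norm, this operator satisfies conditions 1 and 2 of Theorem 3.3 below with $G_t(x) = 1 - \alpha_t(x)$ and $F_t(x) = \gamma\,\alpha_t(x)$, and condition 4 holds with equality.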

Note that $\kappa$ may depend on the approximated value function $v$, unlike the previous example in Equation 4. $\kappa$-approximation of value functions is, indeed, weaker (more general) than probability-one uniform convergence: the latter means that for all $\varepsilon > 0$ there exists a $t_0$ such that

$$\| v_t - v \| \le \varepsilon \quad \text{for all } t \ge t_0,$$

whereas an equivalent form of $\kappa$-approximation is that for all $\varepsilon > 0$ there exists a $t_0$ such that

$$\| v_t - v \| \le \kappa + \varepsilon \quad \text{for all } t \ge t_0,$$

where $\kappa$ is fixed.

**Theorem 3.3**
Let $T$ be an arbitrary mapping with fixed point $v^*$, and let $T_t$ $\kappa$-approximate $T$ at $v^*$ over $\mathcal{X}$. Let $V_0$ be an arbitrary value function, and define $V_{t+1} = T_t(V_t, V_t)$. If there exist functions $0 \le F_t(x) \le 1$ and $0 \le G_t(x) \le 1$ satisfying the conditions below with probability one:

- for all $U_1, U_2 \in B(\mathcal{X})$ and all $x \in \mathcal{X}$, $\bigl| T_t(U_1, v^*)(x) - T_t(U_2, v^*)(x) \bigr| \le G_t(x)\,\bigl| U_1(x) - U_2(x) \bigr|$;
- for all $U, V \in B(\mathcal{X})$ and all $x \in \mathcal{X}$, $\bigl| T_t(U, v^*)(x) - T_t(U, V)(x) \bigr| \le F_t(x)\,\| v^* - V \|$;
- for all $k > 0$, $\prod_{t=k}^{n} G_t(x)$ converges to zero uniformly in $x$ as $n$ increases; and
- there exists $0 \le \gamma < 1$ such that for all $x \in \mathcal{X}$ and sufficiently large $t$, $F_t(x) \le \gamma \bigl( 1 - G_t(x) \bigr)$,

then $V_t$ $\kappa'$-approximates $v^*$ over $\mathcal{X}$, where $\kappa' = \frac{2\kappa}{1-\gamma}$.
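
To see the theorem in action, the following minimal numerical sketch (ours, not from the paper; all constants, the perturbation model, and the variable names are illustrative assumptions) perturbs the Bellman operator $Tv = r + \gamma P v$ of a random Markov reward process by noise of supremum norm at most $\varepsilon$ in each step, applied through the relaxed operator sketched after Definition 3.2. In this setting $T_t$ $\kappa$-approximates $T$ at $v^*$ with $\kappa = \varepsilon$, so the asymptotic error should stay below $\kappa' = 2\varepsilon/(1-\gamma)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (not from the paper).
n, gamma, eps, alpha = 10, 0.9, 0.05, 0.5

# Random Markov reward process: row-stochastic P and rewards r.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)

# T v = r + gamma * P v is a gamma-contraction with fixed point v*.
v_star = np.linalg.solve(np.eye(n) - gamma * P, r)

# Iterate V_{t+1} = T_t(V_t, V_t), where
#   T_t(U, V)(x) = (1 - alpha) U(x) + alpha * (r + gamma P V + xi_t)(x)
# and ||xi_t||_inf <= eps is the per-step perturbation, so that
# T_t kappa-approximates T at v* with kappa = eps.
V = np.zeros(n)
errors = []
for t in range(5000):
    xi = rng.uniform(-eps, eps, size=n)
    V = (1 - alpha) * V + alpha * (r + gamma * (P @ V) + xi)
    errors.append(np.max(np.abs(V - v_star)))

kappa_prime = 2 * eps / (1 - gamma)  # bound from Theorem 3.3
print(f"asymptotic error (last 1000 steps): {max(errors[-1000:]):.4f}")
print(f"theoretical bound kappa' = 2*eps/(1-gamma) = {kappa_prime:.4f}")
```

The observed asymptotic error typically sits well below $\kappa'$; the theorem gives a worst-case guarantee, not a tight prediction.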

*Proof*.
The proof is similar to that of the original theorem. First we define the sequence of auxiliary value functions $U_t$ by the recursion $U_{t+1} = T_t(U_t, v^*)$, $U_0 = V_0$. Since $T_t$ $\kappa$-approximates $T$ at $v^*$, $U_t$ $\kappa$-approximates $T v^* = v^*$, i.e., for sufficiently large $t$ and arbitrary $\varepsilon > 0$, $\| U_t - v^* \| \le \kappa + \varepsilon$. Let

$$\Delta_t(x) := \bigl| U_t(x) - V_t(x) \bigr|.$$

For $\Delta_{t+1}(x)$ we have

$$\begin{aligned}
\Delta_{t+1}(x) &= \bigl| T_t(U_t, v^*)(x) - T_t(V_t, V_t)(x) \bigr| \\
&\le \bigl| T_t(U_t, v^*)(x) - T_t(V_t, v^*)(x) \bigr| + \bigl| T_t(V_t, v^*)(x) - T_t(V_t, V_t)(x) \bigr| \\
&\le G_t(x)\,\bigl| U_t(x) - V_t(x) \bigr| + F_t(x)\,\| v^* - V_t \| \\
&\le G_t(x)\,\Delta_t(x) + F_t(x)\bigl( \| v^* - U_t \| + \| U_t - V_t \| \bigr) \\
&\le G_t(x)\,\Delta_t(x) + F_t(x)\bigl( \kappa + \varepsilon + \| \Delta_t \| \bigr).
\end{aligned}$$

Then, by Lemma C.1 (found in the Appendix), we get that $\limsup_{t \to \infty} \| \Delta_t \| \le \frac{(1+\gamma)(\kappa+\varepsilon)}{1-\gamma}$ for every $\varepsilon > 0$, hence $\limsup_{t \to \infty} \| \Delta_t \| \le \frac{(1+\gamma)\kappa}{1-\gamma}$. From this,

$$\limsup_{t \to \infty} \| V_t - v^* \| \;\le\; \limsup_{t \to \infty} \| V_t - U_t \| + \limsup_{t \to \infty} \| U_t - v^* \| \;\le\; \frac{(1+\gamma)\kappa}{1-\gamma} + \kappa \;=\; \frac{2\kappa}{1-\gamma} \;=\; \kappa'. \qquad\Box$$
