Stochastic Processes with Non-diminishing Perturbations

In this appendix we prove the lemma required by the proof of Theorem 3.3:

Lemma C.1   Let $X$ be an arbitrary set, let $0 \le \gamma < 1$ and $h > 0$, and consider the sequence

\[
w_{t+1}(x) = G_t(x)\,w_t(x) + F_t(x)\bigl(\Vert w_t\Vert + h\bigr),
\]

where $x \in X$, $\Vert w_1\Vert \le C$ holds with probability one for some bound $0 < C < \infty$, and $0 \le G_t(x) \le 1$ and $0 \le F_t(x) \le 1$ for all $t$. Assume that for all $k$, $\lim_{n\rightarrow\infty} \prod_{t=k}^{n} G_t(x) = 0$ uniformly in $x$ w.p.1, and that $F_t(x) \le \gamma(1-G_t(x))$ w.p.1. Then $\limsup_{t\to\infty} \Vert w_t\Vert \le H_0 = \frac{1+\gamma}{1-\gamma}\,h$ w.p.1.
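For intuition, consider a worked special case (not needed for the proof): suppose the perturbation attains its bound, $F_t(x) = \gamma(1-G_t(x))$ with $G_t(x) \equiv G < 1$, and $w_t$ is constant over $X$ at a stationary value $w = \Vert w_t\Vert$. The fixed point of the recursion then satisfies

\[
w = G\,w + \gamma\,(1-G)(w+h)
\;\Longrightarrow\;
(1-G)(1-\gamma)\,w = \gamma\,(1-G)\,h
\;\Longrightarrow\;
w = \frac{\gamma}{1-\gamma}\,h,
\]

which lies strictly below the asserted bound $H_0 = \frac{1+\gamma}{1-\gamma}\,h$; the slack in $H_0$ absorbs transient and stochastic effects.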

Proof. We will prove that for each $\varepsilon, \delta > 0$ there exists an index $T < \infty$ such that

\[
\Pr\Bigl(\sup_{t\ge T} \Vert w_t\Vert < H_0 + \delta\Bigr) > 1 - \varepsilon. \tag{15}
\]

Fix $\varepsilon, \delta > 0$ arbitrarily. Furthermore, fix a sequence of numbers $0 < p_n < 1$ ($n = 1, 2, \ldots$), to be chosen later.

Let $H_1 := \frac{\gamma}{1-\gamma}h$ and $C = \max(\Vert w_t\Vert, H_1)$. Then

\begin{align*}
w_{t+1}(x) &\le G_t(x)\,\Vert w_t\Vert + F_t(x)\bigl(\Vert w_t\Vert + h\bigr) \\
&\le G_t(x)\,C + F_t(x)\,(C+h) \\
&\le G_t(x)\,C + \gamma\,(1-G_t(x))\,(C+h) \\
&= C\bigl(-\gamma G_t(x) + \gamma + G_t(x) - 1\bigr) + C + \gamma h - \gamma G_t(x)\,h \\
&= -C\,(1-\gamma)\bigl(1-G_t(x)\bigr) + C + \gamma h - \gamma G_t(x)\,h \\
&\le -\frac{\gamma h}{1-\gamma}\,(1-\gamma)\bigl(1-G_t(x)\bigr) + C + \gamma h - \gamma G_t(x)\,h \\
&= C - \gamma h + \gamma h\,G_t(x) + \gamma h - \gamma G_t(x)\,h = C,
\end{align*}

where the third inequality uses the assumption $F_t(x) \le \gamma(1-G_t(x))$ and the last inequality uses $C \ge H_1 = \frac{\gamma}{1-\gamma}h$.

Thus we have $\Vert w_{t+1}\Vert \le \max(\Vert w_t\Vert, H_1)$. Now define $\bar\gamma = \frac{1+\gamma}{2}$. Since $\bar\gamma > \gamma$, $H_0 = \frac{\bar\gamma}{1-\bar\gamma}h > \frac{\gamma}{1-\gamma}h = H_1$. Consequently, if $\Vert w_1\Vert \le H_0$, then $\Vert w_t\Vert \le H_0$ holds for all $t$ as well. From now on we assume that $\Vert w_1\Vert > H_0$.
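Note that this expression for $H_0$ coincides with the constant in the lemma statement, since

\[
\frac{\bar\gamma}{1-\bar\gamma}\,h = \frac{(1+\gamma)/2}{1-(1+\gamma)/2}\,h = \frac{(1+\gamma)/2}{(1-\gamma)/2}\,h = \frac{1+\gamma}{1-\gamma}\,h.
\]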

Let $\Vert w_1\Vert = C_1 > H_0$. Since $\Vert w_t\Vert \le C_1$ for all $t$ (by the previous paragraph, as $C_1 > H_0 > H_1$), the process

\[
y_{t+1}(x) = G_t(x)\,y_t(x) + \gamma\,(1-G_t(x))\,(C_1+h)
\]

with $y_1 = w_1$ bounds the process $w_t$ from above: $0 \le w_t \le y_t$ holds for all $t$. The process $y_t$ converges to $\gamma(C_1+h)$ w.p.1, uniformly over $X$, so

\[
\limsup_{t\rightarrow\infty} \Vert w_t\Vert \le \limsup_{t\rightarrow\infty} \Vert y_t\Vert \le \gamma\,(C_1+h)
\]

w.p.1.
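To see why $y_t$ converges, unroll the recursion (a short verification by induction on $t$):

\[
y_{t+1}(x) = \Bigl(\prod_{s=1}^{t} G_s(x)\Bigr)\,y_1(x) + \gamma\,(C_1+h)\Bigl(1 - \prod_{s=1}^{t} G_s(x)\Bigr);
\]

since $\prod_{s=1}^{t} G_s(x) \rightarrow 0$ uniformly in $x$ w.p.1 by assumption, $y_t(x) \rightarrow \gamma(C_1+h)$ uniformly w.p.1.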

Since $\bar\gamma > \gamma$, there exists an index $M_1$ such that for $t > M_1$, $\Vert w_t\Vert \le \bar\gamma(C_1+h)$ holds with probability at least $p_1$. The proof proceeds by induction: assume that up to some index $i \ge 1$ we have found indices $M_1, \ldots, M_i$ such that for $t > M_i$,

\[
\Vert w_t\Vert \le \bar\gamma^{\,i} C_1 + h\Bigl(\sum_{k=1}^{i} \bar\gamma^{\,k}\Bigr) = C_{i+1} \tag{16}
\]

holds with probability at least $p_1 p_2 \cdots p_i$. Now let us restrict ourselves to those events for which inequality (16) holds. Then the process

\begin{align*}
y_{M_i}(x) &= w_{M_i}(x), \\
y_{t+1}(x) &= G_t(x)\,y_t(x) + \gamma\,(1-G_t(x))\,(C_{i+1}+h), \qquad t \ge M_i
\end{align*}

bounds $w_t$ from above from the index $M_i$ on. The process $y_t$ converges to $\gamma(C_{i+1}+h)$ w.p.1, uniformly over $X$, and since $\gamma < \bar\gamma$,

\[
\gamma\,(C_{i+1}+h) < \bar\gamma\,(C_{i+1}+h) = \bar\gamma^{\,i+1} C_1 + h\Bigl(\sum_{k=1}^{i+1} \bar\gamma^{\,k}\Bigr) = C_{i+2},
\]

so the above argument can be repeated to obtain an index $M_{i+1}$ such that (16) holds for $i+1$ with probability at least $p_1 p_2 \cdots p_{i+1}$.
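The middle equality above is a one-line computation from the definition of $C_{i+1}$ in (16):

\[
\bar\gamma\,(C_{i+1}+h)
= \bar\gamma\Bigl(\bar\gamma^{\,i} C_1 + h\sum_{k=1}^{i}\bar\gamma^{\,k} + h\Bigr)
= \bar\gamma^{\,i+1} C_1 + h\sum_{k=2}^{i+1}\bar\gamma^{\,k} + \bar\gamma h
= \bar\gamma^{\,i+1} C_1 + h\sum_{k=1}^{i+1}\bar\gamma^{\,k};
\]

taking $i = 0$ (with the empty sum) recovers the base case $C_2 = \bar\gamma(C_1+h)$ established above.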

Since $\bar\gamma < 1$, $\bar\gamma^{\,i} C_1 \rightarrow 0$ and $h\sum_{k=1}^{i} \bar\gamma^{\,k} \rightarrow \frac{\bar\gamma}{1-\bar\gamma}h = H_0$. So there exists an index $k$ for which $C_k < H_0 + \delta$. Then inequality (15) can be satisfied by choosing $p_1, \ldots, p_k$ so that $p_1 p_2 \cdots p_k \ge 1-\varepsilon$ and letting $T = M_k$. $\qedsymbol$
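As an illustration only (not part of the proof), the following minimal Python sketch simulates the process on a finite set $X$ under the max norm. All concrete choices below, the size of $X$, the distributions of $G_t$ and $F_t$, and the constants $\gamma$ and $h$, are hypothetical; they merely satisfy the lemma's assumptions. The tail of $\Vert w_t\Vert$ should settle below $H_0 = \frac{1+\gamma}{1-\gamma}h$.

import numpy as np

# Simulate w_{t+1}(x) = G_t(x) w_t(x) + F_t(x) (||w_t|| + h) on a finite X.
rng = np.random.default_rng(0)
n_states = 10                        # |X|; the norm below is the max norm
gamma, h = 0.9, 0.1                  # hypothetical constants, 0 <= gamma < 1, h > 0
H0 = (1 + gamma) / (1 - gamma) * h   # bound asserted by Lemma C.1

w = rng.uniform(0.0, 50.0, size=n_states)  # ||w_1|| <= C with C = 50

norms = []
for t in range(5000):
    # G_t(x) in [0.5, 0.9]: bounded away from 1, so prod_t G_t(x) -> 0 uniformly
    G = rng.uniform(0.5, 0.9, size=n_states)
    # F_t(x) <= gamma (1 - G_t(x)), as the lemma assumes
    F = gamma * (1.0 - G) * rng.uniform(0.0, 1.0, size=n_states)
    w = G * w + F * (np.max(np.abs(w)) + h)
    norms.append(np.max(np.abs(w)))

print(f"H0 = {H0:.4f}")
print(f"max ||w_t|| over the last 1000 steps = {max(norms[-1000:]):.4f}")

With these choices the simulated sup-norm typically falls well below $H_0$, since $F_t$ is usually strictly smaller than its bound $\gamma(1-G_t)$; the lemma's bound is what survives in the worst case.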

