

To begin with, we recall the definition of a Markov Decision Process (MDP) [Puterman(1994)]. A (finite) MDP is defined by the tuple $\langle X, A, R, P \rangle$, where $X$ and $A$ denote the finite sets of states and actions, respectively. $P: X \times A \times X \rightarrow [0,1]$ is called the transition function, since $P(x,a,y)$ gives the probability of arriving at state $y$ after executing action $a$ in state $x$. Finally, $R: X \times A \times X \rightarrow \mathbb{R}$ is the reward function: $R(x,a,y)$ gives the immediate reward for the transition $(x,a,y)$.
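
For concreteness, the tuple $\langle X, A, R, P \rangle$ can be sketched as a small data structure. The following is only an illustrative sketch, not part of the original formulation: the class name `FiniteMDP`, the methods `validate` and `step`, and the two-state example are all hypothetical. It also assumes that, for every pair $(x,a)$, the values $P(x,a,\cdot)$ sum to 1 over $X$, which the definition implies but does not state explicitly, and it treats missing dictionary entries as zero.

```python
import random
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    """Illustrative container for the tuple <X, A, R, P> (hypothetical name)."""
    states: list    # X: finite set of states
    actions: list   # A: finite set of actions
    R: dict         # maps (x, a, y) to the immediate reward R(x, a, y)
    P: dict         # maps (x, a, y) to the transition probability P(x, a, y)

    def validate(self, tol=1e-9):
        # Assumption: P(x, a, .) is a probability distribution over X.
        for x in self.states:
            for a in self.actions:
                probs = [self.P.get((x, a, y), 0.0) for y in self.states]
                if any(p < 0.0 or p > 1.0 for p in probs):
                    raise ValueError(f"P({x}, {a}, .) has a value outside [0, 1]")
                if abs(sum(probs) - 1.0) > tol:
                    raise ValueError(f"P({x}, {a}, .) does not sum to 1")

    def step(self, x, a, rng=random):
        # Sample the next state y with probability P(x, a, y),
        # then return it together with the reward R(x, a, y).
        weights = [self.P.get((x, a, y), 0.0) for y in self.states]
        y = rng.choices(self.states, weights=weights, k=1)[0]
        return y, self.R.get((x, a, y), 0.0)


# A made-up two-state example: "stay" keeps the current state,
# "go" switches state with probability 0.9.
mdp = FiniteMDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    P={
        ("s0", "stay", "s0"): 1.0,
        ("s1", "stay", "s1"): 1.0,
        ("s0", "go", "s1"): 0.9, ("s0", "go", "s0"): 0.1,
        ("s1", "go", "s0"): 0.9, ("s1", "go", "s1"): 0.1,
    },
    R={("s0", "go", "s1"): 1.0},  # reward only for the transition s0 -> s1
)
mdp.validate()
next_state, reward = mdp.step("s0", "go")
```

Storing $P$ and $R$ as dictionaries keyed by $(x,a,y)$ mirrors the function signatures in the definition directly; for larger state spaces, dense arrays indexed by state and action would likely be a more practical choice.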