....1
Unless otherwise noted, $\|\cdot\|$ denotes the max-norm.

... increases;2
Note that the convergence of an infinite product implies that the terms converge to one.
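A short justification, sketched under the usual convention (assumed here, since the footnote does not state it) that an infinite product converges when its partial products tend to a finite, nonzero limit. The symbols $a_k$, $P_n$ and $P$ are introduced only for this illustration:

```latex
% Assumption: P_n := \prod_{k=1}^{n} a_k converges to a finite limit P \neq 0.
% Since P \neq 0, P_{n-1} \neq 0 for all sufficiently large n, so the ratio is defined.
\[
  a_n \;=\; \frac{P_n}{P_{n-1}}
  \;\xrightarrow[n\to\infty]{}\; \frac{P}{P} \;=\; 1 .
\]
```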

... (E-learning,3
The capital letter E is used to distinguish E-learning from internet-based concepts that use the lowercase prefix 'e'.

....4
Note that depends on both and . When no ambiguity may arise we will not explicitly show these dependencies.

... tracking5
The term 'velocity field tracking' may better represent the underlying objective of speed field tracking.

...Szepesvari97Neurocontroller,Szepesvari97Approximate.6
Sign-properness imposes conditions on the sign but not on the magnitude of the components of the output of the approximate inverse dynamics.

... satisfied.7
Justification of this assumption requires techniques of ordinary differential equations and is omitted here. See also [Barto(1978)].

....8
Note that the condition on is a kind of Lipschitz-continuity.

... arm.9
The parameters for SARSA were taken from the work of [Aamodt(1997)] and can be considered near-optimal for the SARSA implementation, which was also taken from the same source.

... on'.10
Note that the optimal value function is not available and the norm was computed versus the last state of the experiment.

... distributions11
abbreviates .