T. M. Aamodt.
Intelligent control via reinforcement learning.
BASc thesis, University of Toronto, 1997.
URL ~aamodt/.

A. Barto.
Discrete and continuous models.
International Journal of General Systems, 4:163-177, 1978.

R. Bellman.
Dynamic Programming.
Princeton University Press, Princeton, New Jersey, 1957.

J. A. Boyan.
Modular neural networks for learning context-dependent game strategies.
Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, UK, August 1992.

Dayan and Hinton(1993)
P. Dayan and G. E. Hinton.
Feudal reinforcement learning.
In Advances in Neural Information Processing Systems, volume 5, pages 271-278, San Mateo, CA, 1993. Morgan Kaufmann.

T.G. Dietterich.
Hierarchical reinforcement learning with the MAXQ value function decomposition.
Journal of Artificial Intelligence Research, 13:227-303, 2000.

K. Doya.
Temporal difference learning in continuous time and space.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, Cambridge, MA, 1996. MIT Press.

K. Doya.
Reinforcement learning in continuous time and space.
Neural Computation, 12:243-269, 2000.

Fomin et al.(1997)
T. Fomin, T. Rozgonyi, Cs. Szepesvári, and A. Lorincz.
Self-organizing multi-resolution grid for motion planning and control.
International Journal of Neural Systems, 7:757-776, 1997.

Givan et al.(2000)
R. Givan, S. M. Leach, and T. Dean.
Bounded-parameter Markov decision processes.
Artificial Intelligence, 122(1-2):71-109, 2000.

Gullapalli and Barto(1994)
V. Gullapalli and A. G. Barto.
Convergence of indirect adaptive asynchronous value iteration algorithms.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 695-702, San Mateo, CA, 1994. Morgan Kaufmann.

M. Heger.
Consideration of risk in reinforcement learning.
In Proceedings of the Eleventh International Conference on Machine Learning, pages 105-111, San Francisco, CA, 1994. Morgan Kaufmann.

Hwang and Ahuja(1992)
Y. K. Hwang and N. Ahuja.
Gross motion planning - a survey.
ACM Computing Surveys, 24(3):219-291, 1992.

Jaakkola et al.(1994)
T. Jaakkola, M. I. Jordan, and S. P. Singh.
On the convergence of stochastic iterative dynamic programming algorithms.
Neural Computation, 6(6):1185-1201, November 1994.

G. H. John.
When the best move isn't optimal: Q-learning with exploration.
In Proceedings of the Twelfth National Conference on Artificial Intelligence, page 1464, Seattle, WA, 1994.

L.P. Kaelbling.
Hierarchical learning in stochastic domains: Preliminary results.
In Proceedings of the Tenth International Conference on Machine Learning, pages 167-173, San Mateo, CA, 1993. Morgan Kaufmann.

Kalmár et al.(1998)
Z. Kalmár, Cs. Szepesvári, and A. Lorincz.
Module-based reinforcement learning: Experiments with a real robot.
Machine Learning, 31:55-85, 1998.

Lorincz et al.(2002)
A. Lorincz, I. Pólik, and I. Szita.
Event-learning and robust policy heuristics.
Cognitive Systems Research, 2002.

M. L. Littman.
Markov games as a framework for multi-agent reinforcement learning.
In Proceedings of the Eleventh International Conference on Machine Learning, pages 157-163, San Francisco, CA, 1994. Morgan Kaufmann.

P. Maes.
Learning behavior networks from experience.
In F. J. Varela and P. Bourgine, editors, Toward a practice of autonomous systems: Proceedings of the First European Conf. on Artificial Life, Cambridge, MA, 1992. MIT Press, Cambridge.

Mahadevan and Connell(1992)
S. Mahadevan and J. Connell.
Automatic programming of behavior-based robots using reinforcement learning.
Artificial Intelligence, 55:311-365, 1992.

M.J. Mataric.
Behavior-based control: Examples from navigation, learning, and group behavior.
Journal of Experimental and Theoretical Artificial Intelligence, 9(2-3), 1997.

Precup and Sutton(1998)
D. Precup and R. Sutton.
Multi-time models for temporally abstract planning.
Advances in Neural Information Processing Systems, 10:1050-1056, 1998.

M. Puterman.
Markov decision processes: Discrete stochastic dynamic programming.
John Wiley & Sons, New York, 1994.

Robbins and Monro(1951)
H. Robbins and S. Monro.
A stochastic approximation method.
Annals of Mathematical Statistics, 22:400-407, 1951.

S. P. Singh.
Scaling reinforcement learning algorithms by learning variable temporal resolution models.
In Proceedings of the Ninth International Conference on Machine Learning, MLC-92, San Mateo, CA, 1992. Morgan Kaufmann.

Sutton and Barto(1998)
R. Sutton and A. G. Barto.
Reinforcement Learning: An Introduction.
MIT Press, Cambridge, 1998.

Sutton et al.(1998)
R. Sutton, D. Precup, and S. Singh.
Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales.
Journal of Artificial Intelligence Research, 1:1-39, 1998.

Cs. Szepesvári.
Static and dynamic aspects of optimal sequential decision making.
PhD thesis, Attila József University, Bolyai Institute of Mathematics, 1998.

Szepesvári et al.(1997)
Cs. Szepesvári, Sz. Cimmer, and A. Lorincz.
Neurocontroller using dynamic state feedback for compensatory control.
Neural Networks, 10(9):1691-1708, 1997.

Szepesvári et al.(1997)
Cs. Szepesvári, Sz. Cimmer, and A. Lorincz.
Dynamic state feedback neurocontroller for compensatory control.
Neural Networks, 10:1691-1708, 1997.

Szepesvári and Littman(1996)
Cs. Szepesvári and M. L. Littman.
Generalized Markov decision processes: Dynamic-programming and reinforcement-learning algorithms.
In Proceedings of International Conference of Machine Learning '96, Bari, 1996.

Szepesvári and Lorincz(1997)
Cs. Szepesvári and A. Lorincz.
Approximate inverse-dynamics based robust control using static and dynamic feedback.
In J. Kalkkuhl, K. J. Hunt, R. Zbikowski, and A. Dzielinski, editors, Applications of Neural Adaptive Control Theory, volume 2, pages 151-179. World Scientific, Singapore, 1997.

Szepesvári and Lorincz(1998)
Cs. Szepesvári and A. Lorincz.
An integrated architecture for motion-control and path-planning.
Journal of Robotic Systems, 15:1-15, 1998.

Szita et al.(2002)
I. Szita, B. Takács, and A. Lorincz.
Event-learning with a non-markovian controller.
In F. van Harmelen, editor, 15th European Conference on Artificial Intelligence, Lyon, pages 365-369. IOS Press, Amsterdam, 2002.

ten Hagen(2001)
S. H. G. ten Hagen.
Continuous state space Q-learning for control of non-linear systems.
PhD thesis, University of Amsterdam, Amsterdam, 2001.

J. N. Tsitsiklis.
Asynchronous stochastic approximation and Q-learning.
Machine Learning, 16(3):185-202, September 1994.

C. J. C. H. Watkins.
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge, UK, 1989.

Watkins and Dayan(1992)
C. J. C. H. Watkins and P. Dayan.
Q-learning.
Machine Learning, 8(3):279-292, 1992.

Yamakita et al.(1995)
M. Yamakita, M. Iwashiro, Y. Sugahara, and K. Furuta.
Robust swing-up control of double pendulum, 1995.