This toolbox implements algorithms described in the book L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, *Reinforcement Learning and Dynamic Programming Using Function Approximators*, CRC Press, Automation and Control Engineering Series, April 2010, 280 pages, ISBN 978-1439821084 (see http://www.dcsc.tudelft.nl/rlbook), as well as a number of other algorithms.

- Algorithms for approximate value iteration: grid Q-iteration (gridqi), fuzzy Q-iteration (fuzzyqi), and fitted Q-iteration with nonparametric and neural network approximators (fittedqi). In addition, an implementation of fuzzy Q-iteration with cross-entropy (CE) optimization of the membership functions is provided (cefuzzyqi).
- Algorithms for approximate policy iteration: least-squares policy iteration (lspi), policy iteration with LSPE-Q policy evaluation (lspe), online LSPI (lspionline) and online policy iteration with LSPE-Q (lspeonline), as well as online LSPI with explicitly parameterized and monotonic policies (lspihonline). These algorithms all support generic Q-function approximators (and, for lspihonline, generic policy approximators), of which a variety are already implemented, see create_approx and the approx subdirectory of the toolbox.
- Algorithms for approximate policy search: policy search with adaptive basis functions, using the CE method (cerbfps), and the DIRECT method for global optimization (optrbfps). An additional generic policy search algorithm, with a configurable optimization technique and generic policy approximators, is also provided (optps).
- Implementations of several well-known reinforcement learning benchmarks (the car-on-the-hill, bicycle balancing, inverted pendulum swingup), as well as more specialized control-oriented tasks (DC motor, robotic arm control) and a highly challenging HIV infection control task. See the systems subdirectory of the toolbox.
- A set of thoroughly commented demonstrations illustrating how all these algorithms can be used.
- A standardized task interface that lets users implement their own tasks (see sample_problem, sample_mdp). The algorithm functions also follow standardized input-output conventions, and use a highly flexible, standardized configuration mechanism.
- Optimized Q-iteration and policy iteration implementations, taking advantage of Matlab built-in vectorized and matrix operations (many of them exploiting LAPACK and BLAS libraries) to run extremely fast.
- Extensive result inspection facilities (plotting of policies and value functions, execution and solution performance statistics, etc.).
- Implementations of several *classical* RL and DP algorithms for discrete problems: Q-learning and SARSA with or without eligibility traces (qlearn, sarsa), Q-iteration (qiter), and policy iteration with Q-functions (piter). A cleaning-robot discrete RL task is implemented, following the standardized problem definition framework, and can be used as an example for implementing additional discrete tasks.
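To make the classical Q-iteration entry above concrete, here is a minimal, language-neutral sketch in Python. It is not the toolbox's Matlab qiter code. The six-state chain task, its rewards, the discount factor, and all function names (`step`, `q_iteration`, `greedy_policy`) are assumptions chosen for illustration, loosely modeled on a cleaning-robot style problem.

```python
# Tabular Q-iteration on a small deterministic chain task.
# ASSUMPTION: the dynamics, rewards, and discount factor below are chosen for
# illustration; they are not read from the toolbox's problem files.

N_STATES = 6            # positions 0..5; positions 0 and 5 are terminal
ACTIONS = (-1, 1)       # move left, move right
GAMMA = 0.5             # discount factor (assumed)


def step(x, u):
    """Return (next_state, reward, terminal) for state x and action u."""
    x_next = x + u
    if x_next == 0:
        return x_next, 1.0, True       # small reward at the left end
    if x_next == N_STATES - 1:
        return x_next, 5.0, True       # larger reward at the right end
    return x_next, 0.0, False


def q_iteration(eps=1e-9, max_iter=1000):
    """Repeat the Bellman optimality backup until the largest change is below eps."""
    nonterminal = range(1, N_STATES - 1)
    q = {(x, u): 0.0 for x in nonterminal for u in ACTIONS}
    for _ in range(max_iter):
        q_new = {}
        for (x, u) in q:
            x_next, r, terminal = step(x, u)
            # Terminal states have no future return.
            future = 0.0 if terminal else max(q[(x_next, a)] for a in ACTIONS)
            q_new[(x, u)] = r + GAMMA * future
        delta = max(abs(q_new[k] - q[k]) for k in q)
        q = q_new
        if delta < eps:
            break
    return q


def greedy_policy(q):
    """Pick, in every nonterminal state, the action with the largest Q-value."""
    return {x: max(ACTIONS, key=lambda a: q[(x, a)])
            for x in range(1, N_STATES - 1)}


q_star = q_iteration()
policy = greedy_policy(q_star)
print(policy)
```

Under these assumed rewards the greedy policy moves left only from position 1 and right from every other nonterminal position, because the discounted reward of 5 at the right end outweighs the nearby reward of 1 everywhere except next to the left end.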

- Unzip the archive into a directory of your choice.
- Before using the toolbox, you will need to obtain two additional functions provided by MathWorks:
  - ode4, the 4th order Runge-Kutta method for numerical integration. Freely available for download from http://www.mathworks.com/support/tech-notes/1500/1510.html#fixed. Download the file ode4.m and drop it into the lib subdirectory of the toolbox.
  - dsxy2figxy, a function for positioning figure annotations. Available in the example files of the basic Matlab distribution; search for the string "dsxy2figxy" in the Matlab documentation. Save the function under the filename dsxy2figxy.m in the lib subdirectory.

- Start up Matlab, point it to the directory where you unzipped the file, and run startupapproxrl.
- Navigate to the demo subdirectory and open the demos. Five demonstration scripts are provided:
  - qi_demo, illustrating the use of Q-iteration algorithms;
  - pi_demo, for offline and online policy iteration;
  - ps_opt_demo, for policy search and fuzzy Q-iteration with CE-optimized MFs;
  - cleaningrobot_demo, an interactive demo illustrating the use of the classical RL and DP algorithms and their results for the cleaning robot problem;
  - invertedpendulum_demo, an interactive demo illustrating how several approximation-based algorithms work on the inverted pendulum problem.

  Start the demo scripts from the Matlab prompt to run all the algorithms in a row, or open them in the editor and run them in cell-mode, algorithm-by-algorithm. The comments in the demos should provide enough information for you to get started with using the toolbox.
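
For readers wondering what the ode4 dependency from the installation steps contributes, the sketch below shows the classical fixed-step 4th order Runge-Kutta update in Python. It is not the MathWorks ode4.m code and does not mirror its interface. The names `rk4_step`, `integrate`, and `oscillator`, and the harmonic-oscillator test system, are hypothetical and chosen only for illustration.

```python
import math


def rk4_step(f, t, x, h):
    """Advance the state x of dx/dt = f(t, x) by one fixed step of size h."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    # Weighted average of the four slope estimates.
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(f, x0, t0, t1, n_steps):
    """Integrate from t0 to t1 using n_steps equal RK4 steps."""
    h = (t1 - t0) / n_steps
    t, x = t0, list(x0)
    for _ in range(n_steps):
        x = rk4_step(f, t, x, h)
        t += h
    return x


def oscillator(t, x):
    """Undamped harmonic oscillator: position' = velocity, velocity' = -position."""
    return [x[1], -x[0]]


# Starting from position 1 and velocity 0, the exact solution at time t
# is [cos(t), -sin(t)], which gives a reference to compare against.
x_end = integrate(oscillator, [1.0, 0.0], 0.0, 1.0, 100)
print(x_end, [math.cos(1.0), -math.sin(1.0)])
```

A fixed step size makes the cost of simulating one control interval predictable, which is presumably why a fixed-step integrator is convenient for simulating the continuous-time benchmark systems.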

The basic toolbox requires Matlab 7.3 (R2006b) or later, with the Statistics toolbox included. Some algorithms require additional specialized software, as follows:

- lspihonline requires the Optimization toolbox of Matlab.
- optps requires the Genetic Algorithms and Direct Search toolbox of Matlab.
- optrbfps requires the TomLab base package (http://tomopt.com/).

*Lucian Busoniu, June 2010*

**Acknowledgments:** Pierre Geurts was extremely kind to supply the code for building (ensembles of) regression trees, and to allow the redistribution of his code with the toolbox. This code was developed in close interaction with Robert Babuska, Bart De Schutter, and Damien Ernst. Several functions are taken from or inspired by code written by Robert Babuska.

**Final notes:** This software is provided as-is, without any warranties. So, if you decide to control your nuclear power plant with it, better do your own verifications beforehand :) I have only tested the toolbox on Windows XP, but it should also work on other operating systems, with some possible minor issues due to, e.g., the use of backslashes in paths. The main algorithm and problem files are thoroughly commented, and should not be difficult to understand given some experience with Matlab. However, this toolbox is still a work in progress, which has some implications. In particular, you will find TODO items, WARNINGs that some code paths have not been thoroughly tested, and some options and hooks for things that have not yet been implemented. Lower-level functions generally still have descriptive comments, although these may be sparser in some cases.