The repository contains useful downloadable material related to my research and teaching, including Matlab software, presentations, and demonstration movies.
Software
- MARL toolbox ver. 1.3, a Matlab multi-agent reinforcement learning toolbox (4 August 2010, 336.9 KBytes).
The Multi-Agent Reinforcement Learning toolbox is a package of Matlab functions and scripts that I used in my research on multi-agent learning. We prefer Matlab for its ease of use with numeric computations and its rapid prototyping facilities. Since no Matlab toolbox for dynamic multi-agent tasks was available when I started my PhD project, I started writing one of my own. This is the result. The toolbox is developed with modularity in mind, separating for instance the agent behaviour from the world engine and the latter from the rendering GUI. Currently the toolbox supports only episodic environments, but hooks are in place for continuing tasks as well. The learning, action selection and exploration methods can be independently plugged into the agents' behaviour.
Several types of gridworld-based environments are implemented, and agents can learn using a set of algorithms including single-agent Q-learning, team Q-learning, minimax-Q, WoLF-PHC, and an adaptive state expansion algorithm that we developed. Everything is written for the generic n-agent case, except minimax-Q, which is most meaningful in the two-agent case.
The latest version, 1.3, adds the Distributed Q-learning algorithm and the new 'robotic rescue' gridworld environment used in the example of our survey chapter Multi-Agent Reinforcement Learning: An Overview (where the problem was described more generically as 'object transportation'). Also included is a demonstration script illustrating the experiments reported in the chapter.
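Distributed Q-learning (Lauer and Riedmiller's algorithm for deterministic, fully cooperative tasks) rests on an optimistic idea. Each agent keeps Q-values over its own actions only and never decreases them. It also changes its policy only when its best Q-value strictly improves. The minimal Python sketch below illustrates that idea. It is not code from the toolbox, which is written in Matlab. It assumes non-negative rewards, and the class name `DistributedQAgent` and the two-agent stage game are hypothetical.

```python
from collections import defaultdict


class DistributedQAgent:
    """One agent of a deterministic, fully cooperative task (hypothetical class).

    Q-values are indexed by the state and this agent's own action only. Rewards are
    assumed non-negative, so initializing all Q-values to zero is safe.
    """

    def __init__(self, actions, gamma=0.9):
        self.actions = list(actions)
        self.gamma = gamma
        self.q = defaultdict(float)      # (state, own action) -> optimistic Q-value
        self.policy = {}                 # state -> chosen action

    def max_q(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def act(self, state):
        # Use the first action until the policy has been set for this state.
        return self.policy.get(state, self.actions[0])

    def update(self, state, action, reward, next_state):
        old_max = self.max_q(state)
        target = reward + self.gamma * self.max_q(next_state)
        # Optimistic update: Q-values never decrease, so low rewards caused by the
        # other agents' exploratory actions are ignored.
        self.q[(state, action)] = max(self.q[(state, action)], target)
        # Change the policy only if the best local Q-value strictly improved; this
        # helps the agents agree on one optimal joint action.
        if self.max_q(state) > old_max:
            self.policy[state] = action


# Hypothetical two-agent stage game: "end" is terminal (its Q-values stay 0).
rewards = {(0, 0): 5.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 10.0}
a1, a2 = DistributedQAgent([0, 1]), DistributedQAgent([0, 1])
for (u1, u2), r in rewards.items():          # try every joint action once
    a1.update("s", u1, r, "end")
    a2.update("s", u2, r, "end")
print(a1.act("s"), a2.act("s"))              # 1 1
```

Both agents end up choosing action 1, which gives the optimal joint action (1, 1). The zero rewards of the mismatched joint actions never pull their Q-values down.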
- MARL toolbox documentation, the documentation files for the MARL toolbox (4 August 2010, 223.1 KBytes).
This archive accompanies the Multi-Agent Reinforcement Learning toolbox, and documents its features and usage. An up-to-date HTML reference of the functions and scripts in the toolbox is included, but the documentation itself has unfortunately not been updated since version 1.0 of the toolbox.
- Approximate RL and DP toolbox, developed in Matlab (6 June 2010, 967.6 KBytes).
This toolbox contains Matlab implementations of a number of approximate reinforcement learning (RL) and dynamic programming (DP) algorithms, notably including the algorithms used in our book Reinforcement Learning and Dynamic Programming Using Function Approximators.
The toolbox features:
- Algorithms for approximate value iteration: grid Q-iteration, fuzzy Q-iteration, and fitted Q-iteration with nonparametric and neural network approximators. In addition, an implementation of fuzzy Q-iteration with cross-entropy (CE) optimization of the membership functions is provided.
- Algorithms for approximate policy iteration: least-squares policy iteration (LSPI), policy iteration with LSPE-Q policy evaluation, online LSPI and online policy iteration with LSPE-Q, as well as online LSPI with explicitly parameterized and monotonic policies. These algorithms all support generic approximators, of which a variety are already implemented.
- Algorithms for approximate policy search: policy search with adaptive basis functions, using the CE or DIRECT methods for global optimization. An additional generic policy search algorithm, with a configurable optimization technique and generic policy approximators, is provided.
- Implementations of several well-known reinforcement learning benchmarks (the car-on-the-hill, bicycle balancing, inverted pendulum swingup), as well as more specialized control-oriented tasks (DC motor, robotic arm control) and a highly challenging HIV infection control task.
- A set of thoroughly commented demonstrations illustrating how all these algorithms can be used.
- A standardized task interface enables users to implement their own tasks. The algorithm functions also follow standardized input-output conventions and use a highly flexible, standardized configuration mechanism.
- Optimized Q-iteration and policy iteration implementations, taking advantage of Matlab built-in vectorized and matrix operations (many of them exploiting LAPACK and BLAS libraries) to run extremely fast.
- Extensive result inspection facilities (plotting of policies and value functions, execution and solution performance statistics, etc.).
- Implementations of several classical RL and DP algorithms for discrete problems: Q-learning and SARSA with or without eligibility traces, Q-iteration, and policy iteration with Q-functions.
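The classical Q-iteration algorithm from the last item can be sketched in plain Python as follows. This is only an illustration: the toolbox's Matlab implementations are vectorized and more general. The sketch assumes a small deterministic MDP given by a transition function `f(x, u)` and a reward function `rho(x, u)`. The chain task and all names are hypothetical.

```python
def q_iteration(n_states, n_actions, f, rho, gamma=0.95, tol=1e-10, max_iter=10_000):
    """Tabular Q-iteration for a deterministic MDP: x' = f(x, u), r = rho(x, u)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman optimality update applied synchronously to every (x, u) pair.
        new_q = [[rho(x, u) + gamma * max(q[f(x, u)]) for u in range(n_actions)]
                 for x in range(n_states)]
        delta = max(abs(a - b) for row_new, row in zip(new_q, q) for a, b in zip(row_new, row))
        q = new_q
        if delta <= tol:
            break
    # Greedy policy with respect to the (near-)optimal Q-function; ties go to the lowest action.
    policy = [max(range(n_actions), key=lambda u: q[x][u]) for x in range(n_states)]
    return q, policy


# Hypothetical 5-state chain: action 0 moves left, action 1 moves right; state 4 is
# absorbing, and entering it from another state yields reward 1.
GOAL = 4


def chain_f(x, u):
    if x == GOAL:
        return GOAL
    return min(x + 1, GOAL) if u == 1 else max(x - 1, 0)


def chain_rho(x, u):
    return 1.0 if x != GOAL and chain_f(x, u) == GOAL else 0.0


q, policy = q_iteration(5, 2, chain_f, chain_rho)
print(policy)  # [1, 1, 1, 1, 0]
```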
For more details, see the readme file of the toolbox. Note that you will need two additional functions to make the toolbox work! The readme file describes how these functions can be obtained.
Since 6 June 2010, the archive also includes the regression trees package of Pierre Geurts, redistributed with his kind permission.
- makepdf, a Windows XP batch script to automate the creation of PDF files from DVI (21 November 2008, 2.4 KBytes).
Presentations
- Reinforcement learning with function approximation, my talk at the Optimal Adaptive Control workshop of the IEEE Conference on Decision and Control (11 December 2011, 5.5 MBytes).
Artificial-intelligence techniques for reinforcement learning are introduced, starting from the discrete-time, discrete-valued roots of the field. After motivating and formalizing the problem, the talk describes several essential classes of basic algorithms: value iteration, policy iteration, and policy search. Using function approximation, these algorithms are then extended to work in continuous-variable systems. Algorithm development is complemented by a study of theoretical questions regarding convergence and solution quality, and by illustrative examples and case studies.
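Policy iteration, one of the algorithm classes the talk covers, alternates policy evaluation with greedy policy improvement. The Python sketch below applies it with Q-functions to a hypothetical five-state chain. It assumes deterministic dynamics `f(x, u)` and rewards `rho(x, u)` and is not taken from the talk itself.

```python
def policy_iteration(n_states, n_actions, f, rho, gamma=0.95, tol=1e-10):
    """Policy iteration with Q-functions for a deterministic MDP: x' = f(x, u), r = rho(x, u)."""
    policy = [0] * n_states                                  # arbitrary initial policy
    while True:
        # Policy evaluation: iterate Q(x, u) = rho(x, u) + gamma * Q(x', h(x')) to convergence.
        q = [[0.0] * n_actions for _ in range(n_states)]
        while True:
            new_q = [[rho(x, u) + gamma * q[f(x, u)][policy[f(x, u)]] for u in range(n_actions)]
                     for x in range(n_states)]
            delta = max(abs(a - b) for row_new, row in zip(new_q, q) for a, b in zip(row_new, row))
            q = new_q
            if delta <= tol:
                break
        # Policy improvement: act greedily with respect to the evaluated Q-function.
        new_policy = [max(range(n_actions), key=lambda u: q[x][u]) for x in range(n_states)]
        if new_policy == policy:                             # policy is stable: done
            return q, policy
        policy = new_policy


# Hypothetical 5-state chain: action 0 moves left, action 1 moves right; state 4 is
# absorbing, and entering it from another state yields reward 1.
GOAL = 4


def chain_f(x, u):
    if x == GOAL:
        return GOAL
    return min(x + 1, GOAL) if u == 1 else max(x - 1, 0)


def chain_rho(x, u):
    return 1.0 if x != GOAL and chain_f(x, u) == GOAL else 0.0


q, policy = policy_iteration(5, 2, chain_f, chain_rho)
print(policy)  # [1, 1, 1, 1, 0]
```

Starting from the "always left" policy, each improvement step extends the set of states that move right by one, until the policy no longer changes.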
- Optimistic planning for near-optimal control in MDPs, an in-depth description of our optimistic planning algorithm and its analysis (1 December 2011, 1.1 MBytes).
Markov decision processes (MDPs) describe general problems in which actions must be applied to a system so as to maximize a long-term cumulative reward. Such problems arise in many fields, including automatic control, artificial intelligence, medicine, and economics. Recently, the community has intensified its interest in online planning methods for solving MDPs, due to their relative independence of the state dimensionality. At every interaction step, these methods select an action based on a local exploration of possible control policies from the current state (so they are also a type of model-predictive control).
In this presentation, we consider a planning algorithm that optimistically explores the space of closed-loop policies, always refining the most promising solution found so far. This is similar to how classical planning works, so the algorithm can be seen, for example, as an extension of the classical AO* algorithm to infinite-horizon MDPs. We analyze the quality of the action choices made by optimistic planning for problems with a finite number of actions and of possible next states for each transition. Performance does not directly depend on these numbers; instead, the algorithm implicitly adapts to the (unknown) complexity of the problem. In particular, specializing the result to some interesting classes of MDPs shows that the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities. The presentation closes with some promising experimental results, including the online control of a simulated HIV infection.
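The optimistic exploration described above can be sketched roughly in Python as follows. This is a simplified illustration, not the algorithm exactly as analyzed in the presentation. It rests on several assumptions:

- rewards lie in [0, 1];
- a model `model(x, u)` returns a finite list of (probability, next state, reward) triples;
- leaves get the bounds 0 and 1/(1 - gamma);
- all bounds are recomputed from scratch at each iteration, for clarity rather than speed.

At each iteration, the sketch follows the actions with the largest upper bounds. It then expands the leaf of this optimistic subtree with the largest probability-weighted discount `prob * gamma ** depth`. At the end, it recommends the root action with the largest lower bound. All names and the toy MDP are hypothetical.

```python
import math


class Node:
    """State node of the planning tree (hypothetical helper class)."""

    def __init__(self, state, depth, prob):
        self.state = state
        self.depth = depth    # number of transitions from the root
        self.prob = prob      # product of transition probabilities along the path from the root
        self.children = {}    # action -> list of (probability, reward, child Node)


def bound(node, gamma, leaf_value):
    """Return (bound on the node's value, maximizing action), recursing through the tree.

    Leaves get leaf_value: 0.0 gives lower bounds, 1 / (1 - gamma) gives upper bounds
    (valid when rewards lie in [0, 1]).
    """
    if not node.children:
        return leaf_value, None
    best_value, best_action = -math.inf, None
    for action, outcomes in node.children.items():
        q = sum(p * (r + gamma * bound(child, gamma, leaf_value)[0]) for p, r, child in outcomes)
        if q > best_value:
            best_value, best_action = q, action
    return best_value, best_action


def select_leaf(node, gamma):
    """Follow the optimistic (upper-bound) actions and return the leaf of this subtree
    with the largest contribution prob * gamma**depth to the gap between the bounds."""
    if not node.children:
        return node
    _, action = bound(node, gamma, 1.0 / (1.0 - gamma))
    leaves = [select_leaf(child, gamma) for _, _, child in node.children[action]]
    return max(leaves, key=lambda leaf: leaf.prob * gamma ** leaf.depth)


def optimistic_planning(x0, actions, model, gamma=0.9, budget=50):
    """model(x, u) returns a list of (probability, next_state, reward), rewards in [0, 1]."""
    root = Node(x0, 0, 1.0)
    for _ in range(budget):                   # budget = number of node expansions
        leaf = select_leaf(root, gamma)
        for u in actions:
            leaf.children[u] = [(p, r, Node(y, leaf.depth + 1, leaf.prob * p))
                                for p, y, r in model(leaf.state, u)]
    # Recommend the root action with the largest lower bound.
    return bound(root, gamma, 0.0)[1]


def toy_model(x, u):
    """Hypothetical single-state MDP: action 0 pays 0, action 1 pays 1 or 0.9 with equal odds."""
    if u == 0:
        return [(1.0, x, 0.0)]
    return [(0.5, x, 1.0), (0.5, x, 0.9)]


action = optimistic_planning("s", [0, 1], toy_model, gamma=0.9, budget=50)
print(action)  # 1
```

In this toy problem, the upper bound of action 1 never drops below 9.5, while that of action 0 never exceeds 9. The sketch therefore spends its entire budget refining the stochastic branch of action 1.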
- Reinforcement learning lectures, introducing classical and approximate RL (3 March 2010, 2.1 MBytes).
This is a two-part lecture on reinforcement learning (RL) for discrete and continuous-variable tasks.
In the first part, the Markov decision process formalism is introduced and the optimal RL solution is characterized. This is followed by a discussion of classical, discrete online RL algorithms. Eligibility traces and experience replay are introduced.
The second part briefly returns to the classical dynamic programming (DP) algorithms for value and policy iteration, and then extends them to the approximate, continuous-variable case. Throughout the two lectures, simulation and real-time control examples accompany the theoretical developments and algorithm descriptions.
This presentation employs the demonstration movies below, and refers to the RL demos under "Software" above.
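The Python sketch below illustrates experience replay. It runs Q-learning on a hypothetical five-state chain. After each real transition, it also replays a few transitions sampled from a buffer of past experience. The task, the uniformly random exploration (enough here because Q-learning is off-policy), and all parameter values are assumptions. Eligibility traces are not shown.

```python
import random


def q_learning_with_replay(goal=4, gamma=0.95, alpha=0.5, episodes=200,
                           max_steps=100, replay_per_step=10, seed=0):
    """Q-learning with experience replay on a hypothetical chain with states 0..goal.

    Action 0 moves left, action 1 moves right; reaching `goal` yields reward 1 and
    ends the episode. Exploration is uniformly random.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(goal + 1)]
    buffer = []                                     # past transitions (x, u, r, x_next, done)

    def update(x, u, r, x_next, done):
        target = r if done else r + gamma * max(q[x_next])
        q[x][u] += alpha * (target - q[x][u])

    for _ in range(episodes):
        x = 0
        for _ in range(max_steps):
            u = rng.randrange(2)
            x_next = min(x + 1, goal) if u == 1 else max(x - 1, 0)
            done = x_next == goal
            r = 1.0 if done else 0.0
            buffer.append((x, u, r, x_next, done))
            update(x, u, r, x_next, done)                  # regular online Q-learning update
            for sample in rng.choices(buffer, k=replay_per_step):
                update(*sample)                            # replay past experience
            x = x_next
            if done:
                break
    return q


q = q_learning_with_replay()
greedy = [max(range(2), key=lambda u: q[x][u]) for x in range(4)]
print(greedy)  # [1, 1, 1, 1]
```

Because this chain is deterministic, a constant learning rate lets the replayed updates settle on the optimal Q-values. In stochastic tasks, the learning rate would typically have to decrease over time.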
- Reinforcement learning in continuous state and action spaces, my defense presentation, with a very gentle introduction to the topic (13 January 2009, 391.3 KBytes).
This presentation introduces the basics of RL and dynamic programming (DP), and the need for approximation in continuous spaces. Very little prior knowledge is required (basic math should be enough), and nearly every concept is illustrated graphically. This presentation may therefore be useful to readers unfamiliar with the RL and DP field.
- Model-based reinforcement learning with fuzzy approximation, an overview of our fuzzy Q-iteration algorithm, with convergence and consistency results (9 April 2008, 930.5 KBytes).
Reinforcement learning is a widely used paradigm for learning control. Computing exact reinforcement learning solutions is generally only possible when process states and control actions take values in a small discrete set. In practice, approximate algorithms are necessary. This presentation first introduces the RL problem, and then describes an approximate, model-based reinforcement learning algorithm. This algorithm relies on a fuzzy partition of the state space and on a discretization of the action space. It converges to a solution that lies within a bound of the optimal solution. Under continuity assumptions on the dynamics and the reward function, the algorithm is also consistent, which means that the optimal solution is asymptotically obtained as the approximation accuracy increases.
The algorithm is applied to an example control problem, where good performance is obtained. The influence of discontinuous reward functions, which do not satisfy the conditions for consistency, is also studied. It appears that a continuous reward function is important for a predictable improvement in performance as the approximation accuracy increases. Finally, the algorithm is used to swing up an underactuated inverted pendulum.
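Below is a minimal Python sketch of fuzzy Q-iteration for a one-dimensional state. It assumes equidistant triangular membership functions that form a partition of unity (degrees sum to 1), a small set of discrete actions, and a known deterministic model. The parameter `theta[i][j]` approximates the Q-value at core state `centers[i]` and action `actions[j]`. The toy dynamics (move by 0.1 per step, saturated to [-1, 1]) and the reward -x^2 are hypothetical and are not the examples used in the presentation.

```python
def triangular_memberships(x, centers):
    """Normalized triangular membership degrees of x for equidistant centers (they sum to 1)."""
    x = min(max(x, centers[0]), centers[-1])                    # saturate outside the grid
    width = centers[1] - centers[0]
    i = min(int((x - centers[0]) / width), len(centers) - 2)    # index of the left neighbour
    w = (x - centers[i]) / width                                 # relative position in [0, 1]
    mu = [0.0] * len(centers)
    mu[i], mu[i + 1] = 1.0 - w, w
    return mu


def fuzzy_q_iteration(centers, actions, f, rho, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Iterate theta[i][j] <- rho(c_i, u_j) + gamma * max_j' sum_i' mu_i'(f(c_i, u_j)) theta[i'][j']."""
    n, m = len(centers), len(actions)
    theta = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        new_theta = []
        for c in centers:
            row = []
            for u in actions:
                mu = triangular_memberships(f(c, u), centers)
                # Approximate value of the next state: best interpolated Q-value.
                v_next = max(sum(mu[i] * theta[i][j] for i in range(n)) for j in range(m))
                row.append(rho(c, u) + gamma * v_next)
            new_theta.append(row)
        delta = max(abs(a - b) for r1, r2 in zip(new_theta, theta) for a, b in zip(r1, r2))
        theta = new_theta
        if delta <= tol:
            break
    return theta


def greedy_action(x, theta, centers, actions):
    """Greedy action for any state x, using the interpolated Q-values."""
    mu = triangular_memberships(x, centers)
    q = [sum(mu[i] * theta[i][j] for i in range(len(centers))) for j in range(len(actions))]
    return actions[q.index(max(q))]


centers = [-1.0 + 0.25 * i for i in range(9)]             # core states -1, -0.75, ..., 1
actions = [-1, 0, 1]
f = lambda x, u: min(max(x + 0.1 * u, -1.0), 1.0)        # hypothetical dynamics
rho = lambda x, u: -x * x                                 # hypothetical reward
theta = fuzzy_q_iteration(centers, actions, f, rho)
print([greedy_action(x, theta, centers, actions) for x in (-0.75, 0.0, 0.75)])  # [1, 0, -1]
```

The membership degrees are non-negative and sum to one, so each iteration is a contraction with factor gamma. This is the property behind the convergence result mentioned above. The resulting policy drives the state toward 0 from either side.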
- Reinforcement learning for multi-agent systems, an overview talk to which I contributed, presented by Prof. Robert Babuska at the CABS colloquium (22 June 2006).
Demonstration Movies
- Learning to swing up an inverted pendulum, using online least-squares policy iteration (8 January 2009, 51.8 MBytes).
The inverted pendulum is obtained by placing a weight off-center on a disk driven by a DC motor. The system is underactuated: the motor cannot push the weight up in one go, so the pendulum must be swung back and forth. Half of the learning trials start with the weight pointing down, and the other half start in a random initial state obtained by applying a sequence of random actions. This explains the large random actions that are still applied after the controller has learned to properly swing up the pendulum.
- Final swingup solution, after the online LSPI learning experiment was completed (8 January 2009, 864.9 KBytes).
- Robot goalkeeper learning to catch the ball, using approximate online RL and experience replay; demo by Sander Adam (1 October 2008, 13.3 MBytes).