This page lists the research and thesis projects I am currently involved with, as well as summaries of finalized projects.
Please contact me if you need additional information about any of these projects.
Research grants
-
Start: March 2012
End: winter 2012
Participants:
Lucian Busoniu,
principal investigator.
Description:
Existing approaches to consensus problems are largely limited to simple, linear agents, and are each designed for a specific consensus objective. This project proposes to overcome these limitations by using recent optimistic planning and learning methods from the artificial intelligence field to control the agents. These methods are able to address very general optimal control problems. By exploiting this power in consensus, the research initiated in this project will allow tackling nonlinear agents, as well as accommodating – by changes in the cost function – a number of different consensus objectives. This research project is being funded under the initiative Actions Incitatives 2012 of the Research Center for Automatic Control of Nancy.
Currently, Lex Daniels from TU Delft is on a two-month visit to CRAN in the context of this project.
Ongoing thesis projects
-
Start: August 2011
End: August 2012 (expected)
Participants:
Lex Daniels,
MSc student;
Lucian Busoniu,
advisor;
Prof. Robert Babuska,
advisor.
Description: This project will consider the class of optimistic planning algorithms for online control, which at every step explore promising sequences of actions so that a first near-optimal action is found after exhausting a given computational budget. In their basic variants, these algorithms discard the planning data right after using it to choose the current action, and then have to start over from scratch at the next step, thereby wasting computation. In this project, sound and efficient ways of reusing knowledge in optimistic planning will be designed. Thus, rather than just plan, the resulting methods will also learn. The data can be reused either in an exhaustive, low-level form, or in the condensed form of a function approximator, which synthesizes the knowledge obtained so far.
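The basic planning loop described above can be sketched as follows. This is a minimal illustration of one well-known variant, optimistic planning for deterministic systems, and not necessarily the algorithm studied in the project. It assumes a deterministic model, a finite action set, and rewards in [0, 1]; the names `optimistic_plan` and `chain_step` and the toy chain problem are hypothetical. Note that the search tree is thrown away after the action is returned, which is exactly the waste the project aims to remove.

```python
import heapq
import itertools


def optimistic_plan(state, step, actions, gamma=0.9, budget=50):
    """Return the first action of the best action sequence found.

    step(state, action) -> (next_state, reward), with reward assumed in [0, 1].
    budget is the number of node expansions allowed.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    # Heap entries: (-upper_bound, tie, state, depth, discounted_return, first_action)
    frontier = [(-1.0 / (1.0 - gamma), next(tie), state, 0, 0.0, None)]
    best_return, best_action = -1.0, None
    for _ in range(budget):
        # Expand the leaf with the largest optimistic upper bound.
        _, _, s, depth, ret, first = heapq.heappop(frontier)
        for a in actions:
            s2, r = step(s, a)
            ret2 = ret + (gamma ** depth) * r
            first2 = a if first is None else first
            if ret2 > best_return:
                best_return, best_action = ret2, first2
            # Upper bound: pretend every future reward equals 1.
            bound = ret2 + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(frontier, (-bound, next(tie), s2, depth + 1, ret2, first2))
    return best_action


def chain_step(s, a):
    """Hypothetical toy model: states 0..5 on a line, reward 1 when landing on 5."""
    s2 = min(max(s + a, 0), 5)
    return s2, (1.0 if s2 == 5 else 0.0)


action = optimistic_plan(3, chain_step, actions=(-1, 1), gamma=0.9, budget=50)
print(action)
```

In a receding-horizon controller, `optimistic_plan` would be called again from the next state, rebuilding the tree from scratch each time.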
-
Start: December 2009
End: December 2013 (expected)
Participants:
MSc Ivo Grondman,
PhD candidate;
Lucian Busoniu,
co-advisor;
Dr. Gabriel Lopes,
co-advisor;
Prof. Robert Babuska,
promotor.
Description: In order to fully realize the potential of reinforcement learning (RL), high-dimensional problems must be addressed. Unfortunately, many current algorithms are limited to low-dimensional problems and learn slowly. Therefore, this project will develop a class of RL methods that learn effectively in high-dimensional problems (tens of dimensions or more). These methods will be validated in simulation and real-time robot control examples.
Finalized thesis projects
-
Start: February 2010
End: March 2011
Participants:
Jurriaan Knobel,
MSc student;
Dr. Gabriel Lopes,
co-advisor;
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: The goal of this project is to optimize the gait of a quadruped robot, dealing with different types of terrain, using optimization-based policy search methods, or more traditional value-function based reinforcement learning techniques.
-
Start: January 2010
End: August 2010
Participants:
Sjoerd Boersma,
BEng student;
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor;
Prof. Albert Moes,
advisor.
Description:
Robotic soccer is a popular benchmark problem for intelligent control methods, and for reinforcement learning in particular. This project develops a Matlab-based framework for the vision-based, reinforcement learning control of a robotic soccer goalkeeper (see figure). The finalized robotic goalkeeper will serve as an interactive demo for reinforcement learning algorithms. Additionally, the Matlab package is general enough to also be reusable for other robotic applications.
-
Start: June 2009
End: May 2010
Participants:
Sholeh Norouzzadeh,
MSc student;
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: All humans develop in an autonomous open-ended manner through life-long learning. As humans, we simplify a task that is difficult to learn by first learning simplified versions of it, before moving back to the original, more difficult version. This is in contrast to most reinforcement learning methods, where the controller (agent) is often designed to learn a challenging task from scratch. In this project, this idea of gradually increasing the complexity of the task, called shaping, is exploited to accelerate online reinforcement learning. We review and compare in simulations the various approaches to shaping found in the literature. We also develop a new way to determine a suitable amount of training in the easy task(s), in order to reduce the total learning time.
-
Start: September 2010
End: January 2011
Participants:
Jessica Vleugel, Michelle Hoogwout, Koen Hermans, and Imre Gelens,
BSc students;
MSc Ivo Grondman,
co-advisor;
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.

Description:
A core problem in robotics is avoiding unsafe regions of the state space, where the robot may damage itself, or worse, injure its users. Classically, this issue is solved in RL by giving a large penalty (negative reward) upon reaching the unsafe region, and then terminating the learning trial and resetting to a new, safe initial state. However, with this method the unsafe region can still be reached many times, until the algorithm has learned to avoid it. This is unacceptable in real-life, safety-critical applications, in which the unsafe region should never be reached.
This project therefore develops two methods to avoid unsafe regions in RL. The first method uses a process model to predict the next state, and discards any actions that would reach unsafe states. The second method uses a safety controller that, whenever the state gets close to the unsafe region, overrules the learning controller and takes the system back into the safe region. In a classical gridworld problem, the two algorithms are empirically shown to always avoid unsafe regions without great performance losses.
A minipaper summarizing the project and its results can be downloaded: Reinforcement learning with avoidance of unsafe regions.
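The first method described above, discarding actions that the model predicts would enter the unsafe region, might look roughly like this in a gridworld. This is only a sketch under strong assumptions: the process model is taken to be exact and deterministic, and one step of lookahead is assumed to be enough. The grid size, the action names, and the functions `model`, `safe_actions`, and `choose_action` are hypothetical and are not taken from the project's code.

```python
import random

# Hypothetical action set: name -> (dx, dy) displacement on the grid.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def model(state, action, width=5, height=5):
    """Assumed process model: move one cell, clipped to the grid borders."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), width - 1)
    y = min(max(state[1] + dy, 0), height - 1)
    return (x, y)


def safe_actions(state, unsafe):
    """Keep only the actions whose predicted next state is not unsafe."""
    return [a for a in ACTIONS if model(state, a) not in unsafe]


def choose_action(q, state, unsafe, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to the safe actions."""
    allowed = safe_actions(state, unsafe)
    if not allowed:
        raise RuntimeError("no safe action available in state %r" % (state,))
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore, but only among safe actions
    return max(allowed, key=lambda a: q.get((state, a), 0.0))


unsafe = {(2, 1)}
q = {((1, 1), "right"): 10.0, ((1, 1), "up"): 1.0}
# "right" has the highest Q-value but leads into the unsafe cell, so it is skipped.
print(choose_action(q, (1, 1), unsafe, epsilon=0.0))
```

Because the filter is applied to exploratory actions as well as greedy ones, the unsafe cell is never entered as long as the model's prediction is correct.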
-
Start: May 2008
End: June 2009
Participants:
Sjoerd Huiberts,
MSc student;
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: This project focuses on online non-parametric approximation in reinforcement learning. A non-parametric approximator does not take a predetermined form but is constructed from the data. The thesis proposes online SARSA and Q-learning with non-parametric approximation. The performance of the algorithms is studied by learning to control an inverted pendulum swing-up. Three non-parametric RL methods are successfully applied, based on: kernel recursive least squares, least-squares support vector regression, and a novel algorithm called partial support vector regression. We show that RL with non-parametric regression successfully learns the control task. Moreover, fewer kernels are used than basis functions in parametric approximation.
-
Start: January 2005
End: January 2009
Participants:
Lucian Busoniu,
PhD candidate;
Prof. Bart De Schutter,
promotor;
Prof. Robert Babuska,
promotor.
This was my PhD project. For a description of the research done within its scope, please see the abstract of the thesis entry in the Publications section.
-
Start: October 2007
End: October 2008
Participants:
Sander Adam,
MSc student (graduated cum laude);
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: Although Reinforcement Learning (RL) is guaranteed to give an optimal controller for many control problems, its practical use is limited due to its slow learning performance. This paper introduces a new class of algorithms that dramatically speed up learning, at moderate computational cost. In contrast to traditional RL algorithms, which use each data sample only once, the newly introduced algorithms repeatedly present all collected data samples to the learning controller in a process named experience replay (ER). The use of experience replay in RL has only been researched in some very specific applications, and has never been used as the main learning mechanism. Analysis shows that the ER algorithms learn fast, are computationally efficient and scale up well to multidimensional state-spaces. The ER algorithms are tested on a pendulum swing-up task, both in simulation and in reality. In simulation, the ER algorithms outperform a least-squares policy iteration controller in terms of learning speed and computational complexity. The ER algorithms also perform well on the real pendulum swing-up setup, where they successfully learn to swing up within 100 s. The application to a two-link robotic manipulator simulation shows the ability of the ER algorithms to scale up well to larger state-action spaces. Finally, high performance is obtained on a real robotic goalkeeper setup, illustrating the applicability of ER algorithms to practical control problems.
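The core idea of experience replay, presenting all stored samples to the learner repeatedly, can be illustrated with a small tabular sketch. This is not the algorithm from the thesis, whose details (including any function approximation) are not given here. It assumes tabular Q-learning, a fixed number of replay passes, and a hypothetical three-step chain task; `q_update` and `replay` are illustrative names.

```python
from collections import defaultdict


def q_update(q, sample, actions, alpha=0.5, gamma=0.95):
    """One standard Q-learning update from a stored transition sample."""
    s, a, r, s2, done = sample
    target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def replay(q, memory, actions, passes=10, **kwargs):
    """Present every stored sample to the learner `passes` times."""
    for _ in range(passes):
        for sample in memory:
            q_update(q, sample, actions, **kwargs)


actions = ("left", "right")
# Hypothetical trajectory 0 -> 1 -> 2 -> 3 with a reward of 1 on reaching state 3.
memory = [
    (0, "right", 0.0, 1, False),
    (1, "right", 0.0, 2, False),
    (2, "right", 1.0, 3, True),
]

q_once = defaultdict(float)
replay(q_once, memory, actions, passes=1)   # each sample used once
q_many = defaultdict(float)
replay(q_many, memory, actions, passes=10)  # same data, replayed

print(q_once[(0, "right")], q_many[(0, "right")])
```

With a single pass, the reward only reaches the last state-action pair of the trajectory; with repeated passes over the same three samples, it propagates back to the start state without any new interaction with the system.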
Watch on YouTube:
-
Start: August 2007
End: June 2008
Participants:
Maarten Vaandrager,
MSc student (graduated cum laude);
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: An RL controller learns an optimal policy by online (real-time) exploration of the control task. Usually, RL algorithms take a long time to converge. This is an important obstacle preventing the application of RL to real-life problems. This project investigates ways to speed up the convergence of RL algorithms by using prior knowledge about the controlled process or about the solution. Several new architectures of actor-critic learning are proposed, making use of local linear regression as an approximator. Then, prior knowledge about the process and solution is added to these algorithms, in the non-parametric form of measurement samples. The resulting algorithms give better performance in simulation examples than the original algorithms, which did not use prior knowledge.
Watch movies showing:
-
Start: April 2007
End: August 2007
Participants:
Thijs Ramakers,
BSc student (graduated);
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: An RL controller learns an optimal policy by online (real-time) exploration of the control task. The main drawbacks currently preventing widespread use of RL are the large number of iterations needed for convergence and the fact that most available algorithms only provably converge for discrete-valued problems. Within this project, an optimal controller for a positioning servosystem driven by a DC motor was developed. In order to apply reinforcement learning, we discretized the continuous state and action variables of the system. In contrast to the ad-hoc action discretizations typically employed in the RL literature, we first tested several discretizations on a non-learning, pre-designed controller, and used the discretization that performed best in our learning experiments. The learning controller was implemented both in simulation and on the real servo-system.
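As an illustration of the discretization step, the sketch below maps continuous variables to bin indices on a uniform grid. The uniform spacing, the variable ranges, the bin counts, and the names `make_grid`, `discretize`, and `state_index` are all assumptions made for this example; the discretizations actually tested in the project are not specified here.

```python
import bisect


def make_grid(low, high, n_bins):
    """Return the n_bins - 1 interior edges of a uniform grid on [low, high]."""
    width = (high - low) / n_bins
    return [low + i * width for i in range(1, n_bins)]


def discretize(x, edges):
    """Map a continuous value to a bin index in 0..len(edges).

    Values outside the grid range fall into the first or last bin.
    """
    return bisect.bisect_right(edges, x)


# Hypothetical servo state: position in [-1, 1] and velocity in [-4, 4].
position_edges = make_grid(-1.0, 1.0, 4)
velocity_edges = make_grid(-4.0, 4.0, 2)


def state_index(position, velocity):
    """Combine the per-variable bin indices into one discrete state."""
    return (discretize(position, position_edges),
            discretize(velocity, velocity_edges))


print(state_index(0.2, -1.5))
```

The same helper can be applied to the action variable, for example by choosing a small set of voltages and comparing the resulting closed-loop behaviour of a fixed controller for different grid resolutions before any learning is done.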
-
Start: September 2005
End: August 2006
Participants:
Yuan Xu,
MSc student (graduated);
Lucian Busoniu,
co-advisor;
Prof. Robert Babuska,
advisor.
Description: Although RL has been applied in different fields of engineering, operations research and so on, it is well understood that for many important problems the computational costs of RL are very high, a result of the so-called "curse of dimensionality". Moreover, the slow convergence rate of RL algorithms in a large state space limits their wider application. In this project, a structure combining the self-organizing map (SOM) with RL was proposed, in an attempt to solve these problems. RL works in a state space reduced by the SOM. A single-elevator system was implemented as the benchmark to test the proposed structure. Promising results on this example were obtained for the value-iteration and Q-learning algorithms.