**Lecturer: Lucian Busoniu**


This course provides methods for controlling systems that are too complex or insufficiently known to apply classical control design techniques. The focus is on learning algorithms for control, in particular reinforcement learning (RL). Special attention is also paid to model-based techniques related to RL, as they can be very useful in controlling complex systems even when a model is known. After introducing the RL problem, the dynamic programming algorithms that form the foundation of RL are described. Then, classical, discrete-variable RL algorithms are introduced. In the second part of the course, the dynamic programming and RL algorithms are extended with approximation techniques, in order to make them applicable to continuous-variable control, as well as to large-scale discrete-variable problems. Finally, several online planning techniques are discussed.

This course is part of the Master's program ICAF of the Automation Department, UTCluj (1st year, 2nd semester). As prerequisites, basic knowledge of analysis and linear algebra is needed, together with notions of discrete-time dynamical systems and probability. The lecturer is Lucian Busoniu.

The course and lab sessions take place on Thursdays from 18:00, in room C01, Dorobantilor. A detailed schedule is given next; any changes that may occur will be announced at least 2 weeks in advance. Current changes:

- Week 4 becomes free (due to scheduling conflicts). Lecture 3 moves to week 5, and lecture 4 moves to the Wednesday of week 6, the day before lab 2 (same time, 6PM, and same room, C01); this way, all the lab slots remain unchanged.

| Week | Date | Session |
|---|---|---|
| 1 | 25 Feb | Lecture 1 |
| 2 | 3 Mar | Lecture 2 |
| 3 | 10 Mar | Lab 1 |
| 4 | 17 Mar | free |
| 5 | 24 Mar | Lecture 3 (rescheduled) |
| 6 | 30 Mar (Wednesday) | Lecture 4 (rescheduled) |
| 6 | 31 Mar | Lab 2 |
| 7 | 7 Apr | Lecture 5 |
| 8 | 14 Apr | Lecture 6 |
| 9 | 21 Apr | Lab 3 |
| 10 | 28 Apr | Lecture 7 |
| 11 | 5 May | Lecture 8 |
| 12 | 12 May | Lab 4 |
| 13 | 19 May | Discussion session |
| 14 | 26 May | --- |

Grading:

- 50% lab grades. There are 4 labs, and each student receives one grade per lab, from 0 to 10, made up of:
  - up to 2 points for an optional mini-test at the start of the lab, on the lecture material relevant to that day's assignment;
  - up to 8 points for the mandatory solution of the lab assignment, reduced to 4 points if the solution is delivered late.
- 50% exam.
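As an informal illustration of the arithmetic above, the following Python sketch combines the components into a final grade. It is not an official grading tool, and it rests on several assumptions not stated on this page: the late penalty is read as capping the solution part at 4 points, the lab component is taken as the plain average of the four lab grades, and the exam is assumed to be graded from 0 to 10. The function names `lab_grade` and `final_grade` are hypothetical.

```python
def lab_grade(minitest_points, solution_points, late=False):
    """Grade for one lab on the 0-10 scale (illustrative sketch only)."""
    if not 0 <= minitest_points <= 2:
        raise ValueError("mini-test is worth between 0 and 2 points")
    if not 0 <= solution_points <= 8:
        raise ValueError("solution is worth between 0 and 8 points")
    # ASSUMPTION: "reduced to 4 points" is read as a cap of 4 on the
    # solution part when it is delivered late.
    max_solution = 4.0 if late else 8.0
    return float(minitest_points + min(solution_points, max_solution))


def final_grade(lab_grades, exam_grade):
    """Final grade as 50% labs + 50% exam (illustrative sketch only)."""
    if len(lab_grades) != 4:
        raise ValueError("exactly 4 lab grades are expected")
    # ASSUMPTION: the lab component is the plain average of the 4 lab
    # grades, and the exam is graded on the same 0-10 scale.
    lab_average = sum(lab_grades) / len(lab_grades)
    return 0.5 * lab_average + 0.5 * exam_grade


# Example: the third lab solution is delivered late, so it counts for at most 4.
labs = [
    lab_grade(2, 8),             # 10.0
    lab_grade(1, 6),             # 7.0
    lab_grade(0, 7, late=True),  # 4.0
    lab_grade(2, 5),             # 7.0
]
print(final_grade(labs, exam_grade=8))  # 7.5
```

Under these assumptions, a single late delivery in the example lowers the lab average from 7.75 to 7.0, and the final grade from 7.875 to 7.5.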

The slides are made available here in time for each lecture. The slides are required material for the exam. They, as well as the lectures, are in Romanian (with an exception for Part 6, see below):

- Part 1: The reinforcement learning problem (covered in lecture 1).
- Part 2: The optimal solution. Dynamic programming (covered in lectures 2 and 3).
- Part 3: Reinforcement learning (covered in lectures 3 and 4).
- Part 4: Function approximation. Approximate dynamic programming (covered in lectures 5 and 6).
- Part 5: Approximate reinforcement learning (covered in lectures 6 and 7).
- Part 6: Online planning. Perspectives (covered in lectures 7 and 8). Since much of the material is very recent, the planning slides are in English.

Students may optionally consult the following bibliography:

- L. Busoniu, *Reinforcement learning and dynamic programming for control*, 2012. These are the lecture notes, written in English for an earlier edition of the course. Some updates have been made since then.
- R. Sutton, A. Barto, *Reinforcement Learning: An Introduction*, MIT Press, 1998.
- L. Busoniu, R. Babuska, B. De Schutter, D. Ernst, *Reinforcement Learning and Dynamic Programming Using Function Approximators*, CRC Press, Automation and Control Engineering Series, 2010.
- D. Bertsekas, *Dynamic Programming and Optimal Control*, volume 2, 4th edition, Athena Scientific, 2012.

In the lab classes, a set of assignments must be solved. A solution consists of a brief report in PDF and the associated Matlab code, and must be submitted by a specified deadline. For each lab, the full code or a specified part of it should be completed during the lab session itself. In addition, an oral session with mandatory participation will be organized, in which the lecturer will discuss the solutions separately with each student group. In this session, detailed questions will be asked to assess whether the assignment solution is original and what each student contributed to it.

Submitting the solutions to all the assignments, as well as validating these solutions by discussing them in the oral session, is required before being admitted to the exam. Any copied solution is graded 0, and having two or more copied solutions automatically invalidates the entire solution set. More details on the requirements (including individual deadlines) are available in the assignment descriptions, which will appear here shortly before the corresponding lab session.

In addition, 2 points of each lab grade are awarded based on a short (5-minute) test at the beginning of the class, which covers lecture material relevant to that lab.

- Assignment 1: Markov decision processes. Dynamic programming (PDF) and the Matlab code used as basis for the assignment. You can also use these Matlab exercises to get re-familiarized with Matlab if you haven't been using it in a while.
- Assignment 2: Q-learning (PDF). The Matlab code for Assignment 1 is reused, and in addition two m-files are supplied: qlearning.m, a template for implementing the Q-learning algorithm in Matlab, and gridnav_nearoptsol.m, a script that computes a (near-)optimal solution for the grid navigation problem.
- Assignment 3: Approximate dynamic programming (PDF), and the Matlab code. Deadline **Thursday 5 May**.
- Assignment 4: Online planning (PDF), and the Matlab code used as basis for the assignment. Deadline **Tuesday 17 May**.

As announced early on, the lab discussion session is scheduled on May 19th from 6PM in room C01 (at the usual course time and location).
To ensure they can be graded, the final deadline for all assignments is also **Tuesday May 17th**; any student who has not handed in all 4 assignments by this date is not eligible for the exam and will not be admitted to the discussion session. The allocation of students to timeslots will be published soon.

Comments, suggestions, questions etc. related to this course or website are welcome; please contact the lecturer.