Control prin invatare 2026 / Learning control 2026

Lecturer: Lucian Busoniu. TA: Stefan Pirje.

About this course

This course provides methods for controlling systems that are too complex or insufficiently known for classical control design techniques to apply. The focus is on learning algorithms for control, in particular reinforcement learning (RL). Attention is also paid to model-based techniques related to RL, as they can be very useful in controlling complex systems even when a model is known. After introducing the RL problem, the dynamic programming algorithms that lie at the foundation of RL are described in the discrete-variable context. Then, classical RL algorithms are introduced in the same context. In the second part of the course, the dynamic programming and RL algorithms are extended with approximation techniques, in order to make them applicable to continuous-variable control, as well as to large-scale discrete-variable problems. Significant space is dedicated to deep reinforcement learning techniques.
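To give a flavor of the discrete-variable dynamic programming mentioned above, here is a minimal sketch of Q-iteration. It is an illustration only, not taken from the course material: the five-state chain, the reward of 1 for reaching the right end, the discount factor 0.5, and all names (`step`, `q_iteration`, `greedy_policy`) are assumptions made up for this example.

```python
# Illustrative sketch only: a hypothetical deterministic chain with 5 states.
# States 0 and 4 are terminal; reaching state 4 yields a reward of 1.
GAMMA = 0.5                 # discount factor (assumed value)
STATES = range(5)
ACTIONS = (-1, 1)           # move left or move right
TERMINAL = {0, 4}


def step(x, u):
    """Return (next_state, reward) for state x and action u."""
    if x in TERMINAL:
        return x, 0.0       # terminal states are absorbing, with zero reward
    x_next = x + u
    reward = 1.0 if x_next == 4 else 0.0
    return x_next, reward


def q_iteration(tol=1e-9, max_iter=1000):
    """Apply Q(x,u) <- r + GAMMA * max_u' Q(x',u') until the change is below tol."""
    q = {(x, u): 0.0 for x in STATES for u in ACTIONS}
    for _ in range(max_iter):
        q_new = {}
        for x in STATES:
            for u in ACTIONS:
                x_next, r = step(x, u)
                q_new[(x, u)] = r + GAMMA * max(q[(x_next, a)] for a in ACTIONS)
        diff = max(abs(q_new[k] - q[k]) for k in q)
        q = q_new
        if diff < tol:
            break
    return q


def greedy_policy(q):
    """Pick, in each non-terminal state, an action maximizing Q."""
    return {x: max(ACTIONS, key=lambda u: q[(x, u)])
            for x in STATES if x not in TERMINAL}


if __name__ == "__main__":
    q_star = q_iteration()
    print(greedy_policy(q_star))
```

In this toy problem the greedy policy moves right in every non-terminal state, since only the right end is rewarded. The approximation techniques covered in the second part of the course replace the table `q` with a parametrized function when the state space is continuous or too large to enumerate.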

This course is part of the ICAF Master's program of the Automation Department, UTCluj (1st year, 2nd semester). As prerequisites, basic knowledge of analysis and linear algebra is needed, together with notions of discrete-time dynamical systems. The responsible lecturer is Lucian Busoniu.

A detailed schedule is given next (note that it may still change, so check back from time to time). Lectures take place in room D01 and labs in room C01, at Dorobantilor 71-73.

Image with schedule table

Grading rules:

50% exam.
50% labs.
10% lecture quizzes.

Lectures

The slides are made available here in time for each lecture. The slides are required material for the exam. Both the slides and the lectures are in Romanian, unless noted otherwise.

Part 1: The reinforcement learning problem.
Part 2: The optimal solution. Dynamic programming. You may also download the code for the demos in parts 2 and 3.
Part 3: Reinforcement learning.
Part 4: Function approximation. Approximate dynamic programming. Offline, batch RL.
Part 5: Online approximate RL.
Part 6: Neural networks (these slides are in English).
Part 7: Deep reinforcement learning (updated 18 May 2026).

At the end of each lecture, a quiz will be given on the material discussed in that lecture. At the end of the semester, each student earns a number of grade points equal to the number of questions answered correctly divided by the total number of questions asked during the semester.
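The quiz rule above is a simple ratio. As a minimal sketch (the function name `quiz_points` and the input checks are assumptions made for illustration, not part of the official rules):

```python
def quiz_points(correct, total):
    """Grade points from quizzes: correct answers divided by questions asked."""
    if total <= 0:
        raise ValueError("total number of questions must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct answers must be between 0 and total")
    return correct / total


if __name__ == "__main__":
    # Hypothetical numbers: 18 correct answers out of 24 questions asked.
    print(quiz_points(18, 24))
```

With these hypothetical numbers, a student who answers 18 of 24 questions correctly earns 0.75 points; answering every question correctly earns the maximum of 1 point.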

Labs

In the lab classes, a set of assignments must be solved. A solution consists of a brief report in PDF and the associated code, and must be submitted by a specified deadline. For each lab, the full code or a specified part of it should be completed during the lab session itself. Each lab is graded out of 10; the maximum grade is reduced to 5 if the solution is handed in late. Completing the labs is required in order to take the exam.

There is zero tolerance for copying; any copied solution means immediately failing the course for this year.

A discussion session with mandatory participation will be organized before the exam, where the lab TA will discuss the solutions separately with each student group. In this session, detailed questions will be asked to clearly assess whether the assignment solution is original and what each student contributed to it.

Assignment 1: Markov decision processes. Q-iteration and policy iteration (Colab notebook).
Assignment 2: Q-learning (Colab notebook).
Assignment 3: Deep neural networks (Colab notebook).

Contact

Comments, suggestions, questions etc. related to this course or website are welcome; please contact the lecturer.