The ODE Method for Algorithm Design in Reinforcement Learning

Abstract
On July 10th, 2025 at 2:30 p.m., Federico Corso, PhD student in Information Technology, will give a seminar on "The ODE Method for Algorithm Design in Reinforcement Learning" in the DEIB Seminar Room "Alessandra Alario" (Building 21).
In Reinforcement Learning and Optimal Control, an algorithm is a finite sequence of computer-implementable instructions designed to compute or approximate a policy, its performance, a value function, or related quantities. In algorithm design, it can be helpful to set aside the constraints of real computers and imagine machines that operate with infinite clock speed.
In such an idealized setting, an algorithm can be viewed as an ordinary differential equation (ODE). The richer stability theory of ODEs can then be used to analyze convergence and to design algorithms more easily than in discrete time. An implementable, discrete-time recursive rule can subsequently be recovered through suitable discretization techniques.
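As a minimal sketch of this correspondence (the symbols θ, f, αₙ, ξₙ below are generic illustration, not notation from the talk): a stochastic-approximation recursion of the form

$$\theta_{n+1} = \theta_n + \alpha_n \bigl( f(\theta_n) + \xi_{n+1} \bigr)$$

can be read as a noisy Euler discretization of the mean-field ODE

$$\dot{\theta}(t) = f\bigl(\theta(t)\bigr),$$

where ξₙ₊₁ is a zero-mean noise term and αₙ is a vanishing step size. If the ODE has a globally asymptotically stable equilibrium θ* and the step sizes satisfy the usual conditions ($\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$), then under standard assumptions on the noise the iterates track the ODE trajectories and converge to θ*.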
In this seminar, the ODE method will be surveyed in the context of Stochastic Approximation and then applied to tame the slow and potentially unstable dynamics of Watkins' Q-learning algorithm, yielding faster convergence and improved numerical stability.
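For concreteness, here is a minimal tabular sketch of Watkins' Q-learning in Python; the environment interface (reset()/step() returning state, reward, done) and all hyperparameters are illustrative assumptions, not details from the seminar:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning (Watkins, 1989), written as a
    stochastic-approximation recursion on the Q-table.
    The env interface and hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()          # assumed: returns an integer state
        done = False
        while not done:
            # epsilon-greedy exploration
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            # assumed: step returns (next state, reward, terminal flag)
            s_next, r, done = env.step(a)
            # Watkins' update: move Q(s, a) toward the Bellman target
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Viewed through the ODE lens, this update is a stochastic-approximation recursion whose mean-field ODE has the optimal action-value function Q* as its equilibrium; it is the dynamics of recursions of this kind that the seminar's techniques aim to reshape.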