DEIB PhD student
DEIB - PT1 Room (building 20 - ground floor)
December 13th, 2017
12.00 pm
Contacts:
Nicola Gatti
Research Line:
Artificial intelligence and robotics
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such estimation errors are common, whether they harm performance, and how they can be prevented. In this talk, we answer these questions, focusing on the DQN algorithm, which combines Q-learning with a deep neural network, and on two of its variants that reduce the bias and the variance of the Q-function estimate. First, we show how the idea behind the Double Q-learning algorithm can also be used to reduce the Q-function estimation bias in deep RL settings. Then, we present the Averaged-DQN algorithm, an extension of DQN that leads to a more stable training procedure and improved performance by reducing the variance of the approximation error in the target values.
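The two target constructions mentioned above can be sketched in a few lines. The snippet below is an illustrative sketch only (function names and the NumPy-array interface are assumptions, not the speakers' code): it contrasts the standard Q-learning/DQN target, the Double Q-learning target that decouples action selection from evaluation, and the Averaged-DQN target that averages several previous Q estimates before the max.

```python
import numpy as np

def q_learning_target(q_next, reward, gamma):
    # Standard Q-learning / DQN target: the same value estimates are
    # used both to select and to evaluate the greedy action, which is
    # the source of the overestimation bias.
    return reward + gamma * np.max(q_next)

def double_q_learning_target(q_next_online, q_next_target, reward, gamma):
    # Double Q-learning target: the online network selects the greedy
    # action, the target network evaluates it; decoupling selection
    # from evaluation reduces the bias.
    greedy_action = np.argmax(q_next_online)
    return reward + gamma * q_next_target[greedy_action]

def averaged_dqn_target(q_next_history, reward, gamma):
    # Averaged-DQN target: average the outputs of the K most recent
    # target networks before taking the max, reducing the variance of
    # the target-value approximation error.
    q_avg = np.mean(np.asarray(q_next_history), axis=0)
    return reward + gamma * np.max(q_avg)
```

For example, with next-state online values [1.0, 2.0, 0.5] and target values [0.5, 1.0, 2.0], the standard target maximizes over a single estimate, while the Double Q-learning target uses the online argmax (action 1) but the target network's value for it.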
This talk will discuss the work and results presented by Prof. Hado van Hasselt (DeepMind) at the ACAI Summer School on Reinforcement Learning (9 October 2017, Nieuwpoort).