Structured Multi-Armed Bandits
Andrea Tirinzoni
Deib PhD student
DEIB - Alario Seminar Room (building 21)
September 13th, 2019
11.30 am
Contacts:
Marcello Restelli
Research Line:
Artificial intelligence and robotics
Abstract
The multi-armed bandit (MAB) problem is perhaps the simplest reinforcement learning setting in which the exploration-exploitation dilemma arises, with agents trading off between exploring actions (aka arms) with uncertain effects and exploiting those that appear to be the most rewarding. The classic MAB setting, in which the arm payoffs are uncorrelated, is now theoretically well understood: lower bounds on the performance of any strategy have been established, and algorithms with optimal finite-time guarantees are available. This has led to numerous applications, ranging from recommender systems to healthcare and finance. However, most practical problems exhibit known structural properties, i.e., the expected reward of one arm may depend on the rewards of other arms. In such problems, classic bandit strategies become needlessly conservative. Designing algorithms that exploit this structure to reduce the amount of exploration is a fundamental open problem for scaling bandit algorithms to complex domains. In this talk, I will review recent advancements in the structured MAB field. I will present algorithms for specific parametric structures (such as the linear one) and algorithms for general structures (i.e., when an arbitrary mapping between bandit instances and rewards is available). I will also present lower bounds that characterize the fundamental performance limits of any algorithm in the structured case and shed light on how to design optimal strategies. Finally, I will discuss the main open questions and provide directions for future research.
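To make the classic setting mentioned in the abstract concrete, below is a minimal illustrative sketch (not part of the talk) of the well-known UCB1 strategy on a Bernoulli bandit. The arm means, horizon, and seed are hypothetical values chosen for demonstration.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given (unknown to
    the learner) success probabilities. Returns pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # cumulative reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # Optimism in the face of uncertainty:
            # index = empirical mean + exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Hypothetical instance: three arms, the last one is best.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Over a long horizon the exploration bonus shrinks for frequently pulled arms, so the best arm ends up receiving the bulk of the pulls; structured bandits, the topic of the talk, aim to shrink this exploration cost further by exploiting known correlations between arms.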