Exploration–Exploitation Dilemma in Reinforcement Learning: Calibrating Optimism in the Face of Uncertainty
Date: Monday, 15th September 2025
Time: 4:00 – 5:00 PM
Venue: A007, R&D Block
Abstract:
In Reinforcement Learning (RL), we study sequential decision-making problems under uncertainty and partial information. The exploration–exploitation dilemma lies at the core of RL algorithms: whether to collect new information or to exploit existing information to maximise a desired objective. This talk focuses on the Markov Decision Process (MDP) formulation of RL problems and explores two algorithm design paradigms for addressing this dilemma: frequentist algorithms with optimistic indices and Bayesian algorithms with posterior sampling. Dr. Basu will present a historical overview of these approaches and their theoretical analyses, and will conclude with recent work on Langevin sampling-based RL algorithms (arXiv:2412.20824) that achieve both theoretical guarantees and practical efficiency, even in deep RL settings, thus bridging the theory-to-practice gap.
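To make the two paradigms concrete, here is a minimal sketch (not taken from the talk or the cited paper) of both on the simplest setting where the dilemma appears, a Bernoulli multi-armed bandit: a frequentist UCB-style optimistic index versus Bayesian Thompson (posterior) sampling with Beta priors. Function names and constants are illustrative assumptions.

```python
import math
import random

def ucb_index(mean, pulls, t, c=2.0):
    # Frequentist optimism: empirical mean plus a confidence bonus that
    # shrinks as an arm is pulled more often ("optimism in the face of
    # uncertainty"). c=2.0 is an illustrative exploration constant.
    return mean + math.sqrt(c * math.log(t) / pulls)

def run_bandit(true_means, horizon, policy, seed=0):
    """Run one bandit episode; policy is 'ucb' or 'ts' (Thompson sampling)."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k        # number of times each arm was chosen
    successes = [0.0] * k  # total reward observed per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if policy == "ucb":
            if t <= k:
                arm = t - 1  # initialise: pull each arm once
            else:
                arm = max(range(k), key=lambda a: ucb_index(
                    successes[a] / pulls[a], pulls[a], t))
        else:
            # Bayesian posterior sampling: draw from each arm's
            # Beta(1 + successes, 1 + failures) posterior, play the argmax.
            arm = max(range(k), key=lambda a: rng.betavariate(
                1 + successes[a], 1 + pulls[a] - successes[a]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        successes[arm] += reward
        total += reward
    return total, pulls
```

Both strategies resolve the dilemma the same way at a high level: arms that are under-sampled look attractive (large confidence bonus, or wide posterior), so they get explored, while well-estimated good arms get exploited.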
Bio:
Dr. Debabrota Basu is a faculty member at Inria, France, where he leads research on robust, private, and fair machine learning. He previously worked as a Postdoctoral Research Fellow at Chalmers University of Technology, Sweden, and as a Research Fellow at the National University of Singapore, where he also earned his PhD in Computer Science. His work lies at the intersection of machine learning, privacy, and algorithmic fairness.