Bio: Debabrota Basu is a faculty researcher at Inria (University of Lille, France), specializing in responsible AI, machine learning, and optimization. His work focuses on developing robust, fair, and privacy-preserving algorithms with applications in agro-ecology, medicine, and autonomous systems. He holds a PhD in Computer Science from the National University of Singapore and teaches postgraduate courses on privacy and machine learning at leading French institutions. Beyond academia, he enjoys writing, theatre, and engaging with social and scientific issues.
Abstract: In Reinforcement Learning (RL), we study sequential decision-making problems under uncertainty and partial information. The exploration-exploitation dilemma is at the core of designing RL algorithms. Specifically, it addresses the question of whether, at any point of interaction, to collect new information or to exploit the existing information to maximise a desired objective. In this talk, we specifically follow the Markov Decision Process (MDP) formulation of RL problems. Then, we delve into two algorithm design techniques to address the exploration-exploitation dilemma in MDPs: frequentist algorithms with optimistic indices, and Bayesian algorithms with posterior sampling. In both cases, we sketch a historical overview of the development of these algorithmic paradigms and of their theoretical analysis. We conclude with recent work on Langevin sampling-based RL algorithms (https://arxiv.org/abs/2412.20824) that simultaneously achieve theoretical performance guarantees and practical efficiency even in deep RL settings, and thus take a step towards bridging the theory-to-practice gap.