University of Waterloo

Towards Scaling Multi-Agent Reinforcement Learning

Dr. Sriram Ganapathi Subramanian

November 14th, 2024, 11am-12pm, EC4-2101A

Sequential decision making in the real world involves reasoning about and responding to multiple interacting agents in a dynamic environment. Multi-agent reinforcement learning (MARL) is an emerging field of machine learning that aims to learn policies for such environments and has seen much success over the past decade. However, MARL has yet to find wide application in large-scale real-world problems, for two important reasons. First, MARL algorithms have poor sample efficiency: many data samples must be collected to learn meaningful policies, even in small environments. Second, MARL algorithms do not scale to environments with many agents, since their complexity is typically exponential in the number of agents.

In this talk, I will describe critical aspects of our research that address both challenges. Towards improving sample efficiency, we observe that many real-world environments already deploy sub-optimal or heuristic approaches for generating policies. We propose a principled framework for accelerating MARL training using such pre-existing solutions and show its effectiveness both theoretically and empirically. Towards scaling MARL, we explore the use of mean field theory, which abstracts all other agents in the environment into a single virtual agent, so that each agent learns against a summary of the population rather than against every other agent individually. Finally, we combine our work on mean field learning and on learning from pre-existing knowledge to obtain MARL algorithms that are better suited to large real-world environments than prior approaches. I will also describe real-world applications of our work in domains spanning autonomous driving, robotics, wildland firefighting, and ride-pool matching, in addition to classic video games.
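
For readers unfamiliar with these ideas, the sketch below illustrates them in a minimal tabular setting: each agent's Q-table conditions on its own action and on a single summary (here, the rounded mean) of its neighbours' actions, so its size is independent of the number of agents, and exploration can optionally defer to a pre-existing heuristic policy. This is a hedged illustration in the spirit of mean field Q-learning and advisor-guided learning from the MARL literature, not the speaker's exact algorithms; the helper names, the greedy backup, and the trust schedule are assumptions for the example.

import random
from collections import defaultdict

import numpy as np

N_ACTIONS = 5             # per-agent discrete action space (assumed, ordinal for simplicity)
ALPHA, GAMMA = 0.1, 0.95  # learning rate and discount factor

# Q[state] is an (own action x binned mean neighbour action) table, so it
# grows with the number of actions rather than the number of agents.
Q = defaultdict(lambda: np.zeros((N_ACTIONS, N_ACTIONS)))

def mean_field(neighbor_actions):
    """Abstract all neighbours into one 'virtual agent': their rounded mean action."""
    return int(round(float(np.mean(neighbor_actions))))

def select_action(state, neighbor_actions, advisor_action=None, trust=0.0):
    """Act greedily w.r.t. Q, optionally deferring to a pre-existing heuristic
    ('advisor') with probability `trust`; the caller decays `trust` over
    training so the learner can eventually outgrow a sub-optimal advisor."""
    if advisor_action is not None and random.random() < trust:
        return advisor_action
    return int(np.argmax(Q[state][:, mean_field(neighbor_actions)]))

def td_update(state, action, neighbor_actions, reward, next_state, next_neighbor_actions):
    """One mean-field Q-learning step: bootstrap against the next mean
    neighbour action (a greedy backup here; the literature often uses a
    Boltzmann-weighted value instead)."""
    ab, ab_next = mean_field(neighbor_actions), mean_field(next_neighbor_actions)
    target = reward + GAMMA * Q[next_state][:, ab_next].max()
    Q[state][action, ab] += ALPHA * (target - Q[state][action, ab])

In a cooperative gridworld, for example, each agent would call select_action with its neighbours' most recent actions and a decaying trust in the heuristic, then call td_update after each transition.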