
IIT Bombay SCPP-IoE Workshop on

Building Trust in AI: Formal Reasoning for Reinforcement Learning, MDPs, and Probabilistic Models


Workshop Details


Date: December 9th, 4pm IST

Venue: CC 105, Computing Complex, IIT Bombay

About

As AI systems increasingly make critical decisions in uncertain and dynamic environments, ensuring their trustworthiness is essential. This short workshop focuses on leveraging formal reasoning techniques to enhance the reliability, safety, and interpretability of Reinforcement Learning (RL), Markov Decision Processes (MDPs), and probabilistic models. The talks will explore foundational methods for verifying system properties, addressing challenges in handling uncertainty, and integrating these approaches into real-world AI systems.

The workshop aims to start a conversation between formal-methods and AI researchers, with a view to fostering advances in trustworthy decision-making.

Speakers

Piyush Srivastava (TIFR, Mumbai)
Shibashis Guha (TIFR, Mumbai)
Suguman Bansal (Georgia Tech)
Shivaram Kalyanakrishnan (IIT Bombay)

Registration

Participation is free, but registration at this link is mandatory for those attending in person (for managing logistics and refreshments).

The workshop itself will be hybrid, though in-person participation is highly encouraged. The Zoom link will be mailed to participants.

Schedule

4:00 - 4:45pm: "An invitation to causal inference" - Piyush Srivastava

Abstract: This talk will be an invitation to causal inference with graphical models, and specifically to robustness and stability issues that appear not to have received the attention they deserve. Most of the talk will be a survey leading to the formulation of these questions. Parts of the talk will be based on work/discussion with Leonard Schulman, Spencer Gordon, Vinayak Kumar, and Vidya Sagar Sharma.
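
For readers new to the area, here is a minimal Python sketch (not taken from the talk) of the kind of query causal inference with graphical models answers: it contrasts the interventional distribution P(Y | do(X)) obtained by back-door adjustment with ordinary conditioning, on a toy three-variable model whose numbers are invented purely for illustration.

    # Back-door adjustment P(y | do(x)) = sum_z P(y | x, z) * P(z)
    # on the graph Z -> X, Z -> Y, X -> Y. All numbers are made up.

    p_z = {0: 0.6, 1: 0.4}                                    # P(Z)
    p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(X | Z=z)
    p_y_given_xz = {                                          # P(Y | X=x, Z=z)
        (0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
        (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9},
    }

    def p_y_do_x(y, x):
        """Interventional query: average P(y | x, z) over the marginal of Z."""
        return sum(p_y_given_xz[(x, z)][y] * p_z[z] for z in p_z)

    def p_y_given_x(y, x):
        """Observational query, for contrast: weight each z by P(z | x) instead."""
        p_x = sum(p_x_given_z[z][x] * p_z[z] for z in p_z)
        return sum(p_y_given_xz[(x, z)][y] * p_x_given_z[z][x] * p_z[z]
                   for z in p_z) / p_x

    print("P(Y=1 | do(X=1)) =", p_y_do_x(1, 1))    # 0.66
    print("P(Y=1 | X=1)     =", p_y_given_x(1, 1)) # 0.756

The gap between the two printed numbers is exactly the confounding through Z that the adjustment removes.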

Bio: Piyush Srivastava is interested in the interplay between probability, computer science, and physics. He works at the School of Technology and Computer Science at the Tata Institute of Fundamental Research. He has been an associate of the Indian Academy of Sciences (2017-22) and an associate member of the International Centre for Theoretical Sciences (ICTS-TIFR), Bengaluru, and is currently also an associate member of the Department of Theoretical Physics at TIFR.

4:45 - 5:30pm: "PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP" - Shibashis Guha

Abstract: Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (a.k.a. long-run average reward) is one of the most classic objectives considered in their context. We will discuss an algorithm to compute mean payoff probably approximately correctly in unknown MDP; further, we extend it to unknown CTMDP. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability, which has been advocated in the literature. Joint work with Chaitanya Agarwal, Jan Křetínský, and M. Pazhamalai.
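
As a rough illustration of the statistical, simulation-based flavour of this setting, here is a toy Python sketch (not the algorithm from the talk): it estimates the mean payoff of a fixed policy in an MDP that can only be sampled, by averaging rewards along a long run. The MDP, policy, and horizon below are invented; the actual PAC algorithm additionally exploits the lower bound on the minimum transition probability to certify the error.

    import random

    # transitions[s][a] = list of (next_state, probability); rewards[s][a] = reward.
    transitions = {
        0: {"stay": [(0, 0.9), (1, 0.1)], "go": [(1, 0.8), (0, 0.2)]},
        1: {"stay": [(1, 0.7), (0, 0.3)], "go": [(0, 0.6), (1, 0.4)]},
    }
    rewards = {0: {"stay": 0.0, "go": 1.0}, 1: {"stay": 2.0, "go": 0.5}}
    policy = {0: "go", 1: "stay"}          # the fixed policy being evaluated

    def sample_step(state, action):
        """Black-box simulator: draw the next state from the (unknown) distribution."""
        succs, probs = zip(*transitions[state][action])
        return random.choices(succs, weights=probs, k=1)[0]

    def estimate_mean_payoff(start, horizon=200_000):
        state, total = start, 0.0
        for _ in range(horizon):
            action = policy[state]
            total += rewards[state][action]
            state = sample_step(state, action)
        return total / horizon

    print("estimated mean payoff:", estimate_mean_payoff(0))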

Bio: Shibashis Guha is a faculty member at the School of Technology and Computer Science, Tata Institute of Fundamental Research. His research interests include formal methods, in particular, reactive synthesis. Prior to joining TIFR, he was a postdoctoral researcher at the Université Libre de Bruxelles, Belgium, and at the Hebrew University of Jerusalem, Israel. He did his doctoral studies at IIT Delhi.

5:30 - 6:00pm: Tea/Coffee break

6:00 - 6:45pm: "Specification-Guided Reinforcement Learning" - Suguman Bansal

Abstract: Reinforcement Learning (RL) is being touted to revolutionize the way we design systems. However, a key challenge to reaching that holy grail comes from the lack of guarantees that the synthesized systems offer. Logic and formal reasoning can address some of these issues ... or can they?

In this talk, I will cover recent progress in using logical specifications in RL and discuss the challenges it faces moving forward.
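
As background, here is a minimal Python sketch of one common recipe for specification-guided RL (not necessarily the approach covered in the talk): compile the specification into a small automaton, run tabular Q-learning on the product of environment state and automaton state, and reward the agent when the automaton accepts. The corridor environment, the specification, and all hyperparameters are invented for illustration.

    import random
    from collections import defaultdict

    N = 5                       # corridor cells 0..4
    ACTIONS = [-1, +1]          # move left / right

    def env_step(pos, action):
        return min(max(pos + action, 0), N - 1)

    def dfa_step(q, pos):
        """3-state DFA for 'eventually reach cell 4, then eventually reach cell 0'."""
        if q == 0 and pos == N - 1:
            return 1
        if q == 1 and pos == 0:
            return 2            # accepting state
        return q

    Q = defaultdict(float)      # Q[((pos, q), action)]
    alpha, gamma, eps = 0.1, 0.95, 0.2

    for episode in range(2000):
        pos, q = 0, 0
        for _ in range(100):
            state = (pos, q)
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(state, act)])
            npos = env_step(pos, a)
            nq = dfa_step(q, npos)
            reward = 1.0 if nq == 2 and q != 2 else 0.0   # reward only on acceptance
            best_next = max(Q[((npos, nq), act)] for act in ACTIONS)
            Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
            pos, q = npos, nq
            if nq == 2:
                break

    # Greedy policy over the product state (position, automaton state):
    print({(p, m): max(ACTIONS, key=lambda act: Q[((p, m), act)])
           for p in range(N) for m in (0, 1)})

The automaton state acts as memory of how much of the specification has been satisfied so far, which is what lets a memoryless policy on the product realize a temporal objective.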

Bio: Suguman Bansal is an assistant professor in the School of Computer Science at the Georgia Institute of Technology. Her research interests lie at the intersection of Artificial Intelligence and Programming Languages. Specifically, she works on developing tools and techniques to improve the quality of automated verification and synthesis of computational systems. Her recent work concerns providing formal guarantees about learning-enabled systems, with a focus on Reinforcement Learning.

She received her Ph.D. (2020) and M.S. (2016) in Computer Science from Rice University, and B.S. (with Honors) degree (2014) in Mathematics and Computer Science from Chennai Mathematical Institute. She is the recipient of the ATVA Best Paper Award 2023, Future Faculty Fellowship 2019, MIT EECS Rising Stars 2021, Andrew Ladd Fellowship 2016, and a Gold Medal at the ACM Student Research Competition at POPL 2016.

6:45 - 7:30pm: "Complexity of Policy Iteration" - Shivaram Kalyanakrishnan

Abstract: Markov Decision Problems (MDPs) are a well-studied abstraction of sequential decision making. Policy Iteration (PI) is a classical, widely-used family of algorithms to compute an optimal policy for a given MDP. PI is extremely efficient on MDPs typically encountered in practice. PI is also appealing from a theoretical standpoint, since it naturally yields "strong" running time bounds for MDP planning. Strong bounds depend only on the number of states and actions in the MDP, and not on additional parameters such as the discount factor and the size of the real-valued coefficients.

It has proven surprisingly difficult to establish tight theoretical upper bounds on the running time of PI. On MDPs with n states and 2 actions per state, the trivial upper bound on the number of iterations taken by PI is 2^n. It was not until 1999, nearly four decades after the PI algorithm was first published, that this trivial bound was improved, and that by a mere linear factor (to O(2^n / n)). In this talk, I will present a line of work that has yielded some improvements over existing bounds. I will also present some open problems in the area. The talk is expected to be widely accessible, since the analysis only uses basic ideas from discrete structures and algorithms.
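
For readers unfamiliar with PI, here is a minimal Python sketch of (Howard's) policy iteration on a small, randomly generated discounted MDP with n states and 2 actions per state; the MDP, the discount factor, and the tie-handling rule are illustrative choices, not taken from the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    n, num_actions, gamma = 6, 2, 0.9

    P = rng.dirichlet(np.ones(n), size=(n, num_actions))  # P[s, a] is a distribution over next states
    R = rng.random((n, num_actions))                       # R[s, a] is the one-step reward

    def evaluate(policy):
        """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
        P_pi = P[np.arange(n), policy]                     # n x n transition matrix under the policy
        R_pi = R[np.arange(n), policy]                     # reward vector under the policy
        return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

    policy = np.zeros(n, dtype=int)
    for iteration in range(1, 10_000):
        V = evaluate(policy)
        Q = R + gamma * (P @ V)                            # action values Q[s, a]
        # Switch a state's action only on strict improvement, so PI must terminate.
        new_policy = np.where(Q[np.arange(n), policy] + 1e-12 >= Q.max(axis=1),
                              policy, Q.argmax(axis=1))
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy

    print("optimal policy:", policy, "found after", iteration, "iterations")

Each iteration performs one exact policy evaluation (an n x n linear solve) followed by a greedy improvement step; the question studied in the talk is how many such iterations can be needed in the worst case.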

Bio: Shivaram Kalyanakrishnan is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Bombay. His research interests include artificial intelligence and machine learning, spanning topics such as sequential decision making, multi-agent learning, multi-armed bandits, and humanoid robotics. Kalyanakrishnan received a Ph.D. in computer science from the University of Texas at Austin. Subsequently he was a Research Scientist at Yahoo Labs Bangalore and an INSPIRE Faculty Fellow at the Indian Institute of Science, Bangalore. His contributions to robot soccer have received two Best Student Paper awards at the annual RoboCup competitions. Kalyanakrishnan was also a member of the first study panel of the One Hundred Year Study on Artificial Intelligence (AI100), which in 2016 released its report titled "Artificial Intelligence and Life in 2030".