Event
CAI Seminar Series

Visual Discovery for Science

Abstract:
From social media to satellite images, we are capturing visual data at an unprecedented scale. These images tell a story about our planet. With advances in automatic recognition, we can build a collective understanding of world-scale events as recorded through visual media.

To achieve these goals across domains, we need label-efficient vision and multimodal foundation models that are interpretable and trustworthy. In this talk, I will first present an annotation-efficient method for building a multimodal vision-language model in the scientific domain of remote sensing, where language annotations are sparse. I will then present my recent research on building interpretable models for scientific discovery on top of such black-box vision-language models. Finally, I will discuss open challenges in this area through datasets proposed in my work, aimed at enabling and evaluating scientific discovery.

Bio:
Dr. Utkarsh Mall is an Assistant Professor of Computer Vision at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi, UAE. His research focuses on building interpretable, reliable, and data-efficient methods for making novel scientific discoveries from visual data.

He has applied his research in fields such as agriculture, anthropology, archaeology, urban planning, public health, and climate science. Prior to joining MBZUAI, he was a postdoctoral researcher at Columbia University. He completed his PhD at Cornell University, where he worked on label-efficient foundation models for scientific domains and their use in unsupervised scientific discovery.

He also co-organizes the CVPR workshop Computer Vision for Science (CV4Science). His doctoral research was selected for the WACV 2022 and CVPR 2023 Doctoral Consortium.