Debugging and Runtime Analysis of Neural Networks with VLMs (A Case Study) (CAIN 2025 - Research and Experience Papers)

Who

Boyue Caroline Hu, Divya Gopinath, Ravi Mangal, Nina Narodytska, Corina S. Păsăreanu, Susmit Jha

Track

CAIN 2025 Research and Experience Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Mon 28 Apr 2025 14:40 - 14:55 at 208 - Quality Assurance for AI systems Chair(s): Eduardo Santana de Almeida

Abstract

Debugging Deep Neural Networks (DNNs), particularly vision models, is very challenging due to the complex and opaque decision-making processes in these networks. In this paper, we explore multi-modal Vision-Language Models (VLMs), such as CLIP, to automatically interpret the opaque representation space of vision models using natural language. This, in turn, enables a semantic analysis of model behavior using human-understandable concepts, without requiring costly human annotations. Key to our approach is the notion of a semantic heatmap, which succinctly captures the statistical properties of DNNs in terms of the concepts discovered with the VLM and is computed off-line using a held-out data set. We show the utility of semantic heatmaps for fault localization, an essential step in debugging, in vision models. Our proposed technique helps localize the fault in the network (encoder vs. head) and also highlights the responsible high-level concepts by leveraging novel differential heatmaps, which summarize the semantic differences between the correct and incorrect behavior of the analyzed DNN. We further propose a lightweight runtime analysis to detect and filter out defects at runtime, thus improving the reliability of the analyzed DNNs. The runtime analysis works by measuring and comparing the similarity between the heatmap computed for a new (unseen) input and the heatmaps computed a priori for correct vs. incorrect DNN behavior. We consider two types of defects: misclassifications and vulnerabilities to adversarial attacks. We demonstrate the debugging and runtime analysis on a case study involving a complex ResNet-based classifier trained on the RIVAL-10 dataset.
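
The runtime check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes heatmaps are flattened into plain lists of concept scores, uses cosine similarity as a stand-in similarity measure (the abstract does not name one), and the function names `cosine_similarity` and `looks_defective` as well as the toy reference values are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened heatmaps of equal length."""
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero heatmap is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def looks_defective(
    new_heatmap: Sequence[float],
    correct_ref: Sequence[float],
    incorrect_ref: Sequence[float],
) -> bool:
    """Flag an input whose heatmap is closer to the 'incorrect' reference.

    Assumption: both reference heatmaps were computed offline on held-out data.
    """
    sim_incorrect = cosine_similarity(new_heatmap, incorrect_ref)
    sim_correct = cosine_similarity(new_heatmap, correct_ref)
    return sim_incorrect > sim_correct


# Toy reference heatmaps over four hypothetical concepts (made-up values).
correct_ref = [0.9, 0.8, 0.1, 0.0]
incorrect_ref = [0.1, 0.2, 0.9, 0.7]

print(looks_defective([0.8, 0.9, 0.2, 0.1], correct_ref, incorrect_ref))
print(looks_defective([0.2, 0.1, 0.8, 0.9], correct_ref, incorrect_ref))
```

In this sketch, an input flagged by `looks_defective` would be filtered out before its prediction is used; a real system would likely need a calibrated threshold rather than a direct comparison.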

Boyue Caroline Hu

University of Toronto

Canada

Divya Gopinath

KBR; NASA Ames

United States

Ravi Mangal

Colorado State University

United States

Nina Narodytska

VMware Research

United States

Corina S. Păsăreanu

Carnegie Mellon University

United States

Susmit Jha

SRI

United States

Session Program

Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Quality Assurance for AI systems (Research and Experience Papers) at 208 Chair(s): Eduardo Santana de Almeida Federal University of Bahia

14:00 10m Talk		Towards a Domain-Specific Modeling Language for Streamlined Change Management in AI Systems Development Research and Experience Papers Razan Abualsaud IRIT, CNRS, Toulouse
14:10 15m Talk		An AI-driven Requirements Engineering Framework Tailored for Evaluating AI-Based Software Research and Experience Papers Hamed Barzamini, Fatemeh Nazaritiji Northern Illinois University, Annalise Brockmann Northern Illinois University, Hasan Ferdowsi Northern Illinois University, Mona Rahimi Northern Illinois University
14:25 15m Talk		MLScent: A tool for Anti-pattern detection in ML projects Research and Experience Papers Karthik Shivashankar University of Oslo, Antonio Martini University of Oslo
14:40 15m Talk		Debugging and Runtime Analysis of Neural Networks with VLMs (A Case Study) (Distinguished Paper Award Candidate) Research and Experience Papers Boyue Caroline Hu University of Toronto, Divya Gopinath KBR; NASA Ames, Ravi Mangal Colorado State University, Nina Narodytska VMware Research, Corina S. Păsăreanu Carnegie Mellon University, Susmit Jha SRI
14:55 15m Talk		Investigating Issues that Lead to Code Technical Debt in Machine Learning Systems Research and Experience Papers Rodrigo Ximenes Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Antonio Pedro Santos Alves Pontifical Catholic University of Rio de Janeiro, Tatiana Escovedo Pontifical Catholic University of Rio de Janeiro, Rodrigo Spinola Virginia Commonwealth University, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio) Pre-print
15:10 10m Talk		Addressing Quality Challenges in Deep Learning: The Role of MLOps and Domain Knowledge Research and Experience Papers Santiago del Rey Universitat Politècnica De Catalunya - Barcelona Tech, Adrià Medina Universitat Politècnica de Barcelona - BarcelonaTech (UPC), Xavier Franch Universitat Politècnica de Catalunya, Silverio Martínez-Fernández UPC-BarcelonaTech Pre-print
15:20 10m Other		Discussion Research and Experience Papers