Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models (ATVA 2025 - ATVA Papers)

Who

Aniruddha Joshi, Supratik Chakraborty, S. Akshay, Shetal Shah, Hazem Torfah, Sanjit Seshia

Track

ATVA 2025 ATVA Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT+05:30) Chennai, Kolkata, Mumbai, New Delhi.

When

Thu 30 Oct 2025 11:30 - 12:00 at R102 - Learning. Chair(s): Kittiphon Phalakarn

Abstract

Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal solutions with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
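The two notions the abstract relies on, Pareto dominance over (accuracy, explainability) and optimality within a candidate's immediate neighborhood, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper encodes the local check as a Boolean satisfiability problem, whereas the sketch below simply enumerates neighbors by brute force. The bit-vector encoding of an interpretation, the one-bit-flip neighborhood, the feature weights, and all function names are hypothetical choices made only for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# (accuracy, explainability); in this toy setup higher is better for both.
Score = Tuple[float, float]
Candidate = Tuple[int, ...]  # hypothetical encoding: 1 = feature is used


def dominates(a: Score, b: Score) -> bool:
    """a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep only the scores that no other score in the list dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


def one_flip_neighbors(c: Candidate) -> Iterable[Candidate]:
    """Assumed neighborhood: all candidates that differ from c in exactly one bit."""
    for i in range(len(c)):
        yield c[:i] + (1 - c[i],) + c[i + 1:]


def is_locally_pareto_optimal(
    c: Candidate,
    score: Callable[[Candidate], Score],
    neighbors: Callable[[Candidate], Iterable[Candidate]] = one_flip_neighbors,
) -> bool:
    """Brute-force stand-in for the SAT check: no neighbor may dominate c."""
    base = score(c)
    return not any(dominates(score(n), base) for n in neighbors(c))


# Toy objective (assumed): accuracy is a weighted sum of the used features,
# explainability is the number of features left unused.
WEIGHTS = (0.5, 0.3, 0.0)


def toy_score(c: Candidate) -> Score:
    accuracy = sum(w * bit for w, bit in zip(WEIGHTS, c))
    explainability = float(len(c) - sum(c))
    return (accuracy, explainability)


if __name__ == "__main__":
    for cand in [(1, 0, 0), (1, 0, 1)]:
        print(cand, toy_score(cand), is_locally_pareto_optimal(cand, toy_score))
```

In this toy setup the third feature contributes nothing to accuracy, so the candidate `(1, 0, 1)` is dominated by its neighbor `(1, 0, 0)`, which is equally accurate and uses fewer features. A candidate that passes the check is optimal only relative to its neighborhood, which is the weaker, more scalable guarantee the abstract describes.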

Aniruddha Joshi

UC Berkeley

United States

Supratik Chakraborty

IIT Bombay

India

S. Akshay

Shetal Shah

IIT Bombay

India

Hazem Torfah

Chalmers University of Technology

Sweden

Sanjit Seshia