Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training
Deep Reinforcement Learning (DRL), especially with probabilistic policies, has shown great potential for learning control policies. In safety-critical domains, deploying a probabilistic DRL policy requires strict safety assurances, making formal verification of such policies critical. However, formal verification of probabilistic DRL policies still faces significant challenges, arising from the complexity of reasoning about a neural network's probabilistic outputs over infinite state sets and from the state explosion problem during model construction. This paper proposes a novel approach based on abstract training for quantitatively verifying probabilistic DRL policies. Specifically, we abstract the infinite continuous state space into finitely many discrete decision units and train a deep neural network (DNN) policy on these decision units. Abstract training allows the probabilistic decision outputs for a set of states to be computed directly in a black-box manner, greatly simplifying the reasoning about neural network outputs. We further abstract the execution of the trained DNN policy as a Markov decision process (MDP) and perform probabilistic model checking, obtaining two types of upper bounds on the probability of being unsafe. When constructing the MDP, we reuse abstract states based on decision units, significantly alleviating the state explosion problem. Experiments show that the proposed quantitative verification yields tighter upper bounds on unsafe probabilities over longer time horizons, more easily and efficiently than the current state-of-the-art method.
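To make the abstract's pipeline concrete, the following is a minimal, hypothetical sketch on a toy 1-D system: the continuous state space is partitioned into finitely many decision units, a (here randomly generated) probabilistic policy assigns an action distribution to each unit, and backward induction over the resulting abstract MDP computes an upper bound on the probability of reaching an unsafe region within k steps. All names (`cell_of`, `successors`, `unsafe_upper_bound`) and the toy dynamics are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# 1. Abstract the continuous state space [0, 1) into finite "decision units".
N_CELLS = 10

def cell_of(x):
    """Map a concrete state x in [0, 1) to its abstract decision unit."""
    return min(int(x * N_CELLS), N_CELLS - 1)

# 2. A made-up probabilistic policy over decision units: for each unit,
#    a distribution over two actions {0: left, 1: right}.
rng = np.random.default_rng(0)
policy = rng.dirichlet([1.0, 1.0], size=N_CELLS)  # shape (N_CELLS, 2)

# 3. Abstract transitions: an action moves one cell, but the abstraction
#    over-approximates (the system may also stay put), which introduces
#    nondeterminism resolved adversarially during verification.
def successors(cell, action):
    step = -1 if action == 0 else 1
    moved = min(max(cell + step, 0), N_CELLS - 1)
    return {moved, cell}

UNSAFE = {0}  # decision unit 0 is the unsafe region

# 4. Backward induction over the abstract MDP: an upper bound on the
#    probability of reaching UNSAFE within k steps, taking the expectation
#    over the policy's action distribution and the worst case over the
#    abstraction's nondeterministic successors (Pmax semantics).
def unsafe_upper_bound(start_cell, k):
    v = np.array([1.0 if c in UNSAFE else 0.0 for c in range(N_CELLS)])
    for _ in range(k):
        nv = v.copy()
        for c in range(N_CELLS):
            if c in UNSAFE:
                continue
            nv[c] = sum(policy[c, a] * max(v[s] for s in successors(c, a))
                        for a in range(2))
        v = np.maximum(v, nv)  # bound is monotone in the horizon
    return v[start_cell]
```

Because the abstraction only over-approximates the concrete successors, the computed value is sound as an upper bound but may be loose; the paper's decision-unit reuse addresses keeping such abstract models small enough to check.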
Tue 21 Jan (Mountain Time, US & Canada)
Session 11:00 - 12:30

11:00 (30m, Talk): A Real-Blasting Extension of Cvc5 for Reasoning about Floating-Point Arithmetic. VMCAI 2025. Daisuke Ishii (JAIST)

11:30 (30m, Talk): Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study. VMCAI 2025. Mario Bucev (EPFL), Samuel Chassot (EPFL, LARA), Simon Felix (Ateleris GmbH), Filip Schramka (Ateleris GmbH), Viktor Kunčak (EPFL, Switzerland). Pre-print available.

12:00 (30m, Talk): Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training. VMCAI 2025. Junfeng Yang (Shanghai Key Laboratory of Trustworthy Computing, East China Normal University), Xin Chen (University of New Mexico, USA), Qin Li (Shanghai Key Laboratory of Trustworthy Computing, East China Normal University)