Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training
Deep Reinforcement Learning (DRL), especially with probabilistic policies, has shown great potential for learning control policies. In safety-critical domains, deploying a probabilistic DRL policy requires strict safety assurances, making formal verification of such policies critical. However, formal verification of probabilistic DRL policies still faces significant challenges, arising from the complexity of reasoning about a neural network's probabilistic outputs over infinite state sets and from the state explosion problem during model construction. This paper proposes a novel approach based on abstract training for quantitatively verifying probabilistic DRL policies. Specifically, we abstract the infinite continuous state space into finitely many discrete decision units and train a deep neural network (DNN) policy on these decision units. Abstract training allows the probabilistic decision outputs for a set of states to be computed directly in a black-box manner, greatly simplifying the reasoning about neural network outputs. We further abstract the execution of the trained DNN policy as a Markov decision process (MDP) and perform probabilistic model checking, obtaining two types of upper bounds on the probability of being unsafe. When constructing the MDP, we reuse abstract states based on decision units, significantly alleviating the state explosion problem. Experiments show that the proposed quantitative verification yields tighter upper bounds on unsafe probabilities over longer time horizons, more easily and efficiently than the current state-of-the-art method.
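To make the abstract's pipeline concrete, the following is a minimal, hypothetical sketch on a toy 1-D system: the continuous state space is partitioned into finitely many decision units, a (here randomly generated) probabilistic policy assigns an action distribution to each unit, and backward induction over the resulting abstract MDP computes an upper bound on the probability of reaching an unsafe region within k steps. All names (`cell_of`, `successors`, `unsafe_upper_bound`) and the toy dynamics are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# 1. Abstract the continuous state space [0, 1) into finite "decision units".
N_CELLS = 10

def cell_of(x):
    """Map a concrete state x in [0, 1) to its abstract decision unit."""
    return min(int(x * N_CELLS), N_CELLS - 1)

# 2. A made-up probabilistic policy over decision units: for each unit,
#    a distribution over two actions {0: left, 1: right}.
rng = np.random.default_rng(0)
policy = rng.dirichlet([1.0, 1.0], size=N_CELLS)  # shape (N_CELLS, 2)

# 3. Abstract transitions: an action moves one cell, but the abstraction
#    over-approximates (the system may also stay put), which introduces
#    nondeterminism resolved adversarially during verification.
def successors(cell, action):
    step = -1 if action == 0 else 1
    moved = min(max(cell + step, 0), N_CELLS - 1)
    return {moved, cell}

UNSAFE = {0}  # decision unit 0 is the unsafe region

# 4. Backward induction over the abstract MDP: an upper bound on the
#    probability of reaching UNSAFE within k steps, taking the expectation
#    over the policy's action distribution and the worst case over the
#    abstraction's nondeterministic successors (Pmax semantics).
def unsafe_upper_bound(start_cell, k):
    v = np.array([1.0 if c in UNSAFE else 0.0 for c in range(N_CELLS)])
    for _ in range(k):
        nv = v.copy()
        for c in range(N_CELLS):
            if c in UNSAFE:
                continue
            nv[c] = sum(policy[c, a] * max(v[s] for s in successors(c, a))
                        for a in range(2))
        v = np.maximum(v, nv)  # bound is monotone in the horizon
    return v[start_cell]
```

Because the abstraction only over-approximates the concrete successors, the computed value is sound as an upper bound but may be loose; the paper's decision-unit reuse addresses keeping such abstract models small enough to check.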
Tue 21 Jan (Mountain Time, US & Canada)
Session 11:00 - 12:30

11:00 (30m, Talk): A Real-Blasting Extension of Cvc5 for Reasoning about Floating-Point Arithmetic. VMCAI 2025. Daisuke Ishii (JAIST)

11:30 (30m, Talk): Formally Verifiable Generated ASN.1/ACN Encoders and Decoders: A Case Study. VMCAI 2025. Mario Bucev (EPFL), Samuel Chassot (EPFL, LARA), Simon Felix (Ateleris GmbH), Filip Schramka (Ateleris GmbH), Viktor Kunčak (EPFL, Switzerland). Pre-print available.

12:00 (30m, Talk): Formal Verification of Probabilistic Deep Reinforcement Learning Policies with Abstract Training. VMCAI 2025. Junfeng Yang (Shanghai Key Laboratory of Trustworthy Computing, East China Normal University), Xin Chen (University of New Mexico, USA), Qin Li (Shanghai Key Laboratory of Trustworthy Computing, East China Normal University)