Verifiably Safe Off-Model Reinforcement Learning (TACAS 2019)

Sat 6 - Thu 11 April 2019 Prague, Czech Republic

Who

Nathan Fulton, André Platzer

Track

TACAS 2019

When

Tue 9 Apr 2019, 14:30 - 15:00, at JUPITER. Session: Machine Learning. Chair(s): Bernhard Steffen

Abstract

The possibility of using reinforcement learning in safety-critical settings has inspired several recent approaches toward obtaining formal safety guarantees for learning algorithms. Existing formal methods for learning and optimization primarily consider the problem of constrained learning or constrained optimization. Given a single correct model and an associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors outside the safety constraint. Acting well given an accurate environmental model is an important prerequisite for safe learning, but is ultimately insufficient for systems that operate in complex heterogeneous environments.

This paper introduces verification-preserving model updates, the first approach toward obtaining formal safety guarantees for reinforcement learning in heterogeneous environments. Verification-preserving model updates (VPMUs) are proof-preserving manipulations of hybrid programs and their associated safety constraints. At design time, VPMUs provide a framework for combining inductive and deductive program synthesis of provably safe hybrid dynamical systems. We also introduce model update learning, a family of reinforcement learning algorithms that leverage VPMUs to efficiently select accurate environmental models at runtime. Together, VPMUs and model update learning provide the first approach toward formal safety proofs for reinforcement learning algorithms even when an accurate environmental model is not available.
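The idea of selecting among candidate environmental models at runtime can be illustrated with a small toy sketch. This is not the paper's algorithm and uses no hybrid programs or proofs: the names `CandidateModel`, `make_model`, `plausible_models`, and `allowed_actions`, the one-dimensional dynamics, and the obstacle position are all assumptions made for illustration. The sketch keeps only those models that agree with every observed transition, and permits an action only if every remaining model considers it safe.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = float
Action = float
Transition = Tuple[State, Action, State]  # (state, action, observed next state)

OBSTACLE = 10.0  # hypothetical safety constraint: position must stay <= 10


@dataclass(frozen=True)
class CandidateModel:
    """A hypothetical environment model paired with its safety check."""
    name: str
    predict: Callable[[State, Action], State]
    is_safe: Callable[[State, Action], bool]


def make_model(name: str, gain: float) -> CandidateModel:
    """Toy dynamics: next = state + gain * action; safe if next <= OBSTACLE."""
    def predict(s: State, a: Action) -> State:
        return s + gain * a

    def is_safe(s: State, a: Action) -> bool:
        return predict(s, a) <= OBSTACLE

    return CandidateModel(name, predict, is_safe)


def plausible_models(models: Sequence[CandidateModel],
                     history: Sequence[Transition],
                     tolerance: float = 1e-6) -> List[CandidateModel]:
    """Keep the models whose predictions match every observed transition."""
    return [
        m for m in models
        if all(abs(m.predict(s, a) - s_next) <= tolerance
               for s, a, s_next in history)
    ]


def allowed_actions(models: Sequence[CandidateModel],
                    state: State,
                    actions: Sequence[Action]) -> List[Action]:
    """Permit an action only if all remaining models deem it safe."""
    return [a for a in actions if all(m.is_safe(state, a) for m in models)]


models = [make_model("unit_step", 1.0), make_model("double_step", 2.0)]
actions = [0.0, 1.0, 2.0, 3.0]

# With no observations, both models are plausible, so the learner is cautious.
before = allowed_actions(plausible_models(models, []), 6.0, actions)

# Observing 0.0 --(action 1.0)--> 1.0 falsifies "double_step".
history = [(0.0, 1.0, 1.0)]
remaining = plausible_models(models, history)
after = allowed_actions(remaining, 6.0, actions)
```

In this toy setting, ruling out the falsified model enlarges the set of actions the learner may explore; a learning algorithm would then choose among the permitted actions only.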

Link to Publication

https://link.springer.com/chapter/10.1007/978-3-030-17462-0_28

Nathan Fulton

MIT-IBM Watson AI Lab

United States

André Platzer

Carnegie Mellon University