Symbolic State Seeding Improves Coverage Of Reinforcement Learning
Due to a limited learning budget, a reinforcement learning agent can explore only the most probable scenarios of a potentially rich and complex environment dynamics. This may result in a limited understanding of the context and low robustness of the learned policy. A possible approach to address this problem is to explore the interactions between an autonomous agent and its environment in rare but important situations. We propose SymSeed, a method for initializing learning episodes for the class of reinforcement learning problems for which a simulation environment (model) is available. This increases the chance of exposing the agent to interesting states during learning. Inspired by techniques for increasing coverage in software testing, we analyze the simulator implementation using symbolic execution and then generate initial states that ensure the agent explores the simulator dynamics well during learning. We evaluate SymSeed by feeding the generated states into well-known reinforcement learning algorithms, both tabular and function-approximation methods, including vanilla Q-Learning, DQN, PPO, A3C, SAC, TD3, and CAT-RL. In all test cases, the combination of SymSeed with uniform sampling from the entire state space enables all algorithms to achieve faster convergence and higher success rates than the baseline. The effect is particularly strong in the presence of sparse rewards or local optima.
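As a rough illustration of the episode-seeding idea described in the abstract, the sketch below mixes a pool of symbolically derived initial states with uniform sampling when resetting episodes. It is a minimal sketch under assumptions: the seed pool, state space, `seed_fraction` parameter, and the `reset_environment`/`train` helpers are hypothetical placeholders, not the authors' implementation or data.

```python
import random

# Hypothetical seed pool: initial states that would be derived offline by
# symbolically executing the simulator's step function and solving the path
# conditions guarding its branches. The concrete values are placeholders.
SYMBOLIC_SEEDS = [(0, 3), (7, 7), (9, 0)]

# Placeholder state space for a small grid-world-style simulator.
STATE_SPACE = [(x, y) for x in range(10) for y in range(10)]


def reset_environment(seed_fraction=0.5):
    """Pick an episode's initial state.

    With probability `seed_fraction`, start from a symbolically derived seed;
    otherwise sample uniformly from the whole state space, mirroring the
    combination of SymSeed with uniform sampling described in the abstract.
    """
    if random.random() < seed_fraction:
        return random.choice(SYMBOLIC_SEEDS)
    return random.choice(STATE_SPACE)


def train(num_episodes=1000):
    """Skeleton tabular learner that only differs from a vanilla loop
    in how each episode's initial state is chosen."""
    q_table = {}
    for _ in range(num_episodes):
        state = reset_environment()
        # ... the usual Q-learning interaction/update loop would run here ...
        q_table.setdefault(state, 0.0)
    return q_table


if __name__ == "__main__":
    print(f"visited {len(train())} distinct initial states")
```

The only change relative to a standard training loop is the episode reset, which is why the seeding can be combined with any of the listed algorithms without modifying their update rules.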
Mon 28 Apr (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | Session 2: Foundations (Artifact Track / Research Track) at 204. Chair(s): Sona Ghahremani (Hasso Plattner Institute, University of Potsdam)
11:00, 25m, Talk | Symbolic State Seeding Improves Coverage Of Reinforcement Learning (FULL, Research Track). Mohsen Ghaffari (IT University of Copenhagen), Cong Chen (IT University of Copenhagen), Mahsa Varshosaz (IT University of Copenhagen, Denmark), Einar Broch Johnsen (University of Oslo), Andrzej Wąsowski (IT University of Copenhagen, Denmark)
11:25, 25m, Talk | Robust Probabilistic Model Checking with Continuous Reward Domains (FULL, Best Student Paper Award, Research Track). Xiaotong Ji (Imperial College London), Hanchun Wang (Imperial College London), Antonio Filieri (AWS and Imperial College London), Ilenia Epifani (Politecnico di Milano)
11:50, 15m, Talk | A Comprehensive Analysis of Cybersecurity Challenges in Self-Adaptive Avionics: A Plug&Fly Avionics Platform Case Study (SHORT, Research Track). Aisha Zahid Junejo (University of Stuttgart), Mario Werthwein (University of Stuttgart), Bjoern Annighoefer (University of Stuttgart)
12:05, 15m, Talk | ResMetric: Analyzing Resilience to Enable Research on Antifragility (ARTIFACT, Artifact Track). Ferdinand Koenig (Humboldt-Universität zu Berlin), Marc Carwehl (Humboldt-Universität zu Berlin), Calum Imrie (University of York)
12:20, 10m, Other | Discussion, Session 2 (Research Track)