Symbolic State Seeding Improves Coverage Of Reinforcement Learning
Due to a limited learning budget, a reinforcement learning agent can explore only the most probable scenarios of a potentially rich and complex environment dynamics. This may result in a limited understanding of the context and low robustness of the learned policy. A possible approach to address this problem is to explore the interactions between an autonomous agent and its environment in rare but important situations. We propose SymSeed, a method for initializing learning episodes for the class of reinforcement learning problems for which a simulation environment (model) is available. This increases the chance of exposing the agent to interesting states during learning. Inspired by techniques for increasing coverage in software testing, we analyze the simulator implementation using symbolic execution and then generate initial states that ensure the agent explores the simulator dynamics well during learning. We evaluate SymSeed by feeding the generated states into well-known reinforcement learning algorithms, both tabular and function-approximation methods, including vanilla Q-Learning, DQN, PPO, A3C, SAC, TD3, and CAT-RL. In all test cases, the combination of SymSeed with uniform sampling from the entire state space enables all algorithms to achieve faster convergence and higher success rates than the baseline. The effect is particularly strong in the presence of sparse rewards or local optima.
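As a rough illustration of the episode-seeding idea described in the abstract, the sketch below mixes a pool of symbolically derived initial states with uniform sampling when resetting episodes. It is a minimal sketch under assumptions: the seed pool, state space, `seed_fraction` parameter, and the `reset_environment`/`train` helpers are hypothetical placeholders, not the authors' implementation or data.

```python
import random

# Hypothetical seed pool: initial states that would be derived offline by
# symbolically executing the simulator's step function and solving the path
# conditions guarding its branches. The concrete values are placeholders.
SYMBOLIC_SEEDS = [(0, 3), (7, 7), (9, 0)]

# Placeholder state space for a small grid-world-style simulator.
STATE_SPACE = [(x, y) for x in range(10) for y in range(10)]


def reset_environment(seed_fraction=0.5):
    """Pick an episode's initial state.

    With probability `seed_fraction`, start from a symbolically derived seed;
    otherwise sample uniformly from the whole state space, mirroring the
    combination of SymSeed with uniform sampling described in the abstract.
    """
    if random.random() < seed_fraction:
        return random.choice(SYMBOLIC_SEEDS)
    return random.choice(STATE_SPACE)


def train(num_episodes=1000):
    """Skeleton tabular learner that only differs from a vanilla loop
    in how each episode's initial state is chosen."""
    q_table = {}
    for _ in range(num_episodes):
        state = reset_environment()
        # ... the usual Q-learning interaction/update loop would run here ...
        q_table.setdefault(state, 0.0)
    return q_table


if __name__ == "__main__":
    print(f"visited {len(train())} distinct initial states")
```

The only change relative to a standard training loop is the episode reset, which is why the seeding can be combined with any of the listed algorithms without modifying their update rules.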
Mon 28 Apr (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | Session 2: Foundations (Artifact Track / Research Track) at 204. Chair(s): Sona Ghahremani (Hasso Plattner Institute, University of Potsdam)
11:00, 25m, Talk | Symbolic State Seeding Improves Coverage Of Reinforcement Learning (FULL, Research Track). Mohsen Ghaffari (IT University of Copenhagen), Cong Chen (IT University of Copenhagen), Mahsa Varshosaz (IT University of Copenhagen, Denmark), Einar Broch Johnsen (University of Oslo), Andrzej Wąsowski (IT University of Copenhagen, Denmark)
11:25, 25m, Talk | Robust Probabilistic Model Checking with Continuous Reward Domains (FULL, Best Student Paper Award, Research Track). Xiaotong Ji (Imperial College London), Hanchun Wang (Imperial College London), Antonio Filieri (AWS and Imperial College London), Ilenia Epifani (Politecnico di Milano)
11:50, 15m, Talk | A Comprehensive Analysis of Cybersecurity Challenges in Self-Adaptive Avionics: A Plug&Fly Avionics Platform Case Study (SHORT, Research Track). Aisha Zahid Junejo (University of Stuttgart), Mario Werthwein (University of Stuttgart), Bjoern Annighoefer (University of Stuttgart)
12:05, 15m, Talk | ResMetric: Analyzing Resilience to Enable Research on Antifragility (ARTIFACT, Artifact Track). Ferdinand Koenig (Humboldt-Universität zu Berlin), Marc Carwehl (Humboldt-Universität zu Berlin), Calum Imrie (University of York)
12:20, 10m, Other | Discussion, Session 2 (Research Track)