Efficient Reinforcement Learning with Generalized-Reactivity Specifications
Reinforcement learning (RL) has been used to solve sequential decision-making problems in intelligent systems. However, current RL approaches suffer from slow convergence and reward sparsity, and their reward mechanisms struggle to express complex task specifications. Temporal logic can describe non-Markovian task specifications, and the strategy synthesized from such a specification can serve as a priori knowledge that helps agents learn to interact with the environment efficiently. This paper considers an intelligent agent that reacts to the environment under a high-level reactive temporal logic specification called Generalized Reactivity of rank 1 (GR(1)). We first use the synthesized GR(1) strategy to construct a Markov Decision Process with a potential-based reward machine, which integrates the environment with the high-level reactive temporal specifications. We then develop a topological-sort-based reward shaping approach to calculate the potential functions of the reward machine, based on which we use Q-learning to train the agents. Experiments on multi-task learning show that the proposed approach outperforms state-of-the-art algorithms in learning rate and optimal rewards. In addition, compared with value-iteration-based reward shaping approaches, our topological-sort-based reward shaping approach can handle cases where the synthesized strategies take the form of directed cyclic graphs.
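To make the idea of potential-based shaping over a reward machine concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a small hypothetical reward machine given as an acyclic graph (`RM_GRAPH`, with made-up states `u0` to `u3`), assigns each state the potential `gamma ** k`, where `k` is the fewest transitions to a hypothetical goal state, by walking the states in reverse topological order, and then applies the standard shaping rule r' = r + gamma * Phi(u') - Phi(u). The abstract states that the paper's approach also handles cyclic strategies; this sketch does not attempt that and simply rejects cyclic graphs.

```python
from collections import deque

def topological_order(graph):
    """Kahn's algorithm. Raises ValueError if the graph contains a cycle."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle; this sketch only covers DAGs")
    return order

def potentials(graph, goal, gamma):
    """Phi(goal) = 1; Phi(u) = gamma * max over successors; 0 if goal is unreachable."""
    phi = {u: 0.0 for u in graph}
    phi[goal] = 1.0
    # Reverse topological order guarantees successors are finished first.
    for u in reversed(topological_order(graph)):
        if u != goal and graph[u]:
            phi[u] = gamma * max(phi[v] for v in graph[u])
    return phi

def shaped_reward(r, phi, u, u_next, gamma):
    """Standard potential-based shaping: r + gamma * Phi(u') - Phi(u)."""
    return r + gamma * phi[u_next] - phi[u]

# Hypothetical reward machine: u0 can go to u2 directly or detour through u1.
RM_GRAPH = {"u0": ["u1", "u2"], "u1": ["u2"], "u2": ["u3"], "u3": []}
GAMMA = 0.9
phi = potentials(RM_GRAPH, goal="u3", gamma=GAMMA)

# A step along a shortest path gets zero shaping; the detour is penalized.
direct = shaped_reward(0.0, phi, "u0", "u2", GAMMA)
detour = shaped_reward(0.0, phi, "u0", "u1", GAMMA)
```

In a Q-learning loop, `shaped_reward` would replace the raw environment reward in the temporal-difference update whenever the reward machine moves from `u` to `u_next`. Under this choice of potential, transitions that make progress toward the goal state are not penalized, while detours receive a negative shaping term.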
Fri 9 Dec (displayed time zone: Osaka, Sapporo, Tokyo)
13:00 - 14:00 | Machine Learning 3, Technical Track, at Room2. Chair(s): Atul Gupta (Indian Institute of Information Technology, Design and Manufacturing (IIITDM))
13:00, 20m, Paper | Efficient Reinforcement Learning with Generalized-Reactivity Specifications (Technical Track). Chenyang Zhu, Yujie Cai (Changzhou University), Can Hu (Changzhou University), Jia Bi (University of Southampton)
13:20, 20m, Paper | Adversarial Deep Reinforcement Learning for Improving the Robustness of Multi-agent Autonomous Driving Policies (Technical Track)
13:40, 20m, Paper | DronLomaly: Runtime Detection of Anomalous Drone Behaviors via Log Analysis and Deep Learning (Technical Track). Lwin Khin Shar (Singapore Management University), Wei Minn (Singapore Management University), Duong Ta (Singapore Management University), Jiani Fan (Nanyang Technological University), Lingxiao Jiang (Singapore Management University), Daniel Lim Wai Kiat (Singapore Management University)