On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
SE for AI


Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist. However, studies make the mistake of assuming implementations of the same algorithm to be consistent and thus, interchangeable. In this paper, through a differential testing lens, we present the results of studying the extent of implementation inconsistencies, their effect on the implementations’ performance, as well as their impact on the conclusions of prior studies under the assumption of interchangeable implementations. The outcome of our differential tests showed significant discrepancies between the tested algorithm implementations, indicating that they are not interchangeable. In particular, out of the five PPO implementations tested on 56 games, three implementations achieved superhuman performance for 50% of their total trials while the other two implementations only achieved superhuman performance for less than 15% of their total trials. Furthermore, the performance among the high-performing PPO implementations was found to differ significantly in nine games. As part of a meticulous manual analysis of the implementations’ source code, we analyzed implementation discrepancies and determined that code-level inconsistencies primarily caused these discrepancies. Lastly, we replicated a study and showed that this assumption of implementation interchangeability was sufficient to flip experiment outcomes. Therefore, this calls for a shift in how implementations are being used. In addition, we recommend for (1) replicability studies for studies mistakenly assuming implementation interchangeability, (2) DRL researchers and practitioners to adopt the differential testing methodology proposed in this paper to combat implementation inconsistencies, and (3) the use of large environment suites.
Rajdeep's Slides (Mistaken Assumption.pdf) | 1.74MiB |
Fri 2 MayDisplayed time zone: Eastern Time (US & Canada) change
11:00 - 12:30 | SE for AI with Quality 1Research Track at 215 Chair(s): Chris Poskitt Singapore Management University | ||
11:00 15mTalk | A Tale of Two DL Cities: When Library Tests Meet CompilerSE for AI Research Track Qingchao Shen Tianjin University, Yongqiang Tian , Haoyang Ma Hong Kong University of Science and Technology, Junjie Chen Tianjin University, Lili Huang College of Intelligence and Computing, Tianjin University, Ruifeng Fu Tianjin University, Shing-Chi Cheung Hong Kong University of Science and Technology, Zan Wang Tianjin University | ||
11:15 15mTalk | Iterative Generation of Adversarial Example for Deep Code ModelsSE for AIAward Winner Research Track | ||
11:30 15mTalk | On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning ImplementationsSE for AI Research Track Rajdeep Singh Hundal National University of Singapore, Yan Xiao Sun Yat-sen University, Xiaochun Cao Sun Yat-Sen University, Jin Song Dong National University of Singapore, Manuel Rigger National University of Singapore Pre-print Media Attached File Attached | ||
11:45 15mTalk | µPRL: a Mutation Testing Pipeline for Deep Reinforcement Learning based on Real FaultsSE for AI Research Track Deepak-George Thomas Tulane University, Matteo Biagiola Università della Svizzera italiana, Nargiz Humbatova Università della Svizzera italiana, Mohammad Wardat Oakland University, USA, Gunel Jahangirova King's College London, Hridesh Rajan Tulane University, Paolo Tonella USI Lugano Pre-print | ||
12:00 15mTalk | Testing and Understanding Deviation Behaviors in FHE-hardened Machine Learning ModelsSE for AI Research Track Yiteng Peng Hong Kong University of Science and Technology, Daoyuan Wu Hong Kong University of Science and Technology, Zhibo Liu Hong Kong University of Science and Technology, Dongwei Xiao Hong Kong University of Science and Technology, Zhenlan Ji The Hong Kong University of Science and Technology, Juergen Rahmel HSBC, Shuai Wang Hong Kong University of Science and Technology | ||
12:15 15mTalk | TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron ProvenanceSE for AI Research Track Pre-print |