ICSE 2023 (series) / ICPC 2023 (series) / Replications and Negative Results (RENE) /
Revisiting Deep Learning for Variable Type Recovery
Tue 16 May 2023 14:12 - 14:17 at Meeting Room 106 - Programming Languages, Types, and Complexity Chair(s): Vittoria Nardone
Compiled binary executables are often the only available artifact in reverse engineering, malware analysis, or maintenance of software systems. Unfortunately, the lack of semantic information like variable names makes comprehending binaries difficult. In efforts to improve the comprehensibility of binaries, researchers have recently used machine learning techniques to predict semantic information contained in the original source code. Chen et al. implemented DIRTY, a Transformer-based Encoder-Decoder architecture capable of augmenting decompiled code with variable names and types by leveraging decompiler output tokens and variable size information. Chen et al. demonstrated a substantial increase in name and type extraction accuracy on Hex-Rays decompiler outputs compared to existing static analysis and AI-based techniques. We extend the original DIRTY results by re-training the DIRTY model on a dataset produced by the open-source Ghidra decompiler. Although Chen et al. concluded that Ghidra was not a suitable decompiler candidate due to its difficulty in parsing DWARF, we demonstrate that straightforward parsing of variable data generated by Ghidra results in similar retyping performance. We hope this work inspires further interest and adoption of the Ghidra decompiler for use in research projects.
Tue 16 May
Displayed time zone: Hobart
13:45 - 15:15 | Programming Languages, Types, and Complexity | Discussion / Research / Replications and Negative Results (RENE) / Journal First | Meeting Room 106 | Chair(s): Vittoria Nardone
13:45 | 9m | Full-paper | How Well Static Type Checkers Work with Gradual Typing? A Case Study on Python | Research | Wenjie Xu Nanjing University, Lin Chen Nanjing University, Chenghao Su Nanjing University, Yimeng Guo Nanjing University, Yanhui Li Nanjing University, Yuming Zhou Nanjing University, Baowen Xu Nanjing University
13:54 | 9m | Full-paper | Too Simple? Notions of Task Complexity used in Maintenance-based Studies of Programming Tools | Research | Patrick Rein University of Potsdam; Hasso Plattner Institute, Tom Beckmann Hasso Plattner Institute, Eva Krebs Hasso Plattner Institute (HPI), University of Potsdam, Germany, Toni Mattis University of Potsdam; Hasso Plattner Institute, Robert Hirschfeld University of Potsdam; Hasso Plattner Institute
14:03 | 9m | Full-paper | Path Complexity Predicts Code Comprehension Effort | Research | Sofiane Dissem Harvey Mudd College, Eli Pregerson Harvey Mudd College, Adi Bhargava Harvey Mudd College, Josh Cordova Harvey Mudd College, Lucas Bang Harvey Mudd College
14:12 | 5m | Short-paper | Revisiting Deep Learning for Variable Type Recovery | Replications and Negative Results (RENE)
14:17 | 9m | Talk | Programming language implementations for context-oriented self-adaptive systems | Journal First | Nicolás Cardozo Universidad de los Andes, Kim Mens Université catholique de Louvain, ICTEAM institute, Belgium
14:26 | 9m | Full-paper | Improving Code Search with Multi-Modal Momentum Contrastive Learning | Research | Zejian Shi Fudan University, Yun Xiong Fudan University, Yao Zhang Fudan University, Zhijie Jiang National University of Defense Technology, Jinjing Zhao National Key Laboratory of Science and Technology on Information System Security, Lei Wang National University of Defense Technology, Shanshan Li National University of Defense Technology
14:35 | 9m | Full-paper | Revisiting Lightweight Compiler Provenance Recovery on ARM Binaries | Replications and Negative Results (RENE)
14:44 | 31m | Panel | Discussion 7 | Discussion