ICPC 2023
Mon 15 - Tue 16 May 2023 Melbourne, Australia
co-located with ICSE 2023
Compiled binary executables are often the only available artifact in reverse engineering, malware analysis, or maintenance of software systems. Unfortunately, the lack of semantic information like variable names makes comprehending binaries difficult. In efforts to improve the comprehensibility of binaries, researchers have recently used machine learning techniques to predict semantic information contained in the original source code. Chen et al. implemented DIRTY, a Transformer-based Encoder-Decoder architecture capable of augmenting decompiled code with variable names and types by leveraging decompiler output tokens and variable size information. Chen et al. were able to demonstrate a substantial increase in name and type extraction accuracy on Hex-Rays decompiler outputs compared to existing static analysis and AI-based techniques. We extend the original DIRTY results by re-training the DIRTY model on a dataset produced by the open-source Ghidra decompiler. Although Chen et al. concluded that Ghidra was not a suitable decompiler candidate due to its difficulty in parsing DWARF, we demonstrate that straightforward parsing of variable data generated by Ghidra results in similar retyping performance. We hope this work inspires further interest and adoption of the Ghidra decompiler for use in research projects.

Tue 16 May

Displayed time zone: Hobart

13:45 - 15:15
Programming Languages, Types, and Complexity
Discussion / Research / Replications and Negative Results (RENE) / Journal First at Meeting Room 106
Chair(s): Vittoria Nardone
13:45
9m
Full-paper
How Well Static Type Checkers Work with Gradual Typing? A Case Study on Python
Research
Wenjie Xu Nanjing University, Lin Chen Nanjing University, Chenghao Su Nanjing University, Yimeng Guo Nanjing University, Yanhui Li Nanjing University, Yuming Zhou Nanjing University, Baowen Xu Nanjing University
13:54
9m
Full-paper
Too Simple? Notions of Task Complexity used in Maintenance-based Studies of Programming Tools
Research
Patrick Rein University of Potsdam; Hasso Plattner Institute, Tom Beckmann Hasso Plattner Institute, Eva Krebs Hasso Plattner Institute (HPI), University of Potsdam, Germany, Toni Mattis University of Potsdam; Hasso Plattner Institute, Robert Hirschfeld University of Potsdam; Hasso Plattner Institute
14:03
9m
Full-paper
Path Complexity Predicts Code Comprehension Effort
Research
Sofiane Dissem Harvey Mudd College, Eli Pregerson Harvey Mudd College, Adi Bhargava Harvey Mudd College, Josh Cordova Harvey Mudd College, Lucas Bang Harvey Mudd College
14:12
5m
Short-paper
Revisiting Deep Learning for Variable Type Recovery
Replications and Negative Results (RENE)
Kevin Cao Vanderbilt University, Kevin Leach Vanderbilt University
Pre-print
14:17
9m
Talk
Programming language implementations for context-oriented self-adaptive systems
Journal First
Nicolás Cardozo Universidad de los Andes, Kim Mens Université catholique de Louvain, ICTEAM institute, Belgium
14:26
9m
Full-paper
Improving Code Search with Multi-Modal Momentum Contrastive Learning
Research
Zejian Shi Fudan University, Yun Xiong Fudan University, Yao Zhang Fudan University, Zhijie Jiang National University of Defense Technology, Jinjing Zhao National Key Laboratory of Science and Technology on Information System Security, Lei Wang National University of Defense Technology, Shanshan Li National University of Defense Technology
Pre-print
14:35
9m
Full-paper
Revisiting Lightweight Compiler Provenance Recovery on ARM Binaries
Replications and Negative Results (RENE)
Jason Kim Georgia Tech, Daniel Genkin Georgia Tech, Kevin Leach Vanderbilt University
Pre-print
14:44
31m
Panel
Discussion 7
Discussion