Learning to Find Usages of Library Functions in Optimized Binaries
Wed 11 May 2022 13:00 - 13:05 at ICSE room 4-odd hours - Synthesis and Reverse Engineering Chair(s): Reed Milewicz
Wed 25 May 2022 11:30 - 11:35 at Room 304+305 - Papers 7: Evolution and Maintenance Chair(s): Thomas LaToza
Wed 25 May 2022 13:30 - 15:00 at Ballroom Gallery - Posters 1
Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries’ behavior can be quite challenging, especially when compiled under higher levels of compiler optimization. These optimizations can transform comprehensible, “natural” source constructions into something entirely unrecognizable. Reverse engineering binaries, especially those suspected of being malevolent or guilty of intellectual property theft, is an important and time-consuming task. There is a great deal of interest in tools to “decompile” binaries back into more natural source code to aid reverse engineering. Decompilation involves several desirable steps, including recreating source-language constructions, variable names, and perhaps even comments. One central step in creating binaries is optimizing function calls, using steps such as inlining. Recovering these (possibly inlined) function calls from optimized binaries is an essential task that most state-of-the-art decompiler tools try to do but do not perform very well. In this paper, we evaluate a supervised learning approach to the problem of recovering optimized function calls. We leverage open-source software and develop an automated labeling scheme to generate a reasonably large dataset of binaries labeled with actual function usages. We augment this large but limited labeled dataset with a pre-training step, which learns the decompiled code statistics from a much larger unlabeled dataset. Thus augmented, our learned labeling model can be combined with an existing decompilation tool, Ghidra, to achieve substantially improved performance in function call recovery, especially at higher levels of optimization.
Mon 9 May | Displayed time zone: Eastern Time (US & Canada)
21:00 - 22:00 | Program Analysis 3 | Technical Track / SEIP - Software Engineering in Practice / Journal-First Papers | at ICSE room 5-odd hours | Chair(s): Travis Breaux, Carnegie Mellon University
21:00 | 5m Talk | Learning to Find Usages of Library Functions in Optimized Binaries | Journal-First Papers | Toufique Ahmed, University of California at Davis; Prem Devanbu, Department of Computer Science, University of California, Davis; Anand Ashok Sawant, University of California, Davis
21:05 | 5m Talk | InspectJS: Leveraging Code Similarity and User-Feedback for Effective Taint Specification Inference for JavaScript | SEIP - Software Engineering in Practice | Saikat Dutta, University of Illinois at Urbana-Champaign; Diego Garbervetsky, University of Buenos Aires and CONICET, Argentina; Shuvendu K. Lahiri, Microsoft Research; Max Schaefer, GitHub, Inc.
21:10 | 5m Talk | Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python (Nominated for Distinguished Paper) | Technical Track | Yun Peng, The Chinese University of Hong Kong; Cuiyun Gao, Harbin Institute of Technology; Zongjie Li, The Hong Kong University of Science and Technology; Bowei Gao, Harbin Institute of Technology, Shenzhen; David Lo, Singapore Management University; Qirun Zhang, Georgia Institute of Technology, USA; Michael Lyu, The Chinese University of Hong Kong
21:15 | 5m Talk | DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs | Technical Track | Mohammad Wardat, Dept. of Computer Science, Iowa State University; Breno Dantas Cruz, Dept. of Computer Science, Iowa State University; Wei Le, Iowa State University; Hridesh Rajan, Iowa State University
21:20 | 5m Talk | Striking a Balance: Pruning False-Positives from Static Call Graphs (Nominated for Distinguished Paper) | Technical Track | Akshay Utture, University of California, Los Angeles (UCLA); Shuyang Liu, University of California, Los Angeles; Christian Gram Kalhauge, Technical University of Denmark; Jens Palsberg, University of California at Los Angeles
Wed 11 May
Wed 25 May
11:00 - 12:30 | Papers 7: Evolution and Maintenance | Journal-First Papers / Technical Track / SEIP - Software Engineering in Practice | at Room 304+305 | Chair(s): Thomas LaToza, George Mason University
11:00 | 5m Talk | A Software Impact Analysis Tool based on Change History Learning and its Evaluation | SEIP - Software Engineering in Practice | Haruya Iwasaki, Shibaura Institute of Technology; Tsuyoshi Nakajima, Shibaura Institute of Technology; Ryota Tsukamoto, Mitsubishi Electric Corporation; Kazuko Takahashi, Mitsubishi Electric Corporation; Shuichi Tokumoto, Mitsubishi Electric Corporation
11:05 | 5m Talk | Using Pre-Trained Models to Boost Code Review Automation | Technical Track | Rosalia Tufano, Università della Svizzera Italiana; Simone Masiero, Software Institute @ Università della Svizzera Italiana; Antonio Mastropaolo, Università della Svizzera italiana; Luca Pascarella, Università della Svizzera italiana (USI); Denys Poshyvanyk, William and Mary; Gabriele Bavota, Software Institute, USI Università della Svizzera italiana
11:10 | 5m Talk | Self-Admitted Technical Debt Practices: A Comparison Between Industry and Open-Source | Journal-First Papers | Fiorella Zampetti, University of Sannio, Italy; Gianmarco Fucci, University of Sannio; Alexander Serebrenik, Eindhoven University of Technology; Massimiliano Di Penta, University of Sannio, Italy
11:15 | 5m Talk | Journal First Submission of the Article: What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk | Journal-First Papers | Pooja Rani, University of Bern; Sebastiano Panichella, Zurich University of Applied Sciences; Manuel Leuenberger, Software Composition Group, University of Bern, Switzerland; Mohammad Ghafari, School of Computer Science, University of Auckland; Oscar Nierstrasz, University of Bern, Switzerland
11:20 | 5m Talk | An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags | Journal-First Papers | Christian D. Newman, Rochester Institute of Technology; Michael J. Decker, Bowling Green State University; Reem S. Alsuhaibani, Kent State University; Anthony Peruma, Rochester Institute of Technology; Mohamed Wiem Mkaouer, Rochester Institute of Technology; Satyajit Mohapatra, Rochester Institute of Technology; Tejal Vishnoi, Rochester Institute of Technology; Marcos Zampieri, Rochester Institute of Technology; Timothy Sheldon, BNY Mellon; Emily Hill, Drew University
11:25 | 5m Talk | Retrieving Data Constraint Implementations Using Fine-Grained Code Patterns | Technical Track | Juan Manuel Florez, The University of Texas at Dallas; Jonathan Perry, The University of Texas at Dallas; Shiyi Wei, University of Texas at Dallas; Andrian Marcus, University of Texas at Dallas
11:30 | 5m Talk | Learning to Find Usages of Library Functions in Optimized Binaries | Journal-First Papers | Toufique Ahmed, University of California at Davis; Prem Devanbu, Department of Computer Science, University of California, Davis; Anand Ashok Sawant, University of California, Davis
11:35 | 5m Talk | Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies | Technical Track