ICPC 2018
Sun 27 - Mon 28 May 2018 Gothenburg, Sweden
co-located with ICSE 2018

When code is compiled, information is lost, including some of the structure of the original source code as well as local identifier names. Existing decompilers can reconstruct much of the original source code, but typically use meaningless placeholder variables for identifier names. Using variable names which are more natural in the given context can make the code much easier to interpret, despite the fact that variable names have no effect on the execution of the program. In theory, it is impossible to recover the original identifier names since that information has been lost. However, most code is natural: it is highly repetitive and predictable based on the context. In this paper we propose a technique that assigns variables meaningful names by taking advantage of this naturalness property. We consider decompiler output to be a noisy distortion of the original source code, where the original source code is transformed into the decompiler output. Using this noisy channel model, we apply standard statistical machine translation approaches to choose natural identifiers, combining a translation model trained on a parallel corpus with a language model trained on unmodified C code. We generate a large parallel corpus from 1.2 TB of C source code obtained from GitHub. Under the most conservative assumptions, our technique is still able to recover the original variable names up to 16.2% of the time, which represents a lower bound for performance.
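
As a rough illustration of the noisy channel idea in the abstract, the sketch below picks the name for a decompiler placeholder that maximizes the product of a translation model score and a language model score. This is a toy, not the authors' implementation: the probability tables, the context string, the candidate names, and the function `best_name` are all hypothetical and chosen only to show the scoring rule.

```python
import math

# Hypothetical toy translation model: P(placeholder | original name).
# A real system would estimate this from a parallel corpus of
# original and decompiled code.
TRANSLATION_MODEL = {
    ("v1", "i"): 0.6,
    ("v1", "len"): 0.3,
    ("v1", "buf"): 0.1,
}

# Hypothetical toy language model: P(original name | context).
# A real system would estimate this from unmodified C code.
LANGUAGE_MODEL = {
    ("for ( ; < n; ++ )", "i"): 0.7,
    ("for ( ; < n; ++ )", "len"): 0.05,
    ("for ( ; < n; ++ )", "buf"): 0.01,
}

# Small floor probability for pairs the toy tables do not cover.
UNSEEN = 1e-9


def best_name(placeholder, context, candidates):
    """Return the candidate maximizing log P(placeholder | name) + log P(name | context)."""
    def score(name):
        tm = TRANSLATION_MODEL.get((placeholder, name), UNSEEN)
        lm = LANGUAGE_MODEL.get((context, name), UNSEEN)
        return math.log(tm) + math.log(lm)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_name("v1", "for ( ; < n; ++ )", ["i", "len", "buf"]))
```

Summing log probabilities rather than multiplying raw probabilities is a common choice to avoid numerical underflow when many factors are combined.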

Conference Day
Sun 27 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

09:00 - 10:30
Opening, Vision Keynote, and Developer Observation (Technical Research) at J1 room
Chair(s): Chanchal K. Roy (University of Saskatchewan), Foutse Khomh (Polytechnique Montréal), Katsuro Inoue (Osaka University)
Day opening
Welcome to ICPC 2018
Technical Research
Foutse Khomh (Polytechnique Montréal), Chanchal K. Roy (University of Saskatchewan)
Sensing and Supporting Software Developer's Focus (Vision Keynote)
Technical Research
Manuela Zueger (University of Zurich), Thomas Fritz (University of Zurich, University of British Columbia)
Code Phonology: an exploration into the vocalization of code (ERA)
Technical Research
Felienne Hermans, Alaaeddin Swidan (Delft University of Technology), Efthimia Aivaloglou (Open University of the Netherlands)
Meaningful Variable Names for Decompiled Code: A Machine Translation Approach
Technical Research
Alan Jaffe (Carnegie Mellon University), Jeremy Lacomis (Carnegie Mellon University), Edward Schwartz (Carnegie Mellon University), Claire Le Goues (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University)
Descriptive Compound Identifier Names Improve Source Code Comprehension
Technical Research
Andrea Schankin (Karlsruhe Institute of Technology), Annika Berger (Karlsruhe Institute of Technology), Daniel Holt (Heidelberg University), Johannes Hofmeister (University of Passau), Till Riedel (Karlsruhe Institute of Technology), Michael Beigl (Karlsruhe Institute of Technology)