Machine Learning models from other fields, like Computational Linguistics, have been transplanted to Software Engineering tasks, often quite successfully. Yet a transplanted model's initial success at a given task does not necessarily mean it is well-suited for the task. In this work, we examine a common example of this phenomenon: the conceit that "software patching is like language translation". We demonstrate empirically that there are subtle but critical distinctions between sequence-to-sequence models and translation models: while program repair benefits greatly from the former, general modeling architecture, it actually suffers from design decisions built into the latter, both in terms of translation accuracy and diversity. Given these findings, we demonstrate how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions. We propose several models that leverage the same machine learning tools, but whose architecture, data presentation, and metrics are specialized for the software engineering task. The resulting models perform significantly better than the studied baseline, especially on metrics that are more appropriate for program repair. Overall, our results demonstrate the merit of studying the intricacies of machine-learned models in software engineering: not only can this help elucidate potential issues that may be overshadowed by increases in accuracy; it can also help innovate on these models to raise the state of the art further. We will publicly release our replication data and materials at https://github.com/ARiSE-Lab/Patch-as-translation.
Tue 22 Sep. Displayed time zone: (UTC) Coordinated Universal Time
16:00 - 17:00
16:00 | 20m Talk | Synthesis of Infinite-State Systems with Random Behavior | Research Papers | Andreas Katis (University of Minnesota), Grigory Fedyukovich (Florida State University), Jeffrey Chen (University of Minnesota), David Greve (Collins Aerospace), Sanjai Rayadurgam (University of Minnesota), Michael Whalen (University of Minnesota)
16:20 | 20m Talk | Demystifying Loops in Smart Contracts | Research Papers | Benjamin Mariano (University of Texas at Austin), Yanju Chen (University of California, Santa Barbara), Yu Feng (University of California, Santa Barbara), Shuvendu K. Lahiri (Microsoft Research), Işıl Dillig (University of Texas at Austin, USA)
16:40 | 20m Talk | Patching as Translation: The Data and the Metaphor | Research Papers | Yangruibo Ding (Columbia University), Baishakhi Ray (Columbia University, USA), Prem Devanbu (University of California), Vincent J. Hellendoorn (Carnegie Mellon University)