ICSE 2022
Sun 8 - Fri 27 May 2022

Code completion is an essential feature of IDEs, yet current autocompleters are restricted to either grammar-based or NLP-based single-token completions. Both approaches have significant drawbacks: grammar-based autocompletion is very restricted in dynamically-typed language scenarios, whereas NLP-based autocompletion struggles to understand the semantics of the programming language, giving suggestions that ignore a developer’s context.

In this work, we present CodeFill, a language model for autocompletion that combines structure and naming information. Using a parallel Transformer architecture and Multi-Task learning, CodeFill consumes sequences of source code token names and their equivalent AST token types. Uniquely, CodeFill is trained both for single-token and multi-token (statement) prediction, which enables it to learn long-range dependencies among grammatical and naming elements. We train CodeFill on two datasets, consisting of 29M and 425M lines of code respectively. To make the evaluation more realistic, we develop a method to automatically infer points in the source code at which completion matters. We compare CodeFill against four baselines and two state-of-the-art models, GPT-C and TravTrans+. CodeFill surpasses all baselines in single-token prediction (MRR: 70.9% vs. 66.2% and 67.8%) and significantly outperforms the state of the art for multi-token prediction (ROUGE-L: 63.7% vs. 52.4% and 59.2%, for n=4 tokens). We publicly release our source code and data for replication and use.
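The two metrics quoted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names (`mean_reciprocal_rank`, `lcs_length`, `rouge_l_f1`) are hypothetical, and the sketch assumes the common definitions, namely MRR as the mean of 1/rank of the correct token in a ranked candidate list, and ROUGE-L as an F1 score over the longest common subsequence of predicted and reference tokens. The paper may use a different ROUGE-L weighting.

```python
from typing import List, Sequence


def mean_reciprocal_rank(ranked: List[List[str]], targets: List[str]) -> float:
    """Mean of 1/rank of the correct token; a missing target contributes 0."""
    total = 0.0
    for candidates, target in zip(ranked, targets):
        if target in candidates:
            total += 1.0 / (candidates.index(target) + 1)
    return total / len(targets)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    lcs = lcs_length(predicted, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(predicted)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, a four-token prediction `["self", ".", "x", "="]` scored against the reference `["self", ".", "y", "="]` shares a common subsequence of three tokens, so precision and recall are both 0.75 and `rouge_l_f1` returns 0.75.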

Tue 10 May

Displayed time zone: Eastern Time (US & Canada)

05:00 - 06:00
Machine Learning with and for SE 1
NIER - New Ideas and Emerging Results / Technical Track / Journal-First Papers at ICSE room 1-odd hours
Chair(s): Gemma Catolino Tilburg University & Jheronimus Academy of Data Science
05:00
5m
Talk
SQAPlanner: Generating Data-Informed Software Quality Improvement Plans -- A Journal-First Presentation
Journal-First Papers
Dilini Rajapaksha Monash University, Chakkrit Tantithamthavorn Monash University, Jirayus Jiarpakdee Monash University, Australia, Christoph Bergmeir Monash University, John Grundy Monash University, Wray Buntine Monash University
05:05
5m
Talk
Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks
Journal-First Papers
Nikita Mehrotra Indraprastha Institute of Information Technology, Navdha Agarwal Indraprastha Institute of Information Technology, Delhi, Piyush Gupta Indraprastha Institute of Information Technology, Delhi, Saket Anand Indraprastha Institute of Information Technology, Delhi, David Lo Singapore Management University, Rahul Purandare IIIT-Delhi
05:10
5m
Talk
Improving the Learnability of Machine Learning APIs by Semi-Automated API Wrapping
NIER - New Ideas and Emerging Results
Lars Reimann University of Bonn, Günter Kniesel-Wünsche University of Bonn
05:15
5m
Talk
Learning to Recommend Method Names with Global Context
Technical Track
Fang Liu Peking University, Ge Li Peking University, Zhiyi Fu Peking University, Shuai Lu Peking University, Yiyang Hao Silicon Heart Tech Co., Zhi Jin Peking University
05:20
5m
Talk
On the Importance of Building High-quality Training Datasets for Neural Code Search (Nominated for Distinguished Paper)
Technical Track
Zhensu Sun Monash University, Li Li Monash University, Yan Liu Tongji University, Xiaoning Du Monash University, Australia, Li Li Monash University
05:25
5m
Talk
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences
Technical Track
Maliheh Izadi Delft University of Technology, Roberta Gismondi Delft University of Technology, Georgios Gousios Endor Labs & Delft University of Technology

Wed 11 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:00
Search-Based Software Engineering 3
Technical Track / NIER - New Ideas and Emerging Results at ICSE room 3-odd hours
Chair(s): Mohamed Wiem Mkaouer Rochester Institute of Technology
11:00
5m
Talk
A Black Box Technique to Reduce Energy Consumption of Android Apps
NIER - New Ideas and Emerging Results
Abdul Ali Bangash University of Alberta, Canada, Karim Ali University of Alberta, Abram Hindle University of Alberta
11:05
5m
Talk
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences
Technical Track
Maliheh Izadi Delft University of Technology, Roberta Gismondi Delft University of Technology, Georgios Gousios Endor Labs & Delft University of Technology
11:10
5m
Talk
Fairness-aware Configuration of Machine Learning Libraries
Technical Track
Saeid Tizpaz-Niari University of Texas at El Paso, Ashish Kumar, Gang Tan Pennsylvania State University, Ashutosh Trivedi University of Colorado Boulder
11:15
5m
Talk
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and Many-Objective Optimization (Distinguished Paper Award)
Technical Track
Fitash Ul Haq University of Luxembourg, Donghwan Shin University of Luxembourg, Lionel Briand University of Luxembourg; University of Ottawa
11:20
5m
Talk
PropR: Property-Based Automatic Program Repair
Technical Track
Matthías Páll Gissurarson Chalmers University of Technology, Sweden, Leonhard Applis Delft University of Technology, Annibale Panichella Delft University of Technology, Arie van Deursen Delft University of Technology, Netherlands, Dave Sands Chalmers

Information for Participants
Tue 10 May 2022 05:00 - 06:00 at ICSE room 1-odd hours - Machine Learning with and for SE 1 Chair(s): Gemma Catolino

Wed 11 May 2022 11:00 - 12:00 at ICSE room 3-odd hours - Search-Based Software Engineering 3 Chair(s): Mohamed Wiem Mkaouer