Code Prediction by Feeding Trees to Transformers (ICSE 2021 - Technical Track) - ICSE 2021

Mon 17 May - Sat 5 June 2021

Who

Seohyun Kim, Jinman Zhao, Yuchi Tian, Satish Chandra

Track

ICSE 2021 Technical Track

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 26 May 2021 19:10 - 19:30 at Blended Sessions Room 3 - 2.5.3. Code Completion Chair(s): Marsha Chechik
Thu 27 May 2021 07:10 - 07:30 at Blended Sessions Room 3 - 2.5.3. Code Completion

Abstract

Code prediction, more specifically autocomplete, has become an essential feature in modern IDEs. Autocomplete is more effective when the desired next token is at (or close to) the top of the list of potential completions offered by the IDE at the cursor position. This is where the strength of the underlying machine learning system that produces a ranked order of potential completions comes into play.

We advance the state of the art in the accuracy of code prediction (next token prediction) used in autocomplete systems. Our work uses Transformers as the base neural architecture. We show that by making the Transformer architecture aware of the syntactic structure of code, we increase the margin by which a Transformer-based system outperforms previous systems. With this change, our system exceeds the accuracy of several state-of-the-art next token prediction systems by margins ranging from 14% to 18%.

We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset, as well as on a company internal Python corpus. Our code and data preparation pipeline will be available in open source.
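As a rough illustration of what "communicating the code structure" to a sequence model can mean, the sketch below linearizes a Python abstract syntax tree by pre-order traversal, so that a model built for sequences would see node types and leaf values in tree order. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `preorder_tokens`, the choice to skip load/store context nodes, and the token format are all hypothetical choices made for this example. It uses only Python's standard `ast` module and assumes Python 3.8 or later.

```python
import ast
from typing import List


def preorder_tokens(source: str) -> List[str]:
    """Linearize the AST of `source` in pre-order (parent before children).

    Hypothetical helper for illustration only; the token format is an
    assumption, not the representation used in the paper.
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Emit the node type first, so a parent precedes its children.
        tokens.append(type(node).__name__)
        # Emit leaf values (identifiers and constants) right after their type.
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Constant):
            tokens.append(repr(node.value))
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers to keep the sequence short.
            if isinstance(child, ast.expr_context):
                continue
            visit(child)

    visit(ast.parse(source))
    return tokens


sequence = preorder_tokens("x = 1")
print(sequence)
```

On Python 3.8 or later, the call above yields `['Module', 'Assign', 'Name', 'x', 'Constant', '1']`. A next token predictor trained on such sequences would be predicting the next tree node rather than the next source token; whether and how that helps accuracy is what the paper evaluates.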

Link to Preprint

https://arxiv.org/abs/2003.13848

Seohyun Kim

Facebook

United States

Jinman Zhao

University of Wisconsin-Madison, USA

Yuchi Tian

Columbia University

Satish Chandra

Facebook, USA

United States

Session Program

Wed 26 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

	18:50 - 19:50	2.5.3. Code Completion (SEIP - Software Engineering in Practice / Technical Track) at Blended Sessions Room 3. Chair(s): Marsha Chechik, University of Toronto

	18:50 20m Paper		Siri, Write the Next Method (Technical Track). Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, Gabriele Bavota (all Software Institute, USI Università della Svizzera italiana)
	19:10 20m Paper		Code Prediction by Feeding Trees to Transformers (Technical Track). Seohyun Kim (Facebook), Jinman Zhao (University of Wisconsin-Madison, USA), Yuchi Tian (Columbia University), Satish Chandra (Facebook, USA)
	19:30 20m Paper		Learning Autocompletion from Real-World Datasets (SEIP - Software Engineering in Practice). Gareth Aye (Facebook, Inc.), Seohyun Kim (Facebook), Hongyu Li (Facebook, Inc.)

Thu 27 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

	06:50 - 07:50	2.5.3. Code Completion (Technical Track / SEIP - Software Engineering in Practice) at Blended Sessions Room 3

	06:50 20m Paper		Siri, Write the Next Method (Technical Track). Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, Gabriele Bavota (all Software Institute, USI Università della Svizzera italiana)
	07:10 20m Paper		Code Prediction by Feeding Trees to Transformers (Technical Track). Seohyun Kim (Facebook), Jinman Zhao (University of Wisconsin-Madison, USA), Yuchi Tian (Columbia University), Satish Chandra (Facebook, USA)
	07:30 20m Paper		Learning Autocompletion from Real-World Datasets (SEIP - Software Engineering in Practice). Gareth Aye (Facebook, Inc.), Seohyun Kim (Facebook), Hongyu Li (Facebook, Inc.)
