Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code (Virtual)
Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for the effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts, which involves significant compilation overhead and is sensitive to unseen API names and code context variations. In this paper, we formulate type inference as a cloze-style fill-in-blank language task. Building on source code naturalness, our approach trains a code masked language model (MLM) as a neural knowledge base of code elements with a novel "pre-train, prompt and predict" paradigm from raw source code. Our approach is lightweight and has minimal requirements on code compilation. Unlike existing symbolic name and context matching for type inference, our prompt-tuned code MLM packs FQN syntax and usage in its parameters and supports fuzzy neural type inference. We systematically evaluate our approach on a large amount of source code from GitHub and Stack Overflow. Our results confirm the effectiveness of our approach design and its practicality for partial code type inference. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
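A minimal, illustrative sketch of the cloze-style formulation described in the abstract (not the authors' released implementation): a partial Java snippet is combined with a fill-in-blank prompt and fed to an off-the-shelf code MLM through the Hugging Face fill-mask pipeline, and the model's top predictions for the masked FQN segment are read off. The prompt template and the "microsoft/codebert-base-mlm" checkpoint are assumptions made here for illustration; the paper's prompt-tuned model and templates may differ, and a full FQN such as a multi-segment package qualifier would normally require masking and filling several sub-tokens iteratively rather than the single mask shown below.

# Cloze-style type inference sketch with a code masked language model (MLM).
# Assumption: the "microsoft/codebert-base-mlm" checkpoint is a stand-in for
# the prompt-tuned code MLM described in the abstract.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Partial Java code with a non-fully-qualified type name (StringUtils).
partial_code = "boolean empty = StringUtils.isEmpty(name);"

# Hypothetical cloze prompt: mask one segment of the package qualifier.
# A real package qualifier usually spans several sub-tokens, which the
# paper's approach would mask and fill iteratively; one mask is used here
# only to illustrate the fill-in-blank formulation.
prompt = f"import <mask>.StringUtils; {partial_code}"

# Print the model's top candidates for the masked FQN segment.
for candidate in fill_mask(prompt, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 4))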
Wed 12 Oct | Displayed time zone: Eastern Time (US & Canada)
10:00 - 12:00 | Technical Session 11 - Analysis and Types | Research Papers / NIER Track / Late Breaking Results | Gold A | Chair(s): Thiago Ferreira (University of Michigan - Flint)

10:00 | 20m | Research paper | SA4U: Practical Static Analysis for Unit Type Error Detection | Research Papers
  Max Taylor (The Ohio State University), Johnathon Aurand (The Ohio State University), Feng Qin (Ohio State University, USA), Xiaorui Wang (The Ohio State University), Brandon Henry (Tangram Flex), Xiangyu Zhang (Purdue University)

10:20 | 10m | Vision and Emerging Results | Principled Composition of Function Variants for Dynamic Software Diversity and Program Protection | NIER Track
  Giacomo Priamo (Sapienza University of Rome), Daniele Cono D'Elia (Sapienza University of Rome), Leonardo Querzoni (Sapienza University of Rome)

10:30 | 20m | Research paper | AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models | Research Papers
  José Antonio Hernández López (Department of Computer Science and Systems, University of Murcia), Martin Weyssow (DIRO, Université de Montréal), Jesús Sánchez Cuadrado, Houari Sahraoui (Université de Montréal)

10:50 | 10m | Paper | Towards Gradual Multiparty Session Typing (Virtual) | Late Breaking Results
  Sung-Shik Jongmans (Open University of the Netherlands; CWI)

11:00 | 20m | Research paper | Static Type Recommendation for Python (Virtual) | Research Papers
  Ke Sun (Peking University), Yifan Zhao (Peking University), Dan Hao (Peking University), Lu Zhang (Peking University)

11:20 | 20m | Research paper | Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code (Virtual) | Research Papers
  Qing Huang (School of Computer Information Engineering, Jiangxi Normal University), Zhiqiang Yuan (School of Computer Information Engineering, Jiangxi Normal University), Zhenchang Xing (Australian National University), Xiwei (Sherry) Xu (CSIRO Data61), Liming Zhu (CSIRO’s Data61; UNSW), Qinghua Lu (CSIRO’s Data61)

11:40 | 20m | Research paper | Jasmine: A Static Analysis Framework for Spring Core Technologies (Virtual) | Research Papers
  Miao Chen (Beijing University of Posts and Telecommunications), Tengfei Tu (Beijing University of Posts and Telecommunications), Hua Zhang (Beijing University of Posts and Telecommunications), Qiaoyan Wen (Beijing University of Posts and Telecommunications), Weihang Wang (University of Southern California)