Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code (Virtual)
Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for the effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts, which involves significant compilation overhead and is sensitive to unseen API names and code context variations. In this paper, we formulate type inference as a cloze-style fill-in-blank language task. Building on source code naturalness, our approach trains a code masked language model (MLM) as a neural knowledge base of code elements with a novel "pre-train, prompt and predict" paradigm from raw source code. Our approach is lightweight and has minimal requirements on code compilation. Unlike existing symbolic name and context matching for type inference, our prompt-tuned code MLM packs FQN syntax and usage in its parameters and supports fuzzy neural type inference. We systematically evaluate our approach on a large amount of source code from GitHub and Stack Overflow. Our results confirm the effectiveness of our approach design and its practicality for partial code type inference. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
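A minimal, illustrative sketch of the cloze-style formulation described in the abstract (not the authors' released implementation): a partial Java snippet is combined with a fill-in-blank prompt and fed to an off-the-shelf code MLM through the Hugging Face fill-mask pipeline, and the model's top predictions for the masked FQN segment are read off. The prompt template and the "microsoft/codebert-base-mlm" checkpoint are assumptions made here for illustration; the paper's prompt-tuned model and templates may differ, and a full FQN such as a multi-segment package qualifier would normally require masking and filling several sub-tokens iteratively rather than the single mask shown below.

# Cloze-style type inference sketch with a code masked language model (MLM).
# Assumption: the "microsoft/codebert-base-mlm" checkpoint is a stand-in for
# the prompt-tuned code MLM described in the abstract.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Partial Java code with a non-fully-qualified type name (StringUtils).
partial_code = "boolean empty = StringUtils.isEmpty(name);"

# Hypothetical cloze prompt: mask one segment of the package qualifier.
# A real package qualifier usually spans several sub-tokens, which the
# paper's approach would mask and fill iteratively; one mask is used here
# only to illustrate the fill-in-blank formulation.
prompt = f"import <mask>.StringUtils; {partial_code}"

# Print the model's top candidates for the masked FQN segment.
for candidate in fill_mask(prompt, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 4))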
Wed 12 Oct | Displayed time zone: Eastern Time (US & Canada)
10:00 - 12:00 | Technical Session 11 - Analysis and Types | Research Papers / NIER Track / Late Breaking Results | Gold A | Chair(s): Thiago Ferreira (University of Michigan - Flint)

10:00 | 20m | Research paper | SA4U: Practical Static Analysis for Unit Type Error Detection | Research Papers
  Max Taylor (The Ohio State University), Johnathon Aurand (The Ohio State University), Feng Qin (Ohio State University, USA), Xiaorui Wang (The Ohio State University), Brandon Henry (Tangram Flex), Xiangyu Zhang (Purdue University)

10:20 | 10m | Vision and Emerging Results | Principled Composition of Function Variants for Dynamic Software Diversity and Program Protection | NIER Track
  Giacomo Priamo (Sapienza University of Rome), Daniele Cono D'Elia (Sapienza University of Rome), Leonardo Querzoni (Sapienza University of Rome)

10:30 | 20m | Research paper | AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models | Research Papers
  José Antonio Hernández López (Department of Computer Science and Systems, University of Murcia), Martin Weyssow (DIRO, Université de Montréal), Jesús Sánchez Cuadrado, Houari Sahraoui (Université de Montréal)

10:50 | 10m | Paper | Towards Gradual Multiparty Session Typing (Virtual) | Late Breaking Results
  Sung-Shik Jongmans (Open University of the Netherlands; CWI)

11:00 | 20m | Research paper | Static Type Recommendation for Python (Virtual) | Research Papers
  Ke Sun (Peking University), Yifan Zhao (Peking University), Dan Hao (Peking University), Lu Zhang (Peking University)

11:20 | 20m | Research paper | Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code (Virtual) | Research Papers
  Qing Huang (School of Computer Information Engineering, Jiangxi Normal University), Zhiqiang Yuan (School of Computer Information Engineering, Jiangxi Normal University), Zhenchang Xing (Australian National University), Xiwei (Sherry) Xu (CSIRO Data61), Liming Zhu (CSIRO’s Data61; UNSW), Qinghua Lu (CSIRO’s Data61)

11:40 | 20m | Research paper | Jasmine: A Static Analysis Framework for Spring Core Technologies (Virtual) | Research Papers
  Miao Chen (Beijing University of Posts and Telecommunications), Tengfei Tu (Beijing University of Posts and Telecommunications), Hua Zhang (Beijing University of Posts and Telecommunications), Qiaoyan Wen (Beijing University of Posts and Telecommunications), Weihang Wang (University of Southern California)