Python is a popular dynamic programming language, as evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to type errors, which has prompted researchers to explore automatic type inference approaches for Python programs. Existing type inference approaches can be generally grouped into three categories: rule-based, supervised, and cloze-style. Rule-based approaches can guarantee the accuracy of predicted variable types, but they suffer from low coverage caused by dynamic features and external calls. Supervised approaches, while feature-agnostic and able to mitigate the low-coverage problem, require large, high-quality annotated datasets and are limited to pre-defined types. As zero-shot approaches, cloze-style approaches reformulate type inference as a fill-in-the-blank problem, leveraging the general knowledge in powerful pre-trained code models. However, their performance is limited because they ignore the domain knowledge in static typing rules, which reflects the inference logic. What is more, their predictions are not interpretable, hindering developers' understanding and verification of the results.
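To make the cloze-style reformulation concrete, the following is a minimal illustrative sketch, not any tool's actual implementation: the target variable's annotation slot is replaced with a mask token, and the resulting text would be handed to a pre-trained code model to fill in. The `build_cloze_prompt` helper and the `<mask>` token are hypothetical.

```python
def build_cloze_prompt(code: str, var_name: str, mask: str = "<mask>") -> str:
    """Rewrite a function so the target parameter's type slot is masked.

    Turns e.g. "def f(text):" into "def f(text: <mask>):" so a
    fill-in-the-blank model can predict the annotation.
    """
    # Only the first occurrence of the parameter before a closing paren
    # is masked; a real implementation would use an AST, not string ops.
    return code.replace(f"{var_name})", f"{var_name}: {mask})", 1)


snippet = "def count_words(text):\n    return len(text.split())"
prompt = build_cloze_prompt(snippet, "text")
print(prompt)  # the masked snippet a code model would complete
```

A cloze-style system would then rank the model's candidate fillers (e.g. `str`, `bytes`) for the masked slot; the string-replacement trick above is purely for illustration.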
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates domain knowledge from static analysis. TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on type dependency graphs (TDGs), enabling language models to learn how static analysis infers types. By combining COT prompts with code slices and type hints, TypeGen constructs example prompts from human annotations, and it requires only very few annotated examples to teach language models to generate similar COT prompts via in-context learning. Moreover, TypeGen enhances the interpretability of its results through an input-explanation-output strategy, which generates both explanations and type predictions in COT prompts. Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% in argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match, using only five examples. Furthermore, TypeGen achieves substantial improvements of 27% to 84% over the zero-shot performance of large language models with parameter sizes ranging from 1.3B to 175B in terms of top-1 Exact Match.
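The few-shot in-context setup described above can be sketched as assembling annotated (code, explanation, type) triples into a single prompt in the input-explanation-output order, ending with the query code so the model continues with its own explanation and prediction. The example text, dictionary keys, and `build_cot_prompt` helper below are hypothetical illustrations, not TypeGen's actual prompt format.

```python
# One human-annotated example in input-explanation-output form.
# The explanation paraphrases how static typing rules would reason
# over the code's type dependencies (hypothetical wording).
EXAMPLES = [
    {
        "input": "def get_name(user):\n    return user['name']",
        "explanation": (
            "user is subscripted with the string key 'name', so by the "
            "static typing rule for subscription it should be a dict."
        ),
        "output": "user: dict",
    },
]


def build_cot_prompt(examples: list[dict], query_code: str) -> str:
    """Concatenate few-shot examples, then the query, for in-context learning."""
    parts = []
    for ex in examples:
        parts.append(
            f"Code:\n{ex['input']}\n"
            f"Explanation: {ex['explanation']}\n"
            f"Type: {ex['output']}\n"
        )
    # The prompt ends mid-pattern so the model generates the
    # explanation and type prediction for the query itself.
    parts.append(f"Code:\n{query_code}\nExplanation:")
    return "\n".join(parts)


prompt = build_cot_prompt(EXAMPLES, "def total(prices):\n    return sum(prices)")
print(prompt)
```

Because the examples already follow the explanation-then-type pattern, the model's continuation yields both a human-readable justification and the predicted type, which is what makes the output interpretable.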
Slices (slices.zip) — 2.33 MiB
Wed 13 Sep (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
15:30 - 17:00 | Code Generation 2 (Research Papers / NIER Track / Tool Demonstrations) at Plenary Room 2. Chair(s): Marianne Huchard (LIRMM)
15:30 12m Talk | COMEX: A Tool for Generating Customized Source Code Representations. Tool Demonstrations. Debeshee Das (Indian Institute of Technology Tirupati), Noble Saji Mathews (University of Waterloo, Canada), Alex Mathai, Srikanth Tamilselvam (IBM Research), Kranthi Sedamaki (Indian Institute of Technology Tirupati), Sridhar Chimalakonda (IIT Tirupati), Atul Kumar (IBM India Research Labs). Pre-print, Media Attached, File Attached
15:42 12m Talk | Fast and Reliable Program Synthesis via User Interaction. Research Papers. Yanju Chen (University of California at Santa Barbara), Chenglong Wang (Microsoft Research), Xinyu Wang (University of Michigan), Osbert Bastani (University of Pennsylvania), Yu Feng (University of California at Santa Barbara). File Attached
15:55 12m Talk | From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven AI Chaining. Research Papers. Xiaoxue Ren (Zhejiang University), Xinyuan Ye (Australian National University), Dehai Zhao (CSIRO's Data61), Zhenchang Xing, Xiaohu Yang (Zhejiang University). File Attached
16:08 12m Talk | Generative Type Inference for Python. Research Papers. Yun Peng (Chinese University of Hong Kong), Chaozheng Wang (The Chinese University of Hong Kong), Wenxuan Wang (Chinese University of Hong Kong), Cuiyun Gao (Harbin Institute of Technology), Michael Lyu (The Chinese University of Hong Kong). Pre-print, File Attached
16:21 12m Talk | Compiler Auto-tuning via Critical Flag Selection. Research Papers
16:34 12m Talk | Enhancing Code Safety in Quantum Intermediate Representation. NIER Track. File Attached
16:47 12m Talk | CAT-LM: Training Language Models on Aligned Code And Tests. Research Papers. Nikitha Rao (Carnegie Mellon University), Kush Jain (Carnegie Mellon University), Uri Alon (Carnegie Mellon University), Claire Le Goues (Carnegie Mellon University), Vincent J. Hellendoorn (Carnegie Mellon University). Media Attached, File Attached