DLInfer: Deep Learning with Static Slicing for Python Type Inference (ICSE 2023 - Technical Track)

Who

Yanyan Yan, Yang Feng, Hongcheng Fan, Baowen Xu

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 19 May 2023 11:00 - 11:15 at Meeting Room 106 - Static analysis Chair(s): Marsha Chechik

Abstract

The Python programming language has gained enormous popularity in the past decades. While its flexibility significantly improves software development productivity, its dynamic typing challenges software maintenance and quality assurance. To facilitate programming and type error checking, Python provides a type hint mechanism that lets developers annotate variables with type information. However, manual annotation often requires substantial resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer type information for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the unigram language model algorithm. Based on the vectorized slicing features, we design a bi-directional gated recurrent unit model to learn type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types: built-in, library, and user-defined types. Trained on a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for variables whose type information can be obtained through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for variables whose type information can only be obtained through dynamic analysis. The results indicate that DLInfer is highly effective in inferring types and is promising for assisting various software engineering tasks for Python programs.
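
To make the idea of "slice statements for a variable" more concrete, here is a minimal sketch using Python's standard `ast` module. It is not DLInfer's slicing algorithm, which the abstract does not detail. It is a simple name-based approximation that keeps every simple statement mentioning a given variable, and the function name `slice_statements` and the sample program are hypothetical.

```python
import ast
from typing import List


def slice_statements(source: str, variable: str) -> List[str]:
    """Return the first source line of each simple statement that mentions `variable`.

    Assumption: a name-based approximation, not a dependence-based slice.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    selected = set()
    for node in ast.walk(tree):
        # Skip compound statements (def, for, if, ...); their bodies are
        # visited separately by ast.walk.
        if isinstance(node, ast.stmt) and not hasattr(node, "body"):
            names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            if variable in names:
                selected.add(node.lineno)
    return [lines[i - 1].strip() for i in sorted(selected)]


# Hypothetical sample program used only for illustration.
example = """\
def total(prices):
    count = len(prices)
    result = 0
    for p in prices:
        result += p
    label = "sum"
    return result
"""

print(slice_statements(example, "result"))
# → ['result = 0', 'result += p', 'return result']
```

In a pipeline like the one the abstract describes, statements collected this way would then be tokenized and vectorized before being fed to a sequence model. Those later stages are outside the scope of this sketch.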

Yanyan Yan, Nanjing University

Yang Feng, Nanjing University, China

Hongcheng Fan, Nanjing University

Baowen Xu, Nanjing University

Session Program

Fri 19 May
Displayed time zone: Hobart

11:00 - 12:30	Static analysis (Technical Track / SEET - Software Engineering Education and Training / SEIP - Software Engineering in Practice) at Meeting Room 106. Chair(s): Marsha Chechik, University of Toronto

11:00 15m Talk		DLInfer: Deep Learning with Static Slicing for Python Type Inference
		Technical Track
		Yanyan Yan (Nanjing University), Yang Feng (Nanjing University), Hongcheng Fan (Nanjing University), Baowen Xu (Nanjing University)
11:15 15m Talk		ViolationTracker: Building Precise Histories for Static Analysis Violations
		Technical Track
		Ping Yu (Fudan University, China), Yijian Wu (Fudan University), Xin Peng (Fudan University), Jiahan Peng (Fudan University), Jian Zhang (Fudan University), Peicheng Xie (Fudan University), Wenyun Zhao (Fudan University, China)
11:30 15m Talk		On the use of static analysis to engage students with software quality improvement: An experience with PMD
		SEET - Software Engineering Education and Training
		Eman Abdullah AlOmar (Stevens Institute of Technology), Salma Abdullah AlOmar (NA), Mohamed Wiem Mkaouer (Rochester Institute of Technology)
11:45 15m Talk		Long-term Static Analysis Rule Quality Monitoring Using True Negatives
		SEIP - Software Engineering in Practice
		Linghui Luo (Amazon Web Services), Rajdeep Mukherjee (Amazon Web Services), Omer Tripp (Amazon), Martin Schäf (Amazon Web Services), Qiang Zhou (Amazon Web Services), Daniel J Sanchez (Amazon Alexa)
12:00 15m Talk		A Language-agnostic Framework for Mining Static Analysis Rules from Code Changes
		SEIP - Software Engineering in Practice
		Sedick David Baker Effendi (Stellenbosch University), Berk Cirisci (IRIF, University Paris Diderot and CNRS, France), Rajdeep Mukherjee (Amazon Web Services), Hoan Anh Nguyen (Amazon), Omer Tripp (Amazon)
12:15 7m Talk		GradeStyle: GitHub-Integrated and Automated Assessment of Java Code Style
		SEET - Software Engineering Education and Training
		Callum Iddon (University of Auckland), Nasser Giacaman (The University of Auckland), Valerio Terragni (University of Auckland)
12:22 7m Talk		The Challenges of Shift Left Static Analysis
		SEIP - Software Engineering in Practice
		Quoc-Sang Phan (Facebook, Inc.), KimHao Nguyen (University of Nebraska-Lincoln), ThanhVu Nguyen (George Mason University)