In light of the growing interest in type inference research for Python, both researchers and practitioners require a standardized process to assess the performance of various type inference techniques. This paper introduces TypeEvalPy, a comprehensive micro-benchmarking framework for evaluating type inference tools. TypeEvalPy contains 154 code snippets with 845 type annotations across 18 categories that target various Python features. The framework manages the execution of containerized tools, transforms inferred types into a standardized format, and produces meaningful metrics for assessment. Through our analysis, we compare the performance of six type inference tools, highlighting their strengths and limitations. Our findings provide a foundation for further research and optimization in the domain of Python type inference.
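To illustrate the kind of comparison such a framework performs, the sketch below shows how exact matches between ground-truth and tool-inferred type annotations might be counted. The annotation schema (keys such as "file", "line", "name", "type") and the matching rule are our own illustrative assumptions, not TypeEvalPy's actual format, API, or metric definitions.

```python
# Illustrative sketch only: the annotation schema and matching rule here are
# assumptions for exposition, not TypeEvalPy's actual schema or metrics.

def exact_match_rate(ground_truth, inferred):
    """Fraction of ground-truth annotations whose inferred type matches exactly.

    Both arguments are lists of dicts with hypothetical keys:
    'file', 'line', 'name', and 'type'.
    """
    # Index inferred annotations by location and variable name for lookup.
    inferred_index = {
        (a["file"], a["line"], a["name"]): a["type"] for a in inferred
    }
    matches = sum(
        1
        for a in ground_truth
        if inferred_index.get((a["file"], a["line"], a["name"])) == a["type"]
    )
    return matches / len(ground_truth) if ground_truth else 0.0


if __name__ == "__main__":
    truth = [{"file": "snippet.py", "line": 3, "name": "x", "type": "int"}]
    guess = [{"file": "snippet.py", "line": 3, "name": "x", "type": "int"}]
    print(exact_match_rate(truth, guess))  # 1.0
```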