DLInfer: Deep Learning with Static Slicing for Python Type Inference
This is the research artifact for the paper ‘DLInfer: Deep learning with static slicing for Python Type Inference.’ DLInfer is a deep learning approach to type inference for Python variables that is built on static slicing. Static slicing uses data flow analysis to obtain the contextual information related to a variable, so the context DLInfer learns from carries the program's logical information. DLInfer collects the slice statements for each variable through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we design a bi-directional gated recurrent unit model that learns type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three fundamental kinds of types: built-in, library, and user-defined types. Trained on a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for variables whose type information can be obtained through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for variables whose type information can only be obtained through dynamic analysis. The results indicate that DLInfer is highly effective in inferring types and is promising for assisting various software engineering tasks for Python programs.
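To make the idea of a variable's slice concrete, here is a minimal sketch of a backward slice built with Python's standard `ast` module. It is an illustration only and not DLInfer's actual slicing implementation: the function names `names_in` and `slice_for` are hypothetical, it handles only straight-line top-level statements, and it keeps every earlier definition of a relevant name, so it over-approximates the slice.

```python
import ast


def names_in(node, ctx_type):
    """Collect identifiers under `node` used in the given context (Load or Store)."""
    return {
        n.id
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ctx_type)
    }


def slice_for(source, target):
    """Return the top-level statements that `target` depends on, in source order.

    Simplifying assumptions (not taken from the paper): no control flow,
    no function bodies, and no kill set for redefinitions.
    """
    tree = ast.parse(source)
    relevant = {target}  # names whose definitions we still need to find
    kept = []
    # Walk backwards so that the names a kept statement reads become relevant
    # before we reach the statements that define them.
    for stmt in reversed(tree.body):
        if names_in(stmt, ast.Store) & relevant:
            kept.append(stmt)
            relevant |= names_in(stmt, ast.Load)
    # Restore source order and recover the original text of each statement.
    return [ast.get_source_segment(source, stmt) for stmt in reversed(kept)]


source = """\
a = 1
b = "text"
c = a + 2
d = len(b)
x = c * 3
"""

print(slice_for(source, "x"))  # ['a = 1', 'c = a + 2', 'x = c * 3']
```

In this sketch the slice for `x` drops the statements about `b` and `d`, which have no data flow into `x`. Statements of this kind are what a tool like DLInfer would then tokenize and feed to its model.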