The Limits of the Identifiable: Challenges in Python Version Identification with Deep Learning
The evolution of Python requires accurate version identification to facilitate compatibility and ongoing support. We extend previous work on deep learning models for Python version identification, in which LSTM and CodeBERT achieved 92% accuracy on short code snippets. We expand these results to larger, realistic files, using code segmentation techniques to vary the input granularity from per-line analysis to larger code segments. Our findings show that while LSTM with CodeBERT embeddings maintained high accuracy on short snippets, performance drops significantly on longer segments, particularly where information retention must be balanced against misclassification risk. Notably, import-statement analysis, despite being the most intuitive indicator of version requirements, reached only 30% accuracy, which exposes the limitations of our approach when it encounters rare or user-defined modules. Overall, the findings expose the limitations of deep learning for language version identification and suggest that alternative approaches may be necessary for high accuracy on larger datasets.
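As a rough illustration of what "varying input granularities" and "import-statement analysis" can mean at the preprocessing stage, the sketch below splits a file into fixed-size line segments and collects the top-level modules it imports. This is not the authors' pipeline: the function names `segment_lines` and `imported_modules` are hypothetical, and a regular expression is used instead of the `ast` module on the assumption that the input may be written for a Python version the running interpreter cannot parse.

```python
import re

# Matches "from pkg.mod import name" (group 1) or "import a, b as c" (group 2).
_IMPORT_RE = re.compile(r"^\s*(?:from\s+([\w.]+)\s+import\b|import\s+(.+))")


def segment_lines(source, size=1):
    """Split source into segments of `size` non-blank lines each.

    size=1 gives per-line granularity; larger values give coarser segments.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def imported_modules(source):
    """Return the top-level module names imported in source, in order."""
    modules = []
    for line in source.splitlines():
        code = line.split("#")[0]  # drop trailing comments
        match = _IMPORT_RE.match(code)
        if not match:
            continue
        if match.group(1):
            names = [match.group(1)]
        else:
            # "import a, b as c" -> ["a", "b"]
            names = [part.strip().split(" as ")[0] for part in match.group(2).split(",")]
        for name in names:
            top_level = name.strip().split(".")[0]
            if top_level:  # skips relative imports such as "from . import x"
                modules.append(top_level)
    return modules


# Hypothetical Python 2 style input; it would not parse under Python 3.
example = "import os, sys\nfrom urllib2 import urlopen\n\nprint 'hi'\nx = 1\n"
per_line = segment_lines(example, size=1)
per_block = segment_lines(example, size=3)
modules = imported_modules(example)
```

Each segment (or the list of module names) would then be passed to a classifier. A module list alone carries no signal when the names are rare or user-defined, which is consistent with the low accuracy reported for import-statement analysis.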
Wed 13 Mar (displayed time zone: Athens)
14:00 - 15:30 | API and Dependency Analysis | Research Papers / Reproducibility Studies and Negative Results (RENE) Track | at LAPPI | Chair(s): Martin Monperrus (KTH Royal Institute of Technology)
14:00 | 15m | Talk | The Limits of the Identifiable: Challenges in Python Version Identification with Deep Learning | Reproducibility Studies and Negative Results (RENE) Track | Marcus Gerhold (University of Twente, The Netherlands), Lola Solovyeva (University of Twente), Vadim Zaytsev (University of Twente, Netherlands) | Pre-print
14:15 | 15m | Talk | Exploring Dependencies Among Inconsistencies to Enhance the Consistency Maintenance of Models | Research Papers | Luciano Marchezan (Johannes Kepler Universität Linz), Wesley Assunção (North Carolina State University), Edvin Herac, Saad Shafiq (University of Southern California), Alexander Egyed (Johannes Kepler University Linz)
14:30 | 15m | Talk | BUMP: A Benchmark of Reproducible Breaking Dependency Updates | Research Papers | Frank Reyes Garcia (KTH Royal Institute of Technology), Yogya Gamage (KTH Royal Institute of Technology), Gabriel Skoglund (KTH Royal Institute of Technology), Benoit Baudry (KTH), Martin Monperrus (KTH Royal Institute of Technology)
14:45 | 15m | Talk | APIGen: Generative API Method Recommendation | Research Papers | Yujia Chen (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Muyijie Zhu (Harbin Institute of Technology, Shenzhen), Qing Liao (Harbin Institute of Technology), Yong Wang (Anhui Polytechnic University), Guoai Xu (Harbin Institute of Technology, Shenzhen)
15:00 | 15m | Talk | A Multi-Metric Ranking with Label Correlations Approach for Library Migration Recommendations | Research Papers | Jiancheng Zhang (Southwest Petroleum University), Peng Wu (Sichuan Tourism University), Qin Luo (Southwest Petroleum University)
15:15 | 15m | Talk | Adaptoring: Adapter Generation to Provide an Alternative API for a Library | Research Papers | Pre-print