Traceability between requirements and code changes is critical in software projects, yet links between issue reports and source code are often missing. This paper investigates a machine learning approach to discovering undocumented issue-commit links and explores the factors that influence its effectiveness. Building on earlier work, we extend our evaluation to eleven GitHub repositories and validate the recovered links through developer interviews, link categorization, and distribution and correlation analysis. Results indicate that the semantic clarity of textual descriptions, rather than their length, significantly affects model prediction accuracy. Fixed confidence thresholds prove insufficient, particularly as project size and complexity increase.
Yi Peng (University of Gothenburg and Chalmers University of Technology), Hans-Martin Heyn (University of Gothenburg and Chalmers University of Technology), Jennifer Horkoff (Chalmers and the University of Gothenburg)
Anne Hess (Technical University of Applied Sciences Würzburg-Schweinfurt), Gerald Heller (Consultant and Trainer), Hartmut Schmitt (HK Business Solutions GmbH), Cornelia Seraphin (msg systems AG, Ismaning), Oliver Karras (TIB - Leibniz Information Centre for Science and Technology)