Rethinking Technological Investment and Cost-Benefit: A Software Requirements Dependency Extraction Case Study
Machine Learning (ML) is widely used for different purposes within Software Engineering, where it can substantially improve the efficiency and effectiveness of organizations. Various methods and techniques exist, and each has strengths and weaknesses that depend on scenario and context. Thus far, the selection and implementation of ML techniques have relied almost exclusively on accuracy criteria. This narrow perspective ignores a crucial consideration: the anticipated cost of the ML activities versus the projected benefit gained from applying the results. In this study, we introduce a return-on-investment (ROI) perspective for evaluating ML techniques in Software Engineering, and we present findings for an approach that complements the accuracy criterion with ROI considerations to assess the value of these techniques beyond traditional benchmarks. Specifically, we extract dependencies from textual descriptions of software requirements and analyze the performance of two state-of-the-art ML techniques: Random Forest and Bidirectional Encoder Representations from Transformers (BERT), an encoder-only large language model. Drawing upon two publicly available data sets, we compare decision-making based (i) exclusively on accuracy and (ii) on ROI analysis, to provide decision support for the selection and usage of ML classification methods. Our results show that (a) pursuing accuracy improvements through additional annotated data does not generate the expected returns for traditional ML methods, and (b) for more complex ML algorithms, the cost of investing in larger annotated data sets is justified by higher returns, although trade-offs between accuracy and ROI become evident.