Using Voting and Stacking Ensemble Techniques to Optimize Software Requirements Classification
Background: Ensemble models play an important role in integrating multiple classifiers in a wide range of applications, such as medical diagnosis, sentiment analysis, and financial market trend forecasting. In Requirements Engineering (RE), automatic requirements classification can be improved by using these models.
Aims: This paper analyses the performance of voting and stacking ensemble models for requirements classification. Moreover, a cross-dataset validation was performed for the meta-models generated using the stacking ensemble method.
Methods: Previously trained base models and two datasets of software requirements written in Spanish (the translated PROMISE_exp dataset and the ReSpa dataset) were used to build the ensemble models.
Results: The stacking model achieved a weighted F1-score of 0.828 using Support Vector Machine (SVM) and Multi-layer Perceptron (MLP) on the translated PROMISE_exp dataset. On the ReSpa dataset, the stacking model achieved a weighted F1-score of 0.890 using Logistic Regression (LR).
Conclusion: This study confirms a slight improvement in the performance of binary requirements classification using stacking ensemble methods over voting and most individual base models. Moreover, combining all models outperforms combinations that include only shallow machine learning (ML) or only deep learning (DL) models.
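
As a rough illustration of two ideas named in the abstract, the sketch below shows hard (majority) voting over the predictions of several base models and the weighted F1-score used to report results. It is not the authors' implementation: the function names `hard_vote` and `weighted_f1`, the labels `F` (functional) and `NF` (non-functional), and the toy predictions are all assumptions made for the example. Stacking, which trains a meta-model on the base models' outputs, is not shown.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote; `predictions` holds one list of labels per base model."""
    voted = []
    for sample_preds in zip(*predictions):
        # Pick the label predicted by the most models for this sample.
        # With three models and two classes there can be no tie.
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical labels and base-model outputs, invented for illustration only.
y_true = ["F", "F", "NF", "NF", "NF", "F"]
model_a = ["F", "NF", "NF", "NF", "F", "F"]
model_b = ["F", "F", "NF", "F", "NF", "F"]
model_c = ["NF", "F", "NF", "NF", "NF", "NF"]

ensemble = hard_vote([model_a, model_b, model_c])
print(weighted_f1(y_true, model_a), weighted_f1(y_true, ensemble))
```

In this toy case each base model errs on different samples, so the majority vote recovers every label; real ensembles gain far less, in line with the slight improvement the abstract reports.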