The pearls, challenges, and pitfalls of analyzing data in software engineering empirical studies
We characterized the data analysis process in the empirical software engineering domain to outline the evolution of data analysis in software engineering and to identify which processes and techniques are adopted, their limitations, and which validation approaches are used.
We conducted a large-scale investigation that combined aspects of the systematic literature review process with a large language model (LLM) to extract data from the retrieved papers. In detail, we searched for papers reporting empirical studies, covering all the study types described in the Wohlin guidelines [1] and including all their synonyms in the search keywords. Due to download time constraints, we limited the search to the Software Engineering domain and considered studies published between 1994 and 2023.
We crawled the data from the eight source engines recommended by Kitchenham and Charters [2], collecting more than 15k unique papers.
We extracted the goal, research questions, and data analysis process from each paper, which included all the variable types considered in the study, all the data processing steps, the analysis models and techniques adopted, and the results validation approaches.
In parallel, we created a map showing which analysis approach should be followed in each analysis scenario, based on the statistical guidelines [3]. We then compared this map with what the authors applied in their work, which allowed us to observe the issues (pearls, challenges, and pitfalls) of analyzing data in software engineering empirical studies.
ISERN is the leading community in empirical software engineering. Its findings and recommendations have positively influenced the evolution of empirical studies in areas beyond software engineering. However, what is the ISERN community's perception of the issues of analyzing data in software engineering empirical studies? Is it possible to organize a roadmap of actions, based on triangulating and combining the previous findings with ISERN's perceptions, to support empirical researchers?
This working session intends to raise concerns about the issues related to data analysis in software engineering empirical studies and to produce a manifesto highlighting the main aspects of the problem that demand action from our community.
The session is proposed as a round-table workshop based on case presentations and focus groups. The results of the discussions will be triangulated with the previous findings to support the organization of a roadmap for analyzing data in software engineering empirical studies, which will be reported to the ISERN community at the next annual meeting.
[1] C. Wohlin et al. Experimentation in Software Engineering. Springer, 2012.
[2] B. Kitchenham and S. Charters. Guidelines for Performing Systematic Literature Reviews in Software Engineering. 2007.
Session: Tue 22 Oct, 14:00 - 15:30 (90 min), time zone: Brussels, Copenhagen, Madrid, Paris
Track: ISERN
Chairs: Guilherme Horta Travassos (Federal University of Rio de Janeiro), Valentina Lenarduzzi (University of Oulu)