Assessing diversity in creating seed set for snowballing search for systematic literature review in software engineering
Background: An effective search strategy is crucial for a systematic literature review (SLR). Traditionally, database searches have been used, with snowballing searches as a complement. More recently, snowballing has gained recognition as a primary search strategy for SLRs in software engineering (SE). However, one challenge when applying snowballing is creating a seed set that covers diverse characteristics, such as different authors, publication years, and publishers. Objective: In this paper, we compare and evaluate snowballing performance when the seed set is created using different diversity characteristics. Method: We replicated the snowballing search procedure of two SLRs to compare and evaluate diversity in seed set creation. Both SLRs used a more "traditional" approach to create the seed set for snowballing. In contrast, our replications created the seed set based on diversity characteristics. Results: Our replication achieved an overall precision of 0.019, a relative recall of 0.97, and an F-measure of 0.0372, improving on the original SLRs' values of 0.006, 0.921, and 0.0119 for precision, recall, and F-measure, respectively. Conclusions: Our replication findings suggest that a diverse seed set reduces snowballing effort while mitigating the risk of overlooking pertinent studies.
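The reported figures combine three standard search-evaluation metrics. The sketch below shows how they relate, assuming precision is the share of screened candidate studies that are relevant, and relative recall is the share of a reference set of known relevant studies (for example, those included in the original review) that the search found. The function names are hypothetical and not taken from the study. The F-measure is the harmonic mean of precision and recall, so it can be recomputed directly from the reported values.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of screened candidate studies that turned out to be relevant."""
    if total_retrieved == 0:
        return 0.0
    return relevant_retrieved / total_retrieved


def relative_recall(relevant_retrieved: int, relevant_reference: int) -> float:
    """Share of a reference set of known relevant studies found by the search.

    Assumption: the reference set is the set of relevant studies known for the
    SLR (for example, those included in the original review).
    """
    if relevant_reference == 0:
        return 0.0
    return relevant_retrieved / relevant_reference


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Recompute the F-measure from the precision and recall values reported above.
print(f"replication: {f_measure(0.019, 0.97):.4f}")
print(f"original:    {f_measure(0.006, 0.921):.4f}")
```

Because precision is very low in both cases, the F-measure is dominated by precision; the recomputed values agree with the reported F-measures to within the rounding of the published precision and recall.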