Tue 16 Jul 2024 12:00 - 12:15 at Acerola - Morning session 2

Open Source Software (OSS) hosting platforms like GitHub also contain many non-software projects, which should be excluded from the dataset in most software engineering research studies. However, because such projects lack obvious indicators, researchers must either spend considerable manual effort finding suitable projects or rely on convenience sampling or heuristics to select them. Moreover, the diverse nature of OSS projects makes it harder still to select projects aligned with study objectives, especially when a study needs to identify projects by semantic information such as intended use, which is not easy to discern from the project characteristics available through search APIs like GitHub's.

Our goals are to establish a robust method for identifying software projects among the repositories hosted on social coding platforms and to categorize those projects by who their target users are and how they are meant to be used.

Using data from 35,621 projects in the World of Code dataset, we employed a combination of machine learning techniques, including Doc2Vec and Random Forest, to identify the software projects and to categorize them as standalone applications, libraries, or plug-ins.

Furthermore, our findings highlight the risks of selecting projects solely by filtering on commonly used criteria like the number of contributors, commits, or stars: even after applying such filters, 16.6% of projects were found to be non-software projects.

Our research should aid software engineering researchers in project selection, benefiting both industry and academia. We also envision our work inspiring further research in this domain.

Tue 16 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Morning session 2, PROMISE 2024 at Acerola
11:00 (60m) Talk
The Ever-Evolving Promises of Data in Software Ecosystems: Models, AI, and Analytics (Keynote)
PROMISE 2024
Raula Gaikovina Kula (Nara Institute of Science and Technology)
12:00 (15m) Talk
Smarter Project Selection for Software Engineering Research
PROMISE 2024
Tapajit Dey (Carnegie Mellon University Software Engineering Institute), Jonathan Loungani (Carnegie Mellon University), James Ivers (Carnegie Mellon University)
12:15 (15m) Talk
Evaluating the Quality of Open Source Ansible Playbooks: An Executability Perspective
PROMISE 2024
Pemsith Mendis (Auburn University), Wilson Reaves (Auburn University), Muhammad Ali Babar (School of Computer Science, The University of Adelaide), Yue Zhang (Auburn University), Akond Rahman (Auburn University)