DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories
Software documentation is one of the critical aspects of a software project and can support multiple tasks throughout the software development life cycle. There is extensive research on understanding issues and challenges with existing documentation, which is typically available as readme files. In projects that support collaborative development, such as those on GitHub, other software artifacts such as commits, pull requests, and issues also contain useful information, in addition to the conventional readme files, wikis, and source code comments, that helps in understanding, using, extending, and maintaining the project. However, we are not aware of any dataset that explicitly focuses on documentation-related information in multiple software artifacts, such as readme files, commits, and pull requests, across a repository. To address this gap and to facilitate further research in software documentation, we present DocMine, a dataset of documentation-related information extracted from around 1.35M software artifacts in 950 GitHub repositories, spanning four programming languages. The dataset, along with its documentation, is available in CSV and .sql formats at https://doi.org/10.5281/zenodo.5195084.
Tue 16 May (displayed time zone: Hobart)
11:00 - 11:45 | Documentation + Q&A II | Technical Papers / Data and Tool Showcase Track | Meeting Room 109 | Chair(s): Maram Assi (Queen's University)
11:00 | 12m | Talk | Understanding the Role of Images on Stack Overflow | Technical Papers | Dong Wang (Kyushu University, Japan), Tao Xiao (Nara Institute of Science and Technology), Christoph Treude (University of Melbourne), Raula Gaikovina Kula (Nara Institute of Science and Technology), Hideaki Hata (Shinshu University), Yasutaka Kamei (Kyushu University)
11:12 | 12m | Talk | Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions | Technical Papers | Saikat Mondal (University of Saskatchewan), Masud Rahman (Dalhousie University), Chanchal K. Roy (University of Saskatchewan)
11:24 | 6m | Talk | GiveMeLabeledIssues: An Open Source Issue Recommendation System | Data and Tool Showcase Track | Joseph Vargovich (Northern Arizona University), Fabio Marcos De Abreu Santos (Northern Arizona University, USA), Jacob Penney (Northern Arizona University), Marco Gerosa (Northern Arizona University), Igor Steinmacher (Northern Arizona University)
11:30 | 6m | Talk | DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories | Data and Tool Showcase Track
11:36 | 6m | Talk | PENTACET data - 23 Million Code Comments and 500,000 SATD comments | Data and Tool Showcase Track | Murali Sridharan (University of Oulu), Leevi Rantala (University of Oulu), Mika Mäntylä (University of Oulu)