DACOS: A Manually Annotated Dataset of Code Smells
Researchers apply machine-learning techniques to code smell detection to counter the subjectivity of many code smells. Such approaches need a large, manually annotated dataset for training and benchmarking. The existing literature offers a few datasets; however, they are small and, more importantly, do not focus on subjective code snippets. In this paper, we present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three code smells at different granularities: multifaceted abstraction, complex method, and long parameter list. The dataset is created in two phases. The first phase identifies the code snippets that are potentially subjective by determining the thresholds of the metrics used to detect each smell. The second phase collects annotations for these potentially subjective snippets. We also offer an extended dataset, DACOSX, which adds definitely benign and definitely smelly snippets selected using the thresholds identified in the first phase. We have developed Tagman, a web application that lets annotators view and mark snippets one by one and records the provided annotations. We make the datasets and the web application publicly accessible. This dataset will help researchers working on smell detection techniques build relevant and context-aware machine-learning models.
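The threshold-based first phase can be pictured as a three-way split on a metric value. The sketch below is a minimal illustration, assuming each smell is judged by a single metric with a lower and an upper threshold. The metric choices (cyclomatic complexity, parameter count, and a cohesion metric such as LCOM), the numeric thresholds, and the names `ThresholdBand` and `split_snippets` are all hypothetical and are not taken from the DACOS paper. Snippets that fall strictly between the thresholds play the role of the potentially subjective snippets sent for annotation, while the two extremes play the role of the definitely benign and definitely smelly snippets that DACOSX adds.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    BENIGN = "definitely benign"
    SUBJECTIVE = "potentially subjective"  # candidates for manual annotation
    SMELLY = "definitely smelly"


@dataclass(frozen=True)
class ThresholdBand:
    """Hypothetical lower/upper bounds on one detection metric."""

    lower: float
    upper: float

    def classify(self, value: float) -> Label:
        # At or below the lower bound: clearly benign.
        if value <= self.lower:
            return Label.BENIGN
        # At or above the upper bound: clearly smelly.
        if value >= self.upper:
            return Label.SMELLY
        # Strictly between the bounds: potentially subjective.
        return Label.SUBJECTIVE


# Hypothetical metrics and thresholds, for illustration only; they are NOT
# the values determined in DACOS's first phase.
BANDS: dict[str, ThresholdBand] = {
    "complex_method": ThresholdBand(lower=5, upper=10),  # cyclomatic complexity
    "long_parameter_list": ThresholdBand(lower=4, upper=7),  # parameter count
    "multifaceted_abstraction": ThresholdBand(lower=0.4, upper=0.8),  # e.g., LCOM
}


def split_snippets(smell: str, metric_values: dict[str, float]) -> dict[Label, list[str]]:
    """Partition snippet ids into benign, subjective, and smelly groups for one smell."""
    band = BANDS[smell]
    groups: dict[Label, list[str]] = {label: [] for label in Label}
    for snippet_id, value in metric_values.items():
        groups[band.classify(value)].append(snippet_id)
    return groups


if __name__ == "__main__":
    # Hypothetical parameter counts for three methods.
    groups = split_snippets("long_parameter_list", {"m1": 2, "m2": 5, "m3": 9})
    print(groups[Label.SUBJECTIVE])  # ['m2']
```

Using a band rather than a single cut-off is what leaves room for a middle zone of potentially subjective snippets; with a single threshold, every snippet would be forced into either the benign or the smelly group.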
Tue 16 May (displayed time zone: Hobart)
11:00 - 11:45 | Code Smells | Technical Papers / Industry Track / Data and Tool Showcase Track | Meeting Room 110 | Chair(s): Md Tajmilur Rahman (Gannon University)
11:00 | 12m talk | Don't Forget the Exception! Considering Robustness Changes to Identify Design Problems | Technical Papers | Anderson Oliveira (PUC-Rio), João Lucas Correia (Federal University of Alagoas), Leonardo Da Silva Sousa (Carnegie Mellon University, USA), Wesley Assunção (Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil), Daniel Coutinho (PUC-Rio), Alessandro Garcia (PUC-Rio), Willian Oizumi (GoTo), Caio Barbosa (UFAL), Anderson Uchôa (Federal University of Ceará), Juliana Alves Pereira (PUC-Rio)
11:12 | 12m talk | Pre-trained Model Based Feature Envy Detection | Technical Papers | mawenhao (Wuhan University), Yaoxiang Yu (Wuhan University), Xiaoming Ruan (Wuhan University), Bo Cai (Wuhan University)
11:24 | 6m talk | CLEAN++: Code Smells Extraction for C++ | Data and Tool Showcase Track | Tom Mashiach (Ben Gurion University of the Negev, Israel), Bruno Sotto-Mayor (Ben Gurion University of the Negev, Israel), Gal Kaminka (Bar Ilan University, Israel), Meir Kalech (Ben Gurion University of the Negev, Israel)
11:30 | 6m talk | DACOS: A Manually Annotated Dataset of Code Smells | Data and Tool Showcase Track | Himesh Nandani (Dalhousie University), Mootez Saad (Dalhousie University), Tushar Sharma (Dalhousie University)
11:36 | 6m talk | What Warnings Do Engineers Really Fix? The Compiler That Cried Wolf | Industry Track | Gunnar Kudrjavets (University of Groningen), Aditya Kumar (Snap, Inc.), Ayushi Rastogi (University of Groningen, The Netherlands)