A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets
The adoption of chatbots in software development tasks has become increasingly popular among practitioners, driven by the advantages of cost reduction and acceleration of the software development process. Chatbots understand users’ queries through the Natural Language Understanding (NLU) component. To yield reasonable performance, NLUs have to be trained with extensive, high-quality datasets that express the multitude of ways users may interact with chatbots. However, previous studies show that creating a high-quality training dataset for software engineering chatbots is expensive in terms of both resources and time.
Therefore, in this paper, we present an automated transformer-based approach to augment software engineering chatbot datasets. Our approach combines traditional natural language processing techniques with the BART transformer to augment a dataset by generating queries through synonym replacement and paraphrasing. We evaluate the impact of using the augmentation approach on the NLU’s performance using three software engineering datasets. Overall, the augmentation approach shows promising results in improving the NLU’s performance, augmenting queries with varying sentence structures while preserving their original semantics. Furthermore, it increases the NLU’s confidence in its intent classification for the correctly classified intents. We believe that our study helps practitioners to improve the performance of their chatbots and guides future research to propose augmentation techniques for SE chatbots.
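To make the synonym-replacement idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written synonym table (`SYNONYMS`, hypothetical) and hypothetical helper names (`synonym_variants`, `augment_dataset`). Each generated query keeps the intent label of the query it was derived from. The BART paraphrasing step described in the abstract is not shown.

```python
import random

# Hypothetical toy synonym table, for illustration only. A real pipeline would
# draw synonyms from a lexical resource or a language model.
SYNONYMS = {
    "show": ["display", "list"],
    "bugs": ["defects", "issues"],
    "fix": ["repair", "resolve"],
}


def synonym_variants(query, synonyms=SYNONYMS, max_variants=3, seed=0):
    """Return up to max_variants rewrites of query, each with one word swapped."""
    rng = random.Random(seed)
    tokens = query.split()
    # Every (position, replacement) pair for tokens found in the table.
    # Tokens with attached punctuation (e.g. "bugs?") are not matched.
    swaps = [
        (i, alt)
        for i, tok in enumerate(tokens)
        for alt in synonyms.get(tok.lower(), [])
    ]
    rng.shuffle(swaps)
    variants = []
    for i, alt in swaps[:max_variants]:
        rewritten = tokens.copy()
        rewritten[i] = alt
        variants.append(" ".join(rewritten))
    return variants


def augment_dataset(examples, **kwargs):
    """Extend (query, intent) pairs with synonym variants that keep the intent."""
    augmented = []
    for query, intent in examples:
        augmented.append((query, intent))
        augmented.extend((v, intent) for v in synonym_variants(query, **kwargs))
    return augmented
```

Calling `augment_dataset([("show open bugs in this repository", "list_issues")])` returns the original pair plus up to three rewritten queries, all labeled `list_issues` (an illustrative intent name).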
Thu 24 Oct (displayed time zone: Brussels, Copenhagen, Madrid, Paris)
16:00 - 17:30 | Machine learning for software engineering | ESEM Technical Papers / ESEM Emerging Results, Vision and Reflection Papers Track / ESEM Journal-First Papers | at Telensenyament (B3 Building - 1st Floor) | Chair(s): Luigi Quaranta (University of Bari, Italy)
16:00 | 20m | Full-paper | A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets | ESEM Technical Papers | Ahmad Abdellatif (University of Calgary), Khaled Badran (Concordia University, Canada), Diego Costa (Concordia University, Canada), Emad Shihab (Concordia University)
16:20 | 20m | Full-paper | Unsupervised and Supervised Co-learning for Comment-based Codebase Refining and its Application in Code Search | ESEM Technical Papers | Gang Hu (School of Information Science & Engineering, Yunnan University), Xiaoqin Zeng (School of Information Science & Engineering, Yunnan University), Wanlong Yu, Min Peng, YUAN Mengting (School of Computer Science, Wuhan University, Wuhan, China), Liang Duan
16:40 | 20m | Full-paper | Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking | ESEM Technical Papers | Duc Anh Le (Hanoi University of Science and Technology), Anh M. T. Bui (Hanoi University of Science and Technology), Phuong T. Nguyen (University of L'Aquila), Davide Di Ruscio (University of L'Aquila)
17:00 | 15m | Vision and Emerging Results | PromptLink: Multi-template prompt learning with adversarial training for issue-commit link recovery | ESEM Emerging Results, Vision and Reflection Papers Track | Yang Deng (The School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China), Bangchao Wang (Wuhan Textile University), Zhiyuan Zou (The School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China), Luyao Ye (The School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China)
17:15 | 15m | Journal Early-Feedback | GPTSniffer: A CodeBERT-based classifier to detect source code written by ChatGPT | ESEM Journal-First Papers | Phuong T. Nguyen (University of L'Aquila), Juri Di Rocco (University of L'Aquila), Claudio Di Sipio (University of L'Aquila), Riccardo Rubei (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Massimiliano Di Penta (University of Sannio, Italy)