Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora (Virtual)
Despite the recent trend of developing and applying neural source code models to software engineering tasks, the quality of such models is insufficient for real-world use. One possible reason is noise in the source code corpora used to train them. In this paper, we adapt data-influence methods to detect such noise. Data-influence methods are used in machine learning to evaluate how similar a target sample is to the correct samples, in order to determine whether the target sample is noisy. Our evaluation results show that data-influence methods can identify noisy samples for neural code models in classification-based tasks. We anticipate that this approach will contribute to the larger vision of developing better neural source code models from a data-centric perspective, which is a key driver for developing source code models that are useful in practice.
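The idea of scoring a target sample by its similarity to correct samples can be illustrated with a minimal sketch. This is not the paper's method, which the abstract does not describe in detail. It is a simplified stand-in that assumes each code sample has already been mapped to a numeric feature vector and that a small set of trusted, correctly labeled samples is available. The function names, the cosine-similarity score, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def flag_noisy(candidates, trusted, threshold=0.5):
    """Return indices of candidates that look dissimilar to trusted samples of the same label.

    candidates, trusted: lists of (feature_vector, label) pairs.
    threshold: hypothetical cut-off; a real setting would need tuning.
    """
    # Group trusted vectors by label and summarize each group by its centroid.
    by_label = {}
    for vector, label in trusted:
        by_label.setdefault(label, []).append(vector)
    centroids = {label: mean_vector(vs) for label, vs in by_label.items()}

    flagged = []
    for index, (vector, label) in enumerate(candidates):
        centroid = centroids.get(label)
        # A label with no trusted samples cannot be checked, so flag it for review.
        if centroid is None or cosine(vector, centroid) < threshold:
            flagged.append(index)
    return flagged


# Toy data: the second candidate carries label "a" but resembles class "b".
trusted = [([1.0, 0.0], "a"), ([0.9, 0.1], "a"), ([0.0, 1.0], "b"), ([0.1, 0.9], "b")]
candidates = [([1.0, 0.1], "a"), ([0.0, 1.0], "a"), ([0.2, 1.0], "b")]
print(flag_noisy(candidates, trusted))
```

In this toy setup the flagged indices point at samples whose label disagrees with their nearest trusted class, which is the kind of label noise the abstract refers to for classification-based tasks.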
Wed 12 Oct | Displayed time zone: Eastern Time (US & Canada)
16:00 - 18:00 | Technical Session 17 - SE for AI | Research Papers / Late Breaking Results / NIER Track / Tool Demonstrations | at Banquet B | Chair(s): Tim Menzies North Carolina State University
16:00 | 10m | Vision and Emerging Results | On the Naturalness of Bytecode Instructions | NIER Track
16:10 | 20m | Research paper | A Light Bug Triage Framework for Applying Large Pre-trained Language Model | Research Papers | Jaehyung Lee Pohang University of Science and Technology, Pohang, Hwanjo Yu Pohang University of Science and Technology, Pohang, HanKisun Samsung Research
16:30 | 10m | Vision and Emerging Results | Global Decision Making Over Deep Variability in Feedback-Driven Software Development | NIER Track | Jörg Kienzle McGill University, Canada, Benoit Combemale University of Rennes; Inria; IRISA, Gunter Mussbacher McGill University, Omar Alam Trent University, Francis Bordeleau École de Technologie Supérieure (ETS), Lola Burgueño University of Malaga, Gregor Engels Paderborn University, Jessie Galasso-Carbonnel Université de Montréal, Jean-Marc Jézéquel Univ Rennes - IRISA, Bettina Kemme McGill University, Canada, Sébastien Mosser McMaster University, Houari Sahraoui Université de Montréal, Maximilian Schiedermeier McGill University, Eugene Syriani Université de Montréal
16:40 | 20m | Research paper | CARGO: AI-Guided Dependency Analysis for Migrating Monolithic Applications to Microservices Architecture (ACM SIGSOFT Distinguished Paper Award) | Research Papers | Vikram Nitin Columbia University, Shubhi Asthana IBM Research, Baishakhi Ray Columbia University, Rahul Krishna IBM Research
17:00 | 10m | Demonstration | Answering Software Deployment Questions via Neural Machine Reading at Scale (Virtual) | Tool Demonstrations | Guan Jie Qiu School of Software, Shanghai Jiao Tong University, Diwei Chen School of Software, Shanghai Jiao Tong University, Shuai Zhang School of Software, Shanghai Jiao Tong University, Yitian Chai School of Software, Shanghai Jiao Tong University, Xiaodong Gu Shanghai Jiao Tong University, China, Beijun Shen School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
17:10 | 20m | Research paper | PRCBERT: Prompt Learning for Requirement Classification using BERT-based Pretrained Language Models (Virtual) | Research Papers | Xianchang Luo University of Science and Technology of China, Yinxing Xue University of Science and Technology of China, Zhenchang Xing Australian National University, Jiamou Sun Australian National University
17:30 | 10m | Vision and Emerging Results | Test-Driven Multi-Task Learning with Functionally Equivalent Code Transformation for Neural Code Generation (Virtual) | NIER Track | Xin Wang Wuhan University, Xiao Liu School of Information Technology, Deakin University, Pingyi Zhou Noah’s Ark Lab, Huawei Technologies, Qixia Liu China Mobile Communications Corporation, Jin Liu Wuhan University, Hao Wu Yunnan University, Xiaohui Cui Wuhan University
17:40 | 10m | Paper | Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora (Virtual) | Late Breaking Results | Anh T. V. Dau FPT Software AI Center, Nghi D. Q. Bui Singapore Management University, Thang Nguyen-Duc FPT Software AI Center, Hoang Thanh-Tung Vietnam National University