ICSE 2022
Sun 8 - Fri 27 May 2022
Tue 10 May 2022 05:20 - 05:25 at ICSE room 1-odd hours - Machine Learning with and for SE 1 Chair(s): Gemma Catolino
Thu 12 May 2022 20:15 - 20:20 at ICSE room 6 - Humans and Machines Chair(s): Sandeep Kuttal

The performance of neural code search is significantly influenced by the quality of the training data from which the neural models are derived. A large corpus of high-quality query and code pairs is needed to establish a precise mapping from natural language to programming language. Because such data is scarce, most widely used code search datasets are built with compromises, such as using code comments in place of queries. Our empirical study on a well-known code search dataset reveals that over one-third of its queries contain noise that makes them deviate from natural user queries. Models trained on noisy data suffer severe performance degradation when applied in real-world scenarios. Improving dataset quality, so that the queries in its samples are semantically identical to real user queries, is critical for the practical usability of neural code search. In this paper, we propose a data cleaning framework consisting of two successive filters: a rule-based syntactic filter and a model-based semantic filter. This is the first framework that applies semantic query cleaning to code search datasets. We evaluated the effectiveness of our framework on two widely used code search models and three manually annotated code retrieval benchmarks. Training the popular DeepCS model with the dataset filtered by our framework improves its performance by 19.2% in MRR and 21.3% in Answer@1, on average across the three validation benchmarks.
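
As a rough illustration of the ideas in the abstract above, the following Python sketch chains a rule-based syntactic filter with a model-based semantic filter and computes the two reported metrics. It is not the authors' implementation. The two regex rules, the `semantic_score` callable, and the `threshold` parameter are hypothetical stand-ins, since the abstract does not list the actual rules or model. Answer@k is assumed here to be the fraction of queries with a correct result in the top k.

```python
import re
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (query, code)

# Illustrative rules only (assumption): the abstract does not list the real rule set.
_URL = re.compile(r"https?://\S+")
_HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def passes_syntactic_rules(query: str) -> bool:
    """Reject empty queries and queries containing URLs or HTML tags."""
    q = query.strip()
    if not q:
        return False
    return not (_URL.search(q) or _HTML_TAG.search(q))


def clean(
    pairs: Iterable[Pair],
    semantic_score: Callable[[str], float],  # hypothetical model: higher = more query-like
    threshold: float,                         # hypothetical cut-off
) -> List[Pair]:
    """Apply the syntactic filter first, then the semantic filter."""
    kept = []
    for query, code in pairs:
        if not passes_syntactic_rules(query):
            continue
        if semantic_score(query) < threshold:
            continue
        kept.append((query, code))
    return kept


def mrr(ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank; each rank is 1-based, None if no correct result was found."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def answer_at_k(ranks: Sequence[Optional[int]], k: int = 1) -> float:
    """Fraction of queries whose first correct result is within the top k (assumed definition)."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)
```

Running the cheap syntactic rules before the model keeps the more expensive semantic scoring off queries that would be discarded anyway.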

Tue 10 May

Displayed time zone: Eastern Time (US & Canada)

05:00 - 06:00
Machine Learning with and for SE 1
NIER - New Ideas and Emerging Results / Technical Track / Journal-First Papers at ICSE room 1-odd hours
Chair(s): Gemma Catolino Tilburg University & Jheronimus Academy of Data Science
05:00
5m
Talk
SQAPlanner: Generating Data-Informed Software Quality Improvement Plans -- A Journal-First Presentation
Journal-First Papers
Dilini Rajapaksha Monash University, Kla Tantithamthavorn Monash University, Jirayus Jiarpakdee Monash University, Australia, Christoph Bergmeir Monash University, John Grundy Monash University, Wray Buntine Monash University
05:05
5m
Talk
Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks
Journal-First Papers
Nikita Mehrotra Indraprastha Institute of Information Technology, Navdha Agarwal Indraprastha Institute of Information Technology, Delhi, Piyush Gupta Indraprastha Institute of Information Technology, Delhi, Saket Anand Indraprastha Institute of Information Technology, Delhi, David Lo Singapore Management University, Rahul Purandare IIIT-Delhi
05:10
5m
Talk
Improving the Learnability of Machine Learning APIs by Semi-Automated API Wrapping
NIER - New Ideas and Emerging Results
Lars Reimann University of Bonn, Günter Kniesel-Wünsche University of Bonn
05:15
5m
Talk
Learning to Recommend Method Names with Global Context
Technical Track
Fang Liu Peking University, Ge Li Peking University, Zhiyi Fu Peking University, Shuai Lu Peking University, Yiyang Hao Silicon Heart Tech Co., Zhi Jin Peking University
05:20
5m
Talk
On the Importance of Building High-quality Training Datasets for Neural Code Search
Nominated for Distinguished Paper
Technical Track
Zhensu Sun The Hong Kong Polytechnic University, Li Li Monash University, Yan Liu Tongji University, Xiaoning Du Monash University, Australia, Li Li Monash University
05:25
5m
Talk
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences
Technical Track
Maliheh Izadi Delft University of Technology, Roberta Gismondi Delft University of Technology, Georgios Gousios Endor Labs & Delft University of Technology

Thu 12 May

Displayed time zone: Eastern Time (US & Canada)

20:00 - 21:00
Humans and Machines
Technical Track / Journal-First Papers at ICSE room 6
Chair(s): Sandeep Kuttal The University of Tulsa
20:00
5m
Talk
SQAPlanner: Generating Data-Informed Software Quality Improvement Plans -- A Journal-First Presentation
Journal-First Papers
Dilini Rajapaksha Monash University, Kla Tantithamthavorn Monash University, Jirayus Jiarpakdee Monash University, Australia, Christoph Bergmeir Monash University, John Grundy Monash University, Wray Buntine Monash University
20:05
5m
Talk
Interacto: A Modern User Interaction Processing Model
Journal-First Papers
Arnaud Blouin Univ Rennes, Jean-Marc Jézéquel Univ Rennes - IRISA
20:10
5m
Talk
A Comparison of Natural Language Understanding Platforms for Chatbots in Software Engineering
Journal-First Papers
Ahmad Abdellatif Concordia University, Khaled Badran Concordia University, Diego Costa Concordia University, Canada, Emad Shihab Concordia University
20:15
5m
Talk
On the Importance of Building High-quality Training Datasets for Neural Code Search
Nominated for Distinguished Paper
Technical Track
Zhensu Sun The Hong Kong Polytechnic University, Li Li Monash University, Yan Liu Tongji University, Xiaoning Du Monash University, Australia, Li Li Monash University
20:20
5m
Talk
Hashing It Out: A Survey of Programmers’ Cannabis Usage, Perception, and Motivation
Technical Track
Madeline Endres University of Michigan, Kevin Boehnke University of Michigan, Westley Weimer University of Michigan

Information for Participants
Tue 10 May 2022 05:00 - 06:00 at ICSE room 1-odd hours - Machine Learning with and for SE 1 Chair(s): Gemma Catolino

Thu 12 May 2022 20:00 - 21:00 at ICSE room 6 - Humans and Machines Chair(s): Sandeep Kuttal