MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022

Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code snippets that usefully extend the code currently being written by a developer in their integrated development environment (IDE). Code-to-code recommendation engines hold the promise of increasing developer productivity by reducing context switching away from the IDE and increasing code reuse. Existing code-to-code recommendation engines do not scale gracefully to large codebases: their query time grows linearly as the code repository increases in size. In addition, existing engines fail to account for the global statistics of code repositories in the ranking function, such as the distribution of code snippet lengths, leading to sub-optimal retrieval results. We address both of these weaknesses with Senatus, a new code-to-code recommendation engine. At the core of Senatus is De-Skew LSH, a new locality-sensitive hashing (LSH) algorithm that indexes the data for fast (sub-linear time) retrieval while also counteracting the skewness in the snippet length distribution using novel abstract syntax tree-based feature scoring and selection algorithms. We evaluate Senatus and find its recommendations to be of higher quality than those of competing baselines, while achieving faster search. For example, on the CodeSearchNet dataset, Senatus improves F1 by 31.21% and achieves 147.9x faster query time compared to Facebook Aroma. Senatus also outperforms standard MinHash LSH, with a 29.2% improvement in F1 and 51.02x faster query time.
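To make the indexing idea concrete, here is a minimal sketch of standard MinHash LSH, the baseline the abstract compares against. It is not De-Skew LSH and not the Senatus implementation. The names (`MinHashLSHIndex`, `minhash_signature`), the parameter values (`NUM_HASHES`, `BANDS`) and the use of plain string tokens as snippet features are all assumptions made for illustration; the abstract describes features derived from abstract syntax trees.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # signature length (assumed value, not from the paper)
BANDS = 16       # number of LSH bands; each band holds NUM_HASHES // BANDS rows


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features):
    """MinHash signature of a non-empty set of string features."""
    return tuple(min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES))


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


class MinHashLSHIndex:
    """Hypothetical index: snippets sharing any band of their signature share a bucket."""

    def __init__(self):
        self.buckets = defaultdict(set)  # (band number, band values) -> snippet ids
        self.features = {}               # snippet id -> feature set

    def _band_keys(self, signature):
        rows = NUM_HASHES // BANDS
        for band in range(BANDS):
            yield band, signature[band * rows:(band + 1) * rows]

    def add(self, snippet_id, features):
        self.features[snippet_id] = features
        for key in self._band_keys(minhash_signature(features)):
            self.buckets[key].add(snippet_id)

    def query(self, features, top_k=5):
        # Only snippets colliding in at least one band are scored,
        # so the query does not scan the whole repository.
        candidates = set()
        for key in self._band_keys(minhash_signature(features)):
            candidates |= self.buckets.get(key, set())
        ranked = sorted(
            candidates,
            key=lambda sid: jaccard(features, self.features[sid]),
            reverse=True,
        )
        return ranked[:top_k]


index = MinHashLSHIndex()
index.add("read_file", {"open", "read", "close", "with"})
index.add("sum_list", {"for", "range", "sum", "return"})
results = index.query({"open", "read", "close", "with"})
```

In this sketch, candidates are re-ranked by plain Jaccard similarity, which is sensitive to set size; the skew in snippet lengths that the abstract says De-Skew LSH counteracts is not handled here.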

Thu 19 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 11:50
Session 11: Machine Learning & Information Retrieval (Technical Papers) at MSR Main room - odd hours
Chair(s): Phuong T. Nguyen University of L’Aquila
11:00
4m
Short-paper
On the Naturalness of Fuzzer Generated Code
Technical Papers
Rajeswari Hita Kambhamettu Carnegie Mellon University, John Billos Wake Forest University, Carolyn "Tomi" Oluwaseun-Apo Pennsylvania State University, Benjamin Gafford Carnegie Mellon University, Rohan Padhye Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University
11:04
7m
Talk
Does Configuration Encoding Matter in Learning Software Performance? An Empirical Study on Encoding Schemes
Technical Papers
Jingzhi Gong Loughborough University, Tao Chen Loughborough University
11:11
7m
Talk
Multimodal Recommendation of Messenger Channels
Technical Papers
Ekaterina Koshchenko JetBrains Research, Egor Klimov JetBrains Research, Vladimir Kovalenko JetBrains Research
11:18
7m
Talk
Senatus: A Fast and Accurate Code-to-Code Recommendation Engine
Technical Papers
Fran Silavong JP Morgan Chase & Co., Sean Moran JP Morgan Chase & Co., Antonios Georgiadis JP Morgan Chase & Co., Rohan Saphal JP Morgan Chase & Co., Robert Otter JP Morgan Chase & Co.
11:25
7m
Talk
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study
Technical Papers
Tatiana Castro Vélez City University of New York (CUNY) Graduate Center, Raffi Khatchadourian City University of New York (CUNY) Hunter College, Mehdi Bagherzadeh Oakland University, Anita Raja City University of New York (CUNY) Hunter College
11:32
7m
Talk
GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses
Technical Papers
Wei Ma SnT, University of Luxembourg, Mengjie Zhao LMU Munich, Ezekiel Soremekun SnT, University of Luxembourg, Qiang Hu University of Luxembourg, Jie M. Zhang King's College London, Mike Papadakis University of Luxembourg, Luxembourg, Maxime Cordy University of Luxembourg, Luxembourg, Xiaofei Xie Singapore Management University, Singapore, Yves Le Traon University of Luxembourg, Luxembourg
11:39
11m
Live Q&A
Discussions and Q&A
Technical Papers

Mon 23 May

Displayed time zone: Eastern Time (US & Canada)

13:30 - 15:00
Blended Technical Session 2 (Machine Learning and Information Retrieval) Technical Papers / Data and Tool Showcase Track at Room 315+316
Chair(s): Preetha Chatterjee Drexel University, USA
13:30
15m
Talk
Methods for Stabilizing Models across Large Samples of Projects (with case studies on Predicting Defect and Project Health)
Technical Papers
Suvodeep Majumder North Carolina State University, Tianpei Xia North Carolina State University, Rahul Krishna North Carolina State University, Tim Menzies North Carolina State University
13:45
15m
Talk
GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses
Technical Papers
Wei Ma SnT, University of Luxembourg, Mengjie Zhao LMU Munich, Ezekiel Soremekun SnT, University of Luxembourg, Qiang Hu University of Luxembourg, Jie M. Zhang King's College London, Mike Papadakis University of Luxembourg, Luxembourg, Maxime Cordy University of Luxembourg, Luxembourg, Xiaofei Xie Singapore Management University, Singapore, Yves Le Traon University of Luxembourg, Luxembourg
14:00
15m
Talk
Senatus: A Fast and Accurate Code-to-Code Recommendation Engine
Technical Papers
Fran Silavong JP Morgan Chase & Co., Sean Moran JP Morgan Chase & Co., Antonios Georgiadis JP Morgan Chase & Co., Rohan Saphal JP Morgan Chase & Co., Robert Otter JP Morgan Chase & Co.
14:15
8m
Short-paper
Comments on Comments: Where Code Review and Documentation Meet
Technical Papers
Nikitha Rao Carnegie Mellon University, Jason Tsay IBM Research, Martin Hirzel IBM Research, Vincent J. Hellendoorn Carnegie Mellon University
14:23
8m
Short-paper
On the Naturalness of Fuzzer Generated Code
Technical Papers
Rajeswari Hita Kambhamettu Carnegie Mellon University, John Billos Wake Forest University, Carolyn "Tomi" Oluwaseun-Apo Pennsylvania State University, Benjamin Gafford Carnegie Mellon University, Rohan Padhye Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University
14:31
8m
Talk
SOSum: A Dataset of Stack Overflow Post Summaries
Data and Tool Showcase Track
Bonan Kou Purdue University, Yifeng Di Purdue University, Muhao Chen University of Southern California, Tianyi Zhang Purdue University
14:39
21m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Thu 19 May 2022 11:00 - 11:50 at MSR Main room - odd hours - Session 11: Machine Learning & Information Retrieval Chair(s): Phuong T. Nguyen