MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022

Unit testing is an essential part of the software development process: it helps identify issues with source code in the early stages of development and prevents regressions. Machine learning has emerged as a viable approach to help software developers generate automated unit tests. However, using machine learning to generate reliable unit test cases that are semantically correct and capable of catching software bugs or unintended behavior requires large, metadata-rich datasets. In this paper we present Methods2Test: a large, supervised dataset of test cases mapped to corresponding methods under test (i.e., focal methods). This dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a reliable mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers’ best practices in software testing, which identify the likely focal method for a given test case. To facilitate further analysis, we store a rich set of metadata for each method-test pair in JSON-formatted files. Additionally, we extract a textual corpus from the dataset at different context levels, which we provide in both raw and tokenized forms, to enable researchers to train and evaluate machine learning models for Automated Test Generation. Methods2Test is publicly available at: https://github.com/microsoft/methods2test
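
The abstract does not spell out the heuristics themselves. As a rough illustration of what a naming-convention heuristic of this kind could look like, the sketch below guesses a focal class and focal method from test names. The function names `guess_focal_class` and `guess_focal_method` and the specific matching rules are assumptions made for this example, not the dataset's actual implementation.

```python
from typing import List, Optional


def guess_focal_class(test_class: str) -> Optional[str]:
    """Guess the class under test from a test class name.

    Assumed convention (hypothetical, for illustration only):
    'FooTest', 'FooTests', or 'TestFoo' all point at 'Foo'.
    """
    for suffix in ("Tests", "Test"):
        if test_class.endswith(suffix) and len(test_class) > len(suffix):
            return test_class[: -len(suffix)]
    if test_class.startswith("Test") and len(test_class) > len("Test"):
        return test_class[len("Test"):]
    return None


def guess_focal_method(test_method: str, candidates: List[str]) -> Optional[str]:
    """Guess the method under test from a test method name.

    Assumed convention (hypothetical): drop a leading 'test' and compare
    the remainder case-insensitively with the focal class's method names.
    Returns None unless exactly one candidate matches.
    """
    name = test_method
    if name.lower().startswith("test"):
        name = name[len("test"):]
    name = name.lstrip("_")
    if not name:
        return None
    matches = [m for m in candidates if m.lower() == name.lower()]
    return matches[0] if len(matches) == 1 else None
```

For example, under these assumed rules a test class named `CalculatorTest` maps to `Calculator`, and a test method `testAdd` maps to `add` when `add` is the only matching method in that class. Returning `None` on ambiguous or missing matches reflects a preference for skipping a pair over recording a wrong mapping.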

Wed 18 May

Displayed time zone: Eastern Time (US & Canada)

20:00 - 20:50
Session 6: Maintenance & Testing
Data and Tool Showcase Track / Technical Papers at MSR Main room - even hours
Chair(s): Ajay Jha University of Alberta, Amjed Tahir Massey University
20:00
4m
Short-paper
Characterizing High-Quality Test Methods: A First Empirical Study
Technical Papers
20:04
7m
Talk
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning
Technical Papers
Mohammad Reza Taesiri University of Alberta, Finlay Macklon University of Alberta, Cor-Paul Bezemer University of Alberta
20:11
7m
Talk
An Empirical Study on Maintainable Method Size in Java
Technical Papers
Shaiful Chowdhury University of Alberta, Gias Uddin University of Calgary, Canada, Reid Holmes University of British Columbia
20:18
7m
Talk
Complex Python Features in the Wild
Technical Papers
Yi Yang Rensselaer Polytechnic Institute, Ana Milanova Rensselaer Polytechnic Institute, Martin Hirzel IBM Research
20:25
4m
Talk
Methods2Test: A dataset of focal methods mapped to test cases
Data and Tool Showcase Track
Michele Tufano Microsoft, Shao Kun Deng Microsoft Corporation, Neel Sundaresan Microsoft Corporation, Alexey Svyatkovskiy
20:29
4m
Talk
npm-filter: Automating the mining of dynamic information from npm packages
Data and Tool Showcase Track
Ellen Arteca Northeastern University, Alexi Turcotte Northeastern University
20:33
4m
Talk
ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference
Data and Tool Showcase Track
Kevin Jesse University of California, Davis, Prem Devanbu Department of Computer Science, University of California, Davis
20:37
13m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Wed 18 May 2022 20:00 - 20:50 at MSR Main room - even hours - Session 6: Maintenance & Testing Chair(s): Ajay Jha, Amjed Tahir