MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022

In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset contains over 9 million type annotations across 13,953 projects and 539,571 files. It is approximately 10x larger than analogous type inference datasets for Python and is the largest available for TypeScript. We also provide API access to the dataset, which can be integrated with any tokenizer and used with any state-of-the-art sequence-based model. Finally, we report analysis and performance results for state-of-the-art code-specific models to serve as baselines. ManyTypes4TypeScript is available on Hugging Face, Zenodo, and CodeXGLUE.

Wed 18 May

Displayed time zone: Eastern Time (US & Canada)

20:00 - 20:50
Session 6: Maintenance & Testing (Data and Tool Showcase Track / Technical Papers) at MSR Main room - even hours
Chair(s): Ajay Jha University of Alberta, Amjed Tahir Massey University
20:00
4m
Short-paper
Characterizing High-Quality Test Methods: A First Empirical Study
Technical Papers
Pre-print
20:04
7m
Talk
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning
Technical Papers
Mohammad Reza Taesiri University of Alberta, Finlay Macklon University of Alberta, Cor-Paul Bezemer University of Alberta
20:11
7m
Talk
An Empirical Study on Maintainable Method Size in Java
Technical Papers
Shaiful Chowdhury University of Alberta, Gias Uddin University of Calgary, Canada, Reid Holmes University of British Columbia
20:18
7m
Talk
Complex Python Features in the Wild
Technical Papers
Yi Yang Rensselaer Polytechnic Institute, Ana Milanova Rensselaer Polytechnic Institute, Martin Hirzel IBM Research
20:25
4m
Talk
Methods2Test: A dataset of focal methods mapped to test cases
Data and Tool Showcase Track
Michele Tufano Microsoft, Shao Kun Deng Microsoft Corporation, Neel Sundaresan Microsoft Corporation, Alexey Svyatkovskiy
20:29
4m
Talk
npm-filter: Automating the mining of dynamic information from npm packages
Data and Tool Showcase Track
Ellen Arteca Northeastern University, Alexi Turcotte Northeastern University
Pre-print Media Attached
20:33
4m
Talk
ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference
Data and Tool Showcase Track
Kevin Jesse University of California, Davis, Prem Devanbu Department of Computer Science, University of California, Davis
DOI Pre-print
20:37
13m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Wed 18 May 2022 20:00 - 20:50 at MSR Main room - even hours - Session 6: Maintenance & Testing. Chair(s): Ajay Jha, Amjed Tahir
Info for room MSR Main room - even hours: the room is hosted on Midspace.