ICSE 2022 / MSR 2022 / Data and Tool Showcase Track
ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference
Wed 18 May 2022 20:33 - 20:37 at MSR Main room - even hours - Session 6: Maintenance & Testing. Chair(s): Ajay Jha, Amjed Tahir
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 million type annotations across 13,953 projects and 539,571 files. It is approximately 10x larger than analogous type inference datasets for Python and is the largest available for TypeScript. We also provide API access to the dataset, which can be integrated into any tokenizer and used with any state-of-the-art sequence-based model. Finally, we provide analysis and performance results for state-of-the-art code-specific models to serve as baselines. ManyTypes4TypeScript is available on Hugging Face, Zenodo, and CodeXGLUE.
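The abstract notes that the dataset can be integrated into any tokenizer. A minimal sketch of what that can involve, assuming each example is a list of code tokens paired with per-token type labels (`None` where no annotation exists), is carrying those labels over to a subword tokenizer for token-classification training. The record, `toy_subword_tokenize`, `align_labels`, and the label map below are hypothetical illustrations, not the dataset's published API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record: a TypeScript snippet split into code tokens, with a type
# label on annotated declarations and None elsewhere (field layout is assumed).
tokens = ["function", "add", "(", "a", ",", "b", ")", "{", "return", "a", "+", "b", "}"]
labels: List[Optional[str]] = [None, None, None, "number", None, "number",
                               None, None, None, None, None, None, None]

IGNORE = -100  # common convention for positions excluded from the training loss


def toy_subword_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer: 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)] or [word]


def align_labels(
    tokens: List[str],
    labels: List[Optional[str]],
    tokenize: Callable[[str], List[str]],
    label2id: Dict[str, int],
) -> Tuple[List[str], List[int]]:
    """Expand each code token into subwords and align its type label to them."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    subwords: List[str] = []
    label_ids: List[int] = []
    for tok, lab in zip(tokens, labels):
        pieces = tokenize(tok)
        subwords.extend(pieces)
        # Supervise only the first subword of an annotated token; mask the rest.
        first = label2id[lab] if lab is not None else IGNORE
        label_ids.extend([first] + [IGNORE] * (len(pieces) - 1))
    return subwords, label_ids


label2id = {"number": 0, "string": 1}  # hypothetical label vocabulary
subwords, ids = align_labels(tokens, labels, toy_subword_tokenize, label2id)
print(list(zip(subwords, ids)))
```

Labeling only the first subword is one common choice; labeling every piece or pooling subword representations are alternatives, and a real tokenizer would replace the toy splitter.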
Displayed time zone: Eastern Time (US & Canada)
Full session: Wed 18 May 2022 20:00 - 20:50 at MSR Main room - even hours - Session 6: Maintenance & Testing. Chair(s): Ajay Jha, Amjed Tahir