MSR 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia
co-located with ICSE 2023

The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common dataset. The challenge dares researchers and practitioners to put their mining tools and approaches to the test.

Call for Mining Challenge Proposals

One of the secret ingredients behind the success of the International Conference on Mining Software Repositories (MSR) is its annual Mining Challenge, in which MSR participants can showcase their techniques, tools, and creativity on a common data set. In true MSR fashion, this data set is a real data set contributed by researchers in the community, solicited through an open call. There are many benefits of sharing a data set for the MSR Mining Challenge. The selected challenge proposal explaining the data set will appear in the MSR 2023 proceedings, and the challenge papers using the data set will be required to cite the challenge proposal or an existing paper of the researchers about the selected data set. Furthermore, the authors of the data set will join the MSR 2023 organizing committee as Mining Challenge (co-)chair(s), who will manage the reviewing process (e.g., recruiting a Challenge PC, managing submissions and review assignments). Finally, it is not uncommon for challenge data sets to feature in MSR and other publications well after the edition of the conference in which they appear!

If you would like to submit your data set for consideration for the 2023 MSR Mining Challenge, please submit a short proposal (1-2 pages plus appendices, if needed) containing the following information:

  1. Title of data set.
  2. High-level overview:
    • Short description, including what types of artifacts the data set contains.
    • Summary statistics (how many artifacts of different types).
  3. Internal structure:
    • How are the data structured and organized?
    • (Link to) Schema, if applicable.
  4. How to access:
    • How can the data set be obtained?
    • What are the recommended ways to access it? Include examples of specific tools, shell commands, etc., if applicable.
    • What skills, infrastructure, and/or credentials would challenge participants need to effectively work with the data set?
  5. What kinds of research questions do you expect challenge participants could answer?
  6. A link to a (sub)sample of the data for the organizing committee to peruse (e.g., via GitHub, Zenodo, Figshare).

Each submission must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type). LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options.
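
As a rough illustration of the formatting requirement, a proposal skeleton might look like the sketch below. Only the \documentclass line comes from this call. The section titles simply mirror the numbered items requested above, and the title and author block are placeholders, not a required layout.

```latex
% Minimal proposal skeleton (illustrative sketch, not an official template).
% The class options are the ones required by the call: no compsoc or compsocconf.
\documentclass[10pt,conference]{IEEEtran}

\begin{document}

\title{Title of Data Set}  % placeholder title (item 1)

% Placeholder author block using the standard IEEEtran author macros.
\author{\IEEEauthorblockN{Author Name}
\IEEEauthorblockA{Affiliation}}

\maketitle

\section{High-Level Overview}
% Short description, artifact types, and summary statistics (item 2).

\section{Internal Structure}
% How the data are organized; link to a schema, if applicable (item 3).

\section{How to Access}
% How to obtain the data, recommended tools, required skills (item 4).

\section{Expected Research Questions}
% Questions challenge participants could answer (item 5).

\section{Sample of the Data}
% Link to a (sub)sample for the organizing committee (item 6).

\end{document}
```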

The first task of the authors of the selected proposal will be to prepare the Call for Challenge Papers, which outlines the expected content and structure of submissions, as well as the technical details of how to access and analyze the data set. This call will be published on the MSR website on August 15th. By making the challenge data set available by late summer, we hope that many students will be able to use the challenge data set for their graduate class projects in the Fall semester.

Important Dates

  • Deadline for proposals: July 18th, 2022
  • Notification: July 28th, 2022
  • Call for Challenge Papers Published: August 15th, 2022