Is Surprisal in Issue Trackers Actionable? (MSR 2022 - Registered Reports)

Who

James Caddy, Markus Wagner, Christoph Treude, Earl T. Barr, Miltiadis Allamanis

Track

MSR 2022 Registered Reports

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 18 May 2022 03:29 - 03:33 at MSR Main room - odd hours - Session 2: Maintenance (Issues & Smells) Chair(s): Alessio Ferrari

Abstract

Background. In information theory, surprisal measures how unexpected a particular event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is defined in terms of the probability of an event occurring, it is possible to determine the surprisal associated with English sentences. The issues and pull requests in software repository issue trackers provide insight into the development process and likely contain the surprising events of this process.
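As a rough illustration of the idea above, the surprisal of an event with probability p is -log2(p) bits, and the surprisal of a text can be summarised as the mean over its tokens. The sketch below is not the authors' model: it assumes a toy unigram word model with add-one smoothing, and the names `train_unigram` and `surprisal_bits` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies over a list of texts.

    A toy stand-in for a statistical language model (an assumption,
    not the model used in the study).
    """
    counts = Counter()
    for text in corpus:
        counts.update(text.lower().split())
    return counts


def surprisal_bits(text, counts):
    """Mean per-token surprisal, -log2 p(token), in bits.

    Uses add-one smoothing with one extra vocabulary slot shared by
    all unseen tokens, so unseen words get a small non-zero probability.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen tokens
    bits = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab)
        bits += -math.log2(p)
    return bits / len(tokens)
```

Under this sketch, a title made of words that are frequent in the training texts scores lower than one made of words the model has never seen, which is the sense in which an issue can be "surprising".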

Objective. Prior work has identified that unusual events in software repositories are of interest to developers, and uses simple code metrics-based methods to detect them. In this study we will propose a new method for unusual event detection in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to analyse them further to determine whether they actually hold importance in a repository, or whether they pose a significant challenge to address. If bad surprises can be found early, before they cause additional trouble, it is plausible that effort, cost and time will be saved as a result.

Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will then measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.
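The final step of the method, relating surprisal to importance or difficulty, could be sketched with a rank correlation. The abstract does not say which statistic will be used, so Spearman's rank correlation here is an assumption, and `ranks`, `pearson` and `spearman` are hypothetical helper names written in plain Python for illustration.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Here `x` might hold per-issue surprisal scores and `y` a difficulty proxy such as days until the issue was closed (a hypothetical choice; the abstract only mentions "several analogues"). A coefficient near +1 or -1 would indicate a strong monotonic relationship, and a value near 0 would indicate none.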

Link to Preprint

https://arxiv.org/abs/2204.07363

DOI

https://doi.org/10.48550/arXiv.2204.07363

James Caddy

University of Adelaide

Markus Wagner

University of Adelaide

Australia

Christoph Treude

University of Melbourne

Australia

Earl T. Barr

University College London, UK

Miltiadis Allamanis

Microsoft Research

Session Program

Wed 18 May
Displayed time zone: Eastern Time (US & Canada)

03:00 - 03:50	Session 2: Maintenance (Issues & Smells) (Technical Papers / Registered Reports / Data and Tool Showcase Track / Industry Track) at MSR Main room - odd hours. Chair(s): Alessio Ferrari, CNR-ISTI

03:00	4m Talk	An Alternative Issue Tracking Dataset of Public Jira Repositories	Data and Tool Showcase Track	Lloyd Montgomery (Universität Hamburg), Clara Marie Lüders (University of Hamburg), Walid Maalej (University of Hamburg)
03:04	7m Talk	Smelly Variables in Ansible Infrastructure Code: Detection, Prevalence, and Lifetime	Technical Papers	Ruben Opdebeeck (Vrije Universiteit Brussel), Ahmed Zerouali (Vrije Universiteit Brussel), Coen De Roover (Vrije Universiteit Brussel)
03:11	7m Talk	Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems	Technical Papers	Clara Marie Lüders (University of Hamburg), Abir Bouraffa (University of Hamburg), Walid Maalej (University of Hamburg)
03:18	7m Talk	Real-World Clone-Detection in Go	Industry Track	Qinyun Wu (Bytedance Ltd.), Huan Song (Bytedance Ltd.), Ping Yang (Bytedance Network Technology)
03:25	4m Talk	Towards Using Gameplay Videos for Detecting Issues in Video Games	Registered Reports	Emanuela Guglielmi (University of Molise), Simone Scalabrino (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise)
03:29	4m Talk	Is Surprisal in Issue Trackers Actionable?	Registered Reports	James Caddy (University of Adelaide), Markus Wagner (University of Adelaide, Australia), Christoph Treude (University of Melbourne), Earl T. Barr (University College London, UK), Miltiadis Allamanis (Microsoft Research)
03:33	17m Live Q&A	Discussions and Q&A	Technical Papers

Information for Participants

Wed 18 May 2022 03:00 - 03:50 at MSR Main room - odd hours - Session 2: Maintenance (Issues & Smells) Chair(s): Alessio Ferrari
