MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022

Background. In information theory, surprisal measures how unexpected a particular event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is defined in terms of the probability of an event occurring, such a model can be used to determine the surprisal of English sentences. The issues and pull requests in software repository issue trackers provide insight into the development process and likely contain the surprising events of that process.
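
As a rough illustration of this definition, the surprisal of an event with probability p is -log2(p) bits, and the surprisal of a sentence can be taken as the sum over its words. The sketch below assumes a toy unigram model with add-one smoothing and whitespace tokenisation; it is not the language model used in the study, and the function names and example corpus are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies over a list of sentences (whitespace tokenised)."""
    return Counter(word for sentence in corpus for word in sentence.lower().split())


def surprisal_bits(sentence, counts):
    """Total surprisal of a sentence in bits: the sum of -log2 p(word).

    Add-one (Laplace) smoothing gives unseen words a nonzero probability.
    This is an illustrative assumption, not the model from the study.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    bits = 0.0
    for word in sentence.lower().split():
        p = (counts[word] + 1) / (total + vocab)
        bits += -math.log2(p)
    return bits


# Hypothetical issue titles, used only to exercise the functions.
corpus = ["fix typo in readme", "fix typo in docs", "fix build"]
counts = train_unigram(corpus)
common = surprisal_bits("fix typo", counts)
rare = surprisal_bits("segfault kernel", counts)
```

Under this toy model, a title made of frequently seen words receives a lower surprisal than one made of unseen words, which is the property the study relies on.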

Objective. Prior work has identified that unusual events in software repositories are of interest to developers, and has used simple methods based on code metrics to detect them. In this study we propose a new method for detecting unusual events in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to analyse them further to determine whether they actually hold importance in a repository, or whether they pose a significant challenge to address. If bad surprises can be found early, before they cause further trouble, it is plausible that effort, cost and time will be saved as a result.

Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will then measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.
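
The final step, describing correlations between surprisal and the other measures, might look something like the sketch below. The abstract does not name the statistic to be used; Spearman rank correlation is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Here `xs` would hold the surprisal of each issue and `ys` one of the difficulty or importance analogues, for example time to resolution. A rank-based statistic is chosen in this sketch because such measures are often skewed, but that choice is not taken from the source.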

Wed 18 May

Displayed time zone: Eastern Time (US & Canada)

03:00 - 03:50
03:00
4m
Talk
An Alternative Issue Tracking Dataset of Public Jira Repositories
Data and Tool Showcase Track
Lloyd Montgomery Universität Hamburg, Clara Marie Lüders University of Hamburg, Walid Maalej University of Hamburg
Pre-print Media Attached
03:04
7m
Talk
Smelly Variables in Ansible Infrastructure Code: Detection, Prevalence, and Lifetime
Technical Papers
Ruben Opdebeeck Vrije Universiteit Brussel, Ahmed Zerouali Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print
03:11
7m
Talk
Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems
Technical Papers
Clara Marie Lüders University of Hamburg, Abir Bouraffa University of Hamburg, Walid Maalej University of Hamburg
DOI Pre-print
03:18
7m
Talk
Real-World Clone-Detection in Go
Industry Track
Qinyun Wu Bytedance Ltd., Huan Song Bytedance Ltd., Ping Yang Bytedance Network Technology
03:25
4m
Talk
Towards Using Gameplay Videos for Detecting Issues in Video Games
Registered Reports
Emanuela Guglielmi University of Molise, Simone Scalabrino University of Molise, Gabriele Bavota Software Institute, USI Università della Svizzera italiana, Rocco Oliveto University of Molise
Pre-print
03:29
4m
Talk
Is Surprisal in Issue Trackers Actionable?
Registered Reports
James Caddy University of Adelaide, Markus Wagner University of Adelaide, Australia, Christoph Treude University of Melbourne, Earl T. Barr University College London, UK, Miltiadis Allamanis Microsoft Research
DOI Pre-print Media Attached
03:33
17m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Wed 18 May 2022 03:00 - 03:50 at MSR Main room - odd hours - Session 2: Maintenance (Issues & Smells) Chair(s): Alessio Ferrari