FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway
co-located with ISSTA 2025
Wed 25 Jun 2025 14:40 - 15:00 at Cosmos 3C - Empirical Studies 2 Chair(s): Yuchao Jiang

Literate programming environments like Jupyter and R Markdown notebooks, coupled with easy-to-use languages like Python and R, put a plethora of statistical methods right at a data analyst’s fingertips. But are these methods being used correctly? Statistical methods make assumptions about the samples being analyzed, and in many cases produce reasonable-looking results even when those assumptions are not met.

We propose an approach that allows library developers to annotate functions with statistical assumptions, phrases them as hypotheses about the data, and inserts hypothesis tests investigating the likelihood that the assumption is met. As a proof of concept, we implement this approach in two tools: prob-check-py for Python, and prob-check-r for R. To evaluate these, we identify common hypothesis testing and statistical modeling functions in Python and R, annotate them with the relevant statistical assumptions, and run 128 Kaggle notebooks that use those methods to identify misuses. Our investigation reveals that at least one statistical assumption was violated in 84.38% of surveyed notebooks, and that assumptions were violated in 53.36% of calls to annotated functions. Moreover, had the appropriate hypothesis testing method been chosen given the characteristics of the data, a different conclusion would have been drawn in 11.51% of cases.
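The annotate-then-test idea described above can be illustrated with a minimal sketch. It does not use or reproduce the prob-check-py API, which is not shown here: the decorator name `assumes_normal`, the helper `jarque_bera_pvalue`, and the example function `welch_t_statistic` are all hypothetical. The sketch assumes that a normality assumption is phrased as the null hypothesis of a Jarque-Bera test, chosen only because its asymptotic p-value can be computed with the standard library.

```python
import math
import statistics
import warnings
from functools import wraps


def jarque_bera_pvalue(sample):
    """Asymptotic p-value of the Jarque-Bera normality test (hypothetical helper)."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality, JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2). The approximation
    # is rough for small samples; this is an illustration only.
    return math.exp(-jb / 2.0)


def assumes_normal(alpha=0.05):
    """Hypothetical annotation: each positional sample is assumed normal."""
    def decorate(func):
        @wraps(func)
        def wrapper(*samples):
            for index, sample in enumerate(samples):
                p_value = jarque_bera_pvalue(sample)
                if p_value < alpha:
                    # Report the likely violation, then still run the method.
                    warnings.warn(
                        f"{func.__name__}: normality rejected for "
                        f"sample {index} (p={p_value:.3g})"
                    )
            return func(*samples)
        return wrapper
    return decorate


@assumes_normal(alpha=0.05)
def welch_t_statistic(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error
```

Calling `welch_t_statistic` with a strongly skewed sample, such as the powers of two from 1 to 2048, emits a warning before the statistic is returned, while a small symmetric sample passes silently. Warning rather than raising is a design choice of this sketch: the analysis still runs, but the likely violation is surfaced to the analyst.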

Wed 25 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:30
14:00
20m
Talk
An Empirical Analysis of Issue Templates Usage in Large-Scale Projects on GitHub
Journal First
Emre Sülün Bilkent University, Metehan Saçakcı Bilkent University, Eray Tüzün Bilkent University
14:20
20m
Talk
The Landscape of Toxicity: An Empirical Investigation of Toxicity on GitHub
Research Papers
Jaydeb Sarker University of Nebraska at Omaha, Asif Kamal Turzo Wayne State University, Amiangshu Bosu Wayne State University
DOI Pre-print
14:40
20m
Talk
Expressing and Checking Statistical Assumptions
Research Papers
Alexi Turcotte CISPA, Zheyuan Wu Saarland University
DOI
15:00
20m
Talk
Why the Proof Fails in Different Versions of Theorem Provers: An Empirical Study of Compatibility Issues in Isabelle
Research Papers
Xiaokun Luan Peking University, David Sanan Singapore Institute of Technology, Zhe Hou Griffith University, Qiyuan Xu Nanyang Technological University, Chengwei Liu Nanyang Technological University, Yufan Cai National University of Singapore, Yang Liu Nanyang Technological University, Meng Sun Peking University
DOI
15:20
10m
Talk
Missing Threats: Dealing with the Treatment-sensitive Factorial Structure Bias in Empirical Software Engineering
Ideas, Visions and Reflections
Sabato Nocera University of Salerno, Giuseppe Scanniello University of Salerno

Information for Participants
Wed 25 Jun 2025 14:00 - 15:30 at Cosmos 3C - Empirical Studies 2 Chair(s): Yuchao Jiang
Info for room Cosmos 3C:

Cosmos 3C is the third room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.
