ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal

What is a flaky test?

Software developers rely on test cases to identify bugs in their code and to provide a signal as to their code’s correctness. Should such signals have a history of unreliability, they not only become less informative, but may also be considered untrustworthy. In the context of software testing, practitioners refer to these unreliable signals as flaky tests. The definition varies slightly, but a flaky test is generally defined as a test case that can pass and fail without changes to the test case code or the code under test.

Why are they such a big deal?

Concurrency and randomness are among the many well-established causes of flaky tests, but flakiness has far-reaching negative consequences regardless of its origin. These consequences are felt by developers everywhere, from small open-source projects to the likes of Google, Microsoft, and Meta. Flaky tests undermine the assumption that a test failure implies a bug: they are a leading cause of “false alarm” test failures and, more seriously, can mask the presence of a genuine bug. Developers may waste time debugging spurious failures and, in response, start to ignore future test failures. This is detrimental to software stability, because even an unreliable flaky test can still indicate a genuine bug in some instances. The problem is exacerbated when flaky tests accumulate, as developers may lose trust in the entire test suite.
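As an illustration of the concurrency case, the following hypothetical pytest-style sketch (assumed for illustration, not from any cited study) asserts on the result of a background thread without waiting for it, so the outcome depends on thread scheduling:

```python
import threading

results = []

def background_write():
    """Code under test: records its result from a worker thread."""
    results.append("done")

def test_background_write():
    t = threading.Thread(target=background_write)
    t.start()
    # Flaky assertion: the main thread races the background thread,
    # so the write may or may not be visible yet. Calling t.join()
    # before asserting would make the test deterministic.
    assert results == ["done"]
```

Such races often pass on a developer's machine and fail intermittently in CI, which is why concurrency is so prominent among reported causes.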

What are we going to do about it?

Interest in flaky tests as a research topic has grown significantly within the software engineering community in recent years. This has produced a wide array of empirical studies on the causes of flaky tests and experimental tools for their detection and repair. Despite this, no dedicated workshop on the issue had ever been organized. We are therefore delighted to announce the first International Flaky Test Workshop (FTW). The workshop welcomes submissions on topics relating to flaky tests and will provide an opportunity for academic researchers and industrial practitioners to exchange ideas about test flakiness. Please see the Call for Papers for more information.

Plenary

Sun 14 Apr

Displayed time zone: Lisbon

09:00 - 10:30
Keynote (FTW) at Amália Rodrigues
Chair(s): Martin Gruber BMW Group, University of Passau
09:00
90m
Keynote
FTW
K: Darko Marinov University of Illinois at Urbana-Champaign
10:30 - 11:00
Coffee Break (Catering) at Open Space
10:30
30m
Coffee break
Break
Catering

11:00 - 12:30
Mitigating Flaky Failures in CI (FTW) at Amália Rodrigues
Chair(s): Tim A. D. Henderson Google
11:00
30m
Paper
Presubmit Rescue: Automatically Ignoring Flaky Test Executions
FTW
A: Minh Hoang Google, A: Adrian Berding
11:30
30m
Paper
Regression Test History Data for Flaky Test Research
FTW
A: Philipp Wendler, A: Stefan Winter Ulm University and LMU Munich
12:00
30m
Paper
Predicting the Lifetime of Flaky Tests on Chrome
FTW
A: Samaneh Malmir Concordia University, A: Peter Rigby Concordia University; Meta
12:30 - 14:00
12:30
90m
Lunch
Catering

14:00 - 15:30
Debugging Flaky Tests in Different Domains (FTW) at Amália Rodrigues
Chair(s): Owain Parry The University of Sheffield
14:00
30m
Paper
On the Impact of Hitting System Resource Limits on Test Flakiness
FTW
A: Fabian Leinen Technical University of Munich, A: Alexander Perathoner Technical University of Munich, A: Alexander Pretschner Technical University of Munich
14:30
30m
Paper
Flaky Tests in the AI Domain
FTW
A: Péter Attila Soha Department of Software Engineering, University of Szeged, A: Béla Vancsics, A: Tamás Gergely Department of Software Engineering, University of Szeged, A: Árpád Beszédes Department of Software Engineering, University of Szeged
15:00
30m
Paper
Can ChatGPT Repair Non-Order-Dependent Tests?
FTW
A: Yang Chen University of Illinois at Urbana-Champaign, A: Reyhaneh Jabbarvand University of Illinois at Urbana-Champaign
15:30 - 16:00
Coffee Break (Catering) at Open Space
15:30
30m
Coffee break
Break
Catering

16:00 - 17:30
Discussion Panel (FTW) at Amália Rodrigues
Chair(s): Phil McMinn University of Sheffield
16:00
90m
Panel
Discussion Panel
FTW
P: Jonathan Bell Northeastern University, P: Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland, P: Mark Harman Meta Platforms, Inc. and UCL, P: Darko Marinov University of Illinois at Urbana-Champaign, P: Sigrid Eldh Ericsson AB, Mälardalen University, Carleton University

Call for Papers

FTW welcomes submissions on topics relating to flaky tests. The workshop will provide an opportunity for academic researchers and industrial practitioners to exchange ideas about test flakiness and to find out about current research directions and industrial challenges. A major goal of FTW is to foster collaboration and exchange between academia and industry. The workshop is inclusive of quantitative, qualitative, and mixed-methods research. Topics of interest include (but are not limited to):

  • Costs and consequences of flaky tests.
  • Causes of flaky tests.
  • Detection of flaky tests.
  • Mitigation of flaky tests.
  • Repair of flaky tests.

We expect a significant portion of the day to be spent on presentations and discussions of extended abstracts, but there will also be more formal short paper presentations. Please note that due to ICSE restrictions, submissions cannot exceed 8 pages. Submissions can take one of two formats:

  • Extended abstract (max. 2 pages including references): New ideas, problems and challenges, view points, work in progress.
  • Short paper (max. 8 pages including references): Technical research, experience reports, empirical studies.

Submission

All submissions must be submitted via the following link: https://easychair.org/conferences/?conf=ftw24.

Each submission will be reviewed by the program committee with respect to its suitability for the workshop, following a double-blind process for short papers and a single-blind process for extended abstracts. This means that the identity of short paper authors must not be revealed in their submissions. All authors should use the official “ACM Primary Article Template”, which can be obtained from the ACM Proceedings Template page. LaTeX users should use the sigconf option, as well as review to produce line numbers. Authors of short papers must also use anonymous to omit author names. For example, a short paper author should include the following line at the beginning of the document:

\documentclass[sigconf,review,anonymous]{acmart}

Important Dates

  • Paper submission: December 7th 2023 AoE.
  • Acceptance notification: January 11th 2024 AoE.
  • Camera ready: January 25th 2024 AoE.

As part of the workshop, we’ll be hosting a discussion panel on test flakiness. The panel consists of researchers and practitioners with a proven track record in the field. The current confirmed panelists are:

Jonathan Bell

Jon is an Assistant Professor directing research in Software Engineering and Software Systems at Northeastern University. His research makes it easier for developers to create reliable and secure software by improving software testing and program analysis. Jon’s work on accelerating software testing has been recognized with an ACM SIGSOFT Distinguished Paper Award (ICSE ’14, Unit Test Virtualization with VMVM) and was the basis for an industrial collaboration with Electric Cloud. His research on flaky tests has led to open-source contributions to the Maven build system and the PIT mutation testing framework. His program analysis research has resulted in several widely adopted runtime systems for the JVM, including the Phosphor taint tracking system (OOPSLA ’14) and the CROCHET checkpoint/rollback tool (ECOOP ’18). His contributions to the object-oriented programming community were recognized with the 2020 Dahl-Nygaard Junior Researcher Prize, and he was invited to give a keynote address at SPLASH on this work. His research has been funded by the NSA and the NSF, and he is the recipient of an NSF CAREER award.

Lionel C. Briand

Lionel C. Briand is a professor of software engineering and holds shared appointments between (1) the University of Ottawa, Canada, and (2) the SnT centre for Security, Reliability, and Trust, University of Luxembourg. In collaboration with colleagues, over 25 years he has run many collaborative research projects with companies in the automotive, satellite, aerospace, energy, financial, and legal domains. Lionel has held various engineering, academic, and leadership positions in six countries. He was one of the founders of the ICST conference (IEEE International Conference on Software Testing, Verification, and Validation, a CORE A event) and its first general chair. He was also Editor-in-Chief of Empirical Software Engineering (Springer) for 13 years and, in collaboration with first Victor Basili and then Tom Zimmermann, led the journal into the top tier of publication venues in software engineering.

Mark Harman

Mark Harman is a full-time Research Scientist at Facebook London, working on Facebook’s Web Enabled Simulation system WW, together with a London-based Facebook team focusing on AI for scalable software engineering. WW is Facebook’s Cyber-Cyber Digital Twin of its platforms, built with the long-term aim of measuring, predicting, and optimising behaviour across all of Facebook’s platforms. Mark also holds a part-time professorship at UCL and was previously the manager of Facebook’s Sapienz team, which grew out of Majicke, a start-up co-founded by Mark and acquired by Facebook in 2017. The Sapienz technology has been fully deployed as part of Facebook’s overall CI system since 2017, and the Facebook Sapienz team continues to develop and extend it. Sapienz has found and helped to fix thousands of bugs before they hit production, on systems of tens of millions of lines of code, used by over 2.6 billion people worldwide every day. In his more purely scientific work, Mark co-founded the field of Search Based Software Engineering (SBSE), and is also known for scientific research on source code analysis, software testing, app store analysis, and empirical software engineering. He received the IEEE Harlan Mills Award and the ACM Outstanding Research Award in 2019 for his work and was awarded a fellowship of the Royal Academy of Engineering in 2020.

Darko Marinov

Darko Marinov is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His main research interests are in Software Engineering, in particular improving software quality using software testing. He has a lot of fun looking for software bugs. He has published over 100 conference papers, winning three “test-of-time” awards – two ACM SIGSOFT Impact Paper awards (2012 and 2019) and one ASE Most Influential Paper Award (2015) – and eight more paper awards – seven ACM SIGSOFT Distinguished Paper awards (2002, 2005, 2010, 2015, 2016, 2017, 2021) and one CHI Best Paper Award (2017). His work has been supported by AFRL via BBN, Boeing, Facebook, Google, Huawei, IBM, Intel, Microsoft, NSF, Qualcomm, Samsung, and SRC.

Sigrid Eldh

Dr. Sigrid Eldh works full-time leading research on Quality and Software Test at Ericsson AB in Stockholm, where she has worked since 1994. She aids in research collaboration and the supervision of PhD students as a senior lecturer at Mälardalen University (MDH) and as an adjunct Professor at Carleton University, Ottawa, Canada. She earned her MSc in Computer Science from Uppsala University and her PhD, titled “On Test Design”, from Mälardalen University (Mälardalens Högskola). She was the initiator of ISTQB and also started and chaired its Swedish chapter, SSTB (Swedish Software Testing Board), for its first seven years. She also started SAST (Swedish Association for Software Test), which she chaired in its first years. She currently serves as Editor-in-Chief of IEEE Software.

We are pleased to announce that Darko Marinov will be our keynote speaker. The work of Darko, alongside his students and collaborators, forms an integral strand of the research literature on flaky tests. He co-authored “An Empirical Analysis of Flaky Tests”, one of the earliest and most widely cited studies in the field. This work introduced a range of categories for the causes of flaky tests that have been reused and adapted in many subsequent papers. Darko has also been involved in the development and scientific evaluation of several automated tools for dealing with flaky tests, including iDFlakies for detecting flaky tests and iFixFlakies for repairing order-dependent flaky tests. You can browse the full list of his publications on his personal website.


Questions? Use the FTW contact form.