Thu 18 May 2023 14:30 - 14:37 at Meeting Room 110 - Test quality and improvement Chair(s): Guowei Yang
Flaky tests obstruct software development, so studying them and proposing mitigations has become an important focus of software engineering research. Sound investigations of test flakiness require large, diverse, and unbiased datasets of flaky tests. A common way to build such datasets is to rerun the test suites of selected projects multiple times and check for tests that produce different outcomes. While using this technique on a single project is mostly straightforward, applying it to a large and diverse set of projects raises several implementation challenges, such as (1) isolating the test executions, (2) supporting multiple build mechanisms, (3) achieving feasible run times on large datasets, and (4) analyzing and presenting the test outcomes. To address these challenges, we introduce FlaPy, a framework for researchers to mine flaky tests in a given or automatically sampled set of Python projects by rerunning their test suites. FlaPy isolates the test executions using containerization and fresh execution environments to simulate real-world CI conditions and to achieve accurate results. By supporting multiple dependency installation strategies, it promotes diversity among the studied projects. FlaPy supports parallelizing the test executions using SLURM, making it feasible to scan thousands of projects for test flakiness. Finally, FlaPy analyzes the test outcomes to determine which tests are flaky and depicts the results in a concise table. A demo video of FlaPy is available at https://youtu.be/ejy-be-FvDY
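The rerun-based detection idea described in the abstract can be illustrated with a minimal sketch. This is not FlaPy's implementation or API: the function name `classify_tests`, the outcome strings, and the input layout (one dictionary of test outcomes per rerun) are assumptions made purely for illustration. A test is labeled flaky when it shows more than one distinct outcome across reruns of the same code.

```python
from collections import defaultdict


def classify_tests(runs):
    """Label each test from its outcomes across repeated runs.

    `runs` is a hypothetical structure: a list with one dict per rerun,
    mapping a test identifier to an outcome string such as "passed" or
    "failed". Tests missing from some runs are judged only on the runs
    in which they appear (a simplification).
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test_id, outcome in run.items():
            outcomes[test_id].add(outcome)

    verdicts = {}
    for test_id, seen in outcomes.items():
        # More than one distinct outcome on unchanged code means flaky;
        # otherwise the single consistent outcome is reported.
        verdicts[test_id] = "flaky" if len(seen) > 1 else next(iter(seen))
    return verdicts


# Made-up outcomes from three reruns of the same test suite.
runs = [
    {"test_a": "passed", "test_b": "passed", "test_c": "failed"},
    {"test_a": "passed", "test_b": "failed", "test_c": "failed"},
    {"test_a": "passed", "test_b": "passed", "test_c": "failed"},
]

verdicts = classify_tests(runs)
for test_id in sorted(verdicts):
    print(f"{test_id}: {verdicts[test_id]}")
```

In this toy input, only `test_b` changes outcome between reruns, so it is the only test labeled flaky; `test_c` fails consistently and is therefore reported as failed rather than flaky.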
Wed 17 May (displayed time zone: Hobart)
10:30 - 11:00
10:30 30m Talk | FlaPy: Mining Flaky Python Tests at Scale | DEMO - Demonstrations | Pre-print
Thu 18 May
13:45 - 15:15 | Test quality and improvement | Technical Track / Journal-First Papers / DEMO - Demonstrations | at Meeting Room 110 | Chair(s): Guowei Yang (University of Queensland)
13:45 15m Talk | Test Selection for Unified Regression Testing | Technical Track | Shuai Wang (University of Illinois at Urbana-Champaign), Xinyu Lian (University of Illinois at Urbana-Champaign), Darko Marinov (University of Illinois at Urbana-Champaign), Tianyin Xu (University of Illinois at Urbana-Champaign) | Pre-print
14:00 15m Talk | ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolutionary Search | Technical Track | Rongqi Pan (University of Ottawa), Taher A Ghaleb (University of Ottawa), Lionel Briand (University of Luxembourg; University of Ottawa)
14:15 15m Talk | Measuring and Mitigating Gaps in Structural Testing | Technical Track | Soneya Binta Hossain (University of Virginia), Matthew B Dwyer (University of Virginia), Sebastian Elbaum (University of Virginia), Anh Nguyen-Tuong (University of Virginia) | Pre-print
14:30 7m Talk | FlaPy: Mining Flaky Python Tests at Scale | DEMO - Demonstrations | Pre-print
14:37 7m Talk | Scalable and Accurate Test Case Prioritization in Continuous Integration Contexts | Journal-First Papers | Ahmadreza Saboor Yaraghi (University of Ottawa), Mojtaba Bagherzadeh (University of Ottawa), Nafiseh Kahani (Carleton University), Lionel Briand (University of Luxembourg; University of Ottawa)
14:45 7m Talk | Flakify: A Black-Box, Language Model-based Predictor for Flaky Tests | Journal-First Papers | Sakina Fatima (University of Ottawa), Taher A Ghaleb (University of Ottawa), Lionel Briand (University of Luxembourg; University of Ottawa)
14:52 7m Talk | Developer-centric test amplification | Journal-First Papers | Pre-print
15:00 7m Talk | How Developers Engineer Test Cases: An Observational Study | Journal-First Papers | Maurício Aniche (Delft University of Technology), Christoph Treude (University of Melbourne), Andy Zaidman (Delft University of Technology) | Pre-print