FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway
Mon 23 Jun 2025 11:00 - 11:20 at Aurora B - Bug Detection Chair(s): Lingming Zhang

In recent years, deep learning has seen widespread adoption across various domains, giving rise to large-scale models such as large language models. Training these models, particularly in distributed environments, presents substantial computational and communication challenges. A critical issue is the communication deadlock—a state in which processes become indefinitely stalled while awaiting network messages from others, which leads to resource wastage and reduced productivity. Current approaches to deadlock handling are either unsuitable for deep learning due to its unique hybrid programming paradigm or limit optimization opportunities. This paper presents dl², a novel dynamic analysis tool designed to detect communication deadlocks in deep learning jobs. dl² models the runtime trace of a job as an execution graph, detects unmatched communications, and constructs a wait-for graph to identify deadlock cycles. dl² can also handle nondeterministic communication behaviors, providing replay and diagnostic support for root cause analysis. We evaluate dl² using PyTorch with a combination of synthetic test cases and real-world deep learning workloads. The experimental results show that dl² successfully detects all communication deadlocks, achieving 100% precision and recall, which highlights its effectiveness.
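As an illustration of the final step described in the abstract, the following is a minimal sketch of finding a cycle in a wait-for graph. It is not dl²'s implementation: the function name `find_deadlock_cycle`, the representation of the graph as a dictionary from a process rank to the set of ranks it is waiting on, and the use of a depth-first search are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set


def find_deadlock_cycle(wait_for: Dict[int, Set[int]]) -> Optional[List[int]]:
    """Return one cycle of ranks in the wait-for graph, or None if it is acyclic.

    Hypothetical input format: wait_for[r] is the set of ranks that rank r
    is blocked on. This is an illustrative assumption, not dl²'s data model.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[int, int] = {}
    path: List[int] = []

    def visit(node: int) -> Optional[List[int]]:
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(wait_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: nxt is already on the current path, so the
                # path from nxt to node forms a wait-for cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle is not None:
                return cycle
    return None


# Ranks 0, 1 and 2 wait on each other in a ring; rank 3 waits on rank 0.
print(find_deadlock_cycle({0: {1}, 1: {2}, 2: {0}, 3: {0}}))
```

In this sketch, a returned list names the ranks that block each other, while `None` means no rank is waiting in a cycle. The recursive search is adequate for small graphs; a job with many thousands of ranks would likely need an iterative traversal.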

Mon 23 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:20
Bug Detection
Research Papers / Industry Papers / Demonstrations / Journal First at Aurora B
Chair(s): Lingming Zhang University of Illinois at Urbana-Champaign
10:30
20m
Talk
Yuga: Automatically Detecting Lifetime Annotation Bugs in the Rust Language
Journal First
Vikram Nitin Columbia University, Anne Mulhern Red Hat Inc, Sanjay Arora Red Hat Inc, Baishakhi Ray Columbia University
10:50
10m
Talk
SpecChecker-Int: An Extensible Concurrency Bugs Detection Tool for Interrupt-driven Embedded Software
Demonstrations
Boxiang Wang Beijing Sunwise Information Technology Ltd, Chao Li Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Rui Chen Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Sheng Wang Beijing Sunwise Information Technology Ltd, Chunpeng Jia Beijing Sunwise Information Technology Ltd, Mengfei Yang China Academy of Space Technology
11:00
20m
Talk
dl²: Detecting Communication Deadlocks in Deep Learning Jobs
Industry Papers
Yanjie Gao Microsoft Research, Jiyu Luo University of Science and Technology of China, Haoxiang Lin Microsoft Research, Hongyu Zhang Chongqing University, Ming Wu Zero Gravity Labs, Mao Yang Microsoft Research
11:20
20m
Talk
Detecting Metadata-Related Bugs in Enterprise Applications
Research Papers
Md Mahir Asef Kabir Virginia Tech, Xiaoyin Wang University of Texas at San Antonio, Na Meng Virginia Tech
11:40
20m
Talk
ROSCallBaX: Statically Detecting Inconsistencies In Callback Function Setup of Robotic Systems
Research Papers
Sayali Kate Purdue University, Yifei Gao Purdue University, Shiwei Feng Purdue University, Xiangyu Zhang Purdue University
12:00
20m
Talk
Enhancing Web Accessibility: Automated Detection of Issues with Generative AI
Research Papers
Ziyao He University of California, Irvine, Syed Fatiul Huq University of California, Irvine, Sam Malek University of California, Irvine

Information for Participants
Mon 23 Jun 2025 10:30 - 12:20 at Aurora B - Bug Detection Chair(s): Lingming Zhang
Info for room Aurora B:

Aurora B is the second room in the Aurora wing.

When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.