Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing
Deep learning (DL) frameworks are essential to DL-based software systems, and bugs in these frameworks can lead to substantial failures, so effective testing is required. Researchers adopt DL models or single interfaces as test inputs and analyze their execution results to detect bugs. However, floating-point errors, inherent randomness, and the complexity of test inputs make it challenging to analyze execution results effectively; that is, existing methods lack suitable test oracles. Some researchers apply metamorphic testing to tackle this challenge: they design Metamorphic Relations (MRs) based on the input data and parameter settings of a single framework interface to generate equivalent test inputs, expecting consistent execution results between the original and generated inputs. Despite their promising effectiveness, these methods still face certain limitations: (1) their MRs overlook structural complexity, limiting test-input diversity; (2) their MRs focus on single interfaces, which limits generalization and necessitates additional adaptations; and (3) the bugs they detect are tied to single interfaces and far from those exposed by multi-interface combinations and execution states (e.g., resource usage), which are common in real applications. To address these limitations, we propose ModelMeta, a model-level metamorphic testing method for DL frameworks with four MRs focused on model structure and calculation logic. ModelMeta inserts external structures to generate new models with consistent outputs, increasing interface diversity and detecting bugs without requiring additional MRs. Besides, ModelMeta uses the QR-DQN strategy to guide model generation and then detects bugs from the more fine-grained perspectives of training loss, memory usage, and execution time. We evaluate the effectiveness of ModelMeta on three popular DL frameworks (i.e., MindSpore, PyTorch, and ONNX) with 17 DL models from 10 real-world tasks ranging from image classification to object detection.
Results demonstrate that ModelMeta outperforms state-of-the-art baselines, covering 27 new combinations of multiple interfaces that existing methods fail to detect. Regarding bug detection, ModelMeta has identified 31 new bugs, of which 27 have been confirmed and 11 have been fixed. Among the 31 bugs are seven that existing methods cannot detect, i.e., five wrong-resource-usage bugs and two low-efficiency bugs. These results demonstrate the practicality of our method.
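The core metamorphic relation described above, i.e., inserting an external structure into a model so that the new model's outputs stay consistent with the original's, can be illustrated with a minimal sketch. This is a hypothetical toy example, not ModelMeta's actual implementation: the two-layer "model", the add-then-subtract inserted branch, and the tolerance value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def original_model(x, w1, w2):
    # Toy two-layer model: linear -> ReLU -> linear (illustrative only).
    h = np.maximum(x @ w1, 0.0)
    return h @ w2

def mutated_model(x, w1, w2, extra):
    # Same model with an output-preserving structure inserted:
    # an extra tensor is added and then subtracted, exercising more
    # framework interfaces while leaving the result (ideally) unchanged.
    h = np.maximum(x @ w1, 0.0)
    h = h + extra   # inserted branch
    h = h - extra   # inverse operation restores the activations
    return h @ w2

x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 3))
extra = rng.standard_normal((4, 16))

out_a = original_model(x, w1, w2)
out_b = mutated_model(x, w1, w2, extra)
# The oracle: outputs must agree up to floating-point error;
# a large deviation flags a potential framework bug.
assert np.allclose(out_a, out_b, atol=1e-6)
```

In an actual framework-testing setting, the two variants would be executed on the framework under test, and the comparison would extend beyond outputs to training loss, memory usage, and execution time, as the abstract describes.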
Fri 27 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | AI Testing | Research Papers / Tool Demonstrations at Cosmos 3A | Chair(s): Cuiyun Gao (Harbin Institute of Technology)

- 14:00, 25m, Talk: AudioTest: Prioritizing Audio Test Cases (Research Papers). Yinghua Li (University of Luxembourg), Xueqi Dang (University of Luxembourg, SnT), Wendkuuni Arzouma Marc Christian OUEDRAOGO (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)
- 14:25, 25m, Talk: S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models (Research Papers). Xiaohan Yuan (Zhejiang University), Jinfeng Li (Alibaba Group), Dongxia Wang (Zhejiang University), Yuefeng Chen (Alibaba Group), Xiaofeng Mao (Alibaba Group), Longtao Huang (Alibaba Group), Jialuo Chen (Zhejiang University), Hui Xue (Alibaba Group), Xiaoxia Liu (Zhejiang University), Wenhai Wang (Zhejiang University), Kui Ren (Zhejiang University), Jingyi Wang (Zhejiang University)
- 14:50, 25m, Talk: Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing (Research Papers). Yanzhou Mu, Juan Zhai (University of Massachusetts at Amherst), Chunrong Fang (Nanjing University), Xiang Chen (Nantong University), Zhixiang Cao (Xi'an Jiaotong University), Peiran Yang (Nanjing University), Kexin Zhao (Nanjing University), An Guo (Nanjing University), Zhenyu Chen (Nanjing University)
- 15:15, 15m, Demonstration: ASTRAL: A Tool for the Automated Safety Testing of Large Language Models (Tool Demonstrations). Miriam Ugarte (Mondragon University), Pablo Valle (Mondragon University), José Antonio Parejo Maestre (Universidad de Sevilla), Sergio Segura (SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain), Aitor Arrieta (Mondragon University)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.