ASE 2024
Sun 27 October - Fri 1 November 2024 Sacramento, California, United States
Tue 29 Oct 2024 11:15 - 11:30 at Magnoila - SE for AI 1 Chair(s): Chengcheng Wan

Context and Problem Statement: Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengi- neering. Deep learning model reengineering — reusing, replicating, adapting, and enhancing state-of-the-art deep learning approaches — is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing. Related Works: Prior work has characterized the challenges of deep learning model development, but as yet we know little about the deep learning model reengineering process and its common challenges. Prior work has examined DL systems from a “product” view, examining defects from projects regardless of the engineers’ purpose. Our study is focused on reengineering activities from a “process” view, and focuses on engineers specifically engaged in the reengineering process.

Methodology: Our goal is to understand the characteristics and challenges of deep learning model reengi- neering. We conducted a mixed-methods case study of this phenomenon, focusing on the context of computer vision. Our results draw from two data sources: defects reported in open-source reeengineering projects, and interviews conducted with practitioners and the leaders of a reengineering team. From the defect data source, we analyzed 348 defects from 27 open-source deep learning projects. Meanwhile, our reengineering team repli- cated 7 deep learning models over two years; we interviewed 2 open-source contributors, 4 practitioners, and 6 reengineering team leaders to understand their experiences.

Results: Our results describe how deep learning-based computer vision techniques are reengineered, quan- titatively analyze the distribution of defects in this process, and qualitatively discuss challenges and practices. We found that most defects (58%) are reported by re-users, and that reproducibility-related defects tend to be discovered during training (68% of them are). Our analysis shows that most environment defects (88%) are interface defects, and most environment defects (46%) are caused by API defects. We found that training defects have diverse symptoms and root causes. We identified four main challenges in the DL reengineering process: model operationalization, performance debugging, portability of DL operations, and customized data pipeline. Integrating our quantitative and qualitative data, we propose a novel reengineering workflow. We compared our work with prior studies on DL development, traditional software reengineering process, and pre-trained model reuse. Our work shares similar findings with prior defect studies from DL product perspectives, such as a large proportion of API defects. However, we found a higher proportion of defects caused by hyper-parameter tuning and training data quality. Moreover, we discovered and described similarities between model reengineering and pre-trained DL model reuse. There is some overlap between these two topics and the challenges and practices can also be shareable in both domains.

Future directions: Our findings inform several future directions, including: standardizing model reengi- neering practices, developing validation tools to support model reengineering, automated support beyond manual model reengineering, and measuring additional unknown aspects of model reengineering.

Tue 29 Oct

Displayed time zone: Pacific Time (US & Canada) change

10:30 - 12:00
SE for AI 1NIER Track / Journal-first Papers / Research Papers at Magnoila
Chair(s): Chengcheng Wan East China Normal University
10:30
15m
Talk
Evaluating Terminology Translation in Machine Translation Systems via Metamorphic Testing
Research Papers
Yihui Xu Soochow University, Yanhui Li Nanjing University, Jun Wang Nanjing University, Xiaofang Zhang Soochow University
DOI
10:45
15m
Talk
Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training
Research Papers
Yangsen Wang Peking University, Yizhou Chen Peking University, Yifan Zhao Peking University, Zhihao Gong Peking University, Junjie Chen Tianjin University, Dan Hao Peking University
DOI Pre-print
11:00
15m
Talk
Supporting Safety Analysis of Image-processing DNNs through Clustering-based Approaches
Journal-first Papers
Mohammed Attaoui University of Luxembourg, Fabrizio Pastore University of Luxembourg, Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland
11:15
15m
Talk
Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision
Journal-first Papers
Wenxin Jiang Purdue University, Vishnu Banna Purdue University, Naveen Vivek Purdue University, Abhinav Goel Purdue University, Nicholas Synovic Loyola University Chicago, George K. Thiruvathukal Loyola University Chicago, James C. Davis Purdue University
Link to publication DOI Media Attached File Attached
11:30
10m
Talk
A Conceptual Framework for Quality Assurance of LLM-based Socio-critical Systems
NIER Track
Luciano Baresi Politecnico di Milano, Matteo Camilli Politecnico di Milano, Tommaso Dolci Politecnico di Milano, Giovanni Quattrocchi Politecnico di Milano
11:40
10m
Talk
Towards Robust ML-enabled Software Systems: Detecting Out-of-Distribution data using Gini Coefficients
NIER Track
Hala Abdelkader Applied Artificial Intelligence Institute, Deakin University, Jean-Guy Schneider Monash University, Mohamed Abdelrazek Deakin University, Australia, Priya Rani RMIT University, Rajesh Vasa Deakin University, Australia
11:50
10m
Talk
Attacks and Defenses for Large Language Models on Coding Tasks
NIER Track
Chi Zhang , Zifan Wang Center for AI Safety, Ruoshi Zhao Independent Researcher, Ravi Mangal Colorado State University, Matt Fredrikson Carnegie Mellon University, Limin Jia , Corina S. Păsăreanu Carnegie Mellon University; NASA Ames