CAIN 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 16:30 - 16:45 at 208 - Generative Model Engineering Chair(s): Manel Abdellatif

With the rise of generative Artificial Intelligence (AI), Machine Learning (ML) developers are becoming less reliant on real data to train their models. Data insufficiency can be addressed with synthetic data generated by a diffusion model. However, beyond ad hoc interpretation of a generative model’s outputs, there is little assurance that the synthetic data adheres to the data requirement specifications. This adherence is critical because the specifications describe the desired downstream model behavior. Without proper verification methods for the synthetic data, ML developers cannot be confident in the behavior of the downstream model. This paper presents a verification method for generating synthetic data to train downstream ML models by prompting the generative model with requirement specifications and tracing elements of the output back to the prompt. The purpose of this research is to embed requirements engineering into the data augmentation process, increasing the rigor and acceptance of using generative AI models to train downstream ML models. This improves the transparency of the data augmentation process, potentially increasing stakeholders' trust in the generated data and the use of generative models for data augmentation in a wider range of applications. It also provides a more traditional approach to synthetic data generation to guide ML developers in augmenting their datasets, thus incorporating a more rigorous engineering process into ML development, i.e., ML Engineering.

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Generative Model Engineering
Research and Experience Papers / Industry Talks at 208
Chair(s): Manel Abdellatif École de Technologie Supérieure
16:00
15m
Talk
DDPT: Diffusion Driven Prompt Tuning for Large Language Model Code Generation
Research and Experience Papers
Jinyang Li The University of Adelaide, Sangwon Hyun CREST, University of Adelaide, Muhammad Ali Babar School of Computer Science, The University of Adelaide
16:15
15m
Talk
Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps
Distinguished Paper Award Candidate
Research and Experience Papers
Kannan Parthasarathy MontyCloud, Karthik Vaidhyanathan IIIT Hyderabad, Rudra Dhar SERC, IIIT Hyderabad, India, Venkat Krishnamachari MontyCloud, Adyansh Kakran International Institute of Information Technology, Hyderabad, Sreemaee Akshathala IIIT Hyderabad, Shrikara Arun IIIT Hyderabad, Amey Karan IIIT Hyderabad, Basil Muhammed MontyCloud, Sumant Dubey MontyCloud, Mohan Veerubhotla MontyCloud
16:30
15m
Talk
Generating and Verifying Synthetic Datasets with Requirements Engineering
Research and Experience Papers
Lynn Vonderhaar Embry-Riddle Aeronautical University, Timothy Elvira Embry-Riddle Aeronautical University, Omar Ochoa Embry-Riddle Aeronautical University
Pre-print
16:45
15m
Talk
LLM-Based Safety Case Generation for Baidu Apollo: Are We There Yet?
Research and Experience Papers
Oluwafemi Odu York University, Alvine Boaye Belle York University, Song Wang York University
17:00
12m
Talk
SqPal - text to SQL GenAI tool for PayPal
Industry Talks
Dan Liyanage PayPal, Mahshid Moha PayPal, Sandy Suresh PayPal
17:12
18m
Other
Discussion
Research and Experience Papers
