Generating and Verifying Synthetic Datasets with Requirements Engineering
This program is tentative and subject to change.
With the rise of generative Artificial Intelligence (AI), Machine Learning (ML) developers are becoming less reliant on real data to train their models. Data insufficiency can be resolved with synthetic data generated by a diffusion model. However, beyond ad hoc inspection of a generative model's outputs, there is little assurance that the synthetic data adheres to the data requirement specifications. This adherence is critical because the specifications describe desired downstream model behavior; without proper verification of the synthetic data, ML developers cannot be confident in the behavior of the downstream model. This paper presents a verification method for synthetic training data: the generative model is prompted with requirement specifications, and elements of its output are traced back to the prompt. The purpose of this research is to embed requirements engineering into the data augmentation process, increasing the rigor of, and confidence in, generative AI models used to produce training data for downstream ML models. This improves the transparency of the data augmentation process, potentially increasing stakeholder trust in the generated data and enabling the use of generative models for data augmentation in a wider range of applications. It also gives ML developers a more traditional engineering approach to guide dataset augmentation, incorporating a more rigorous process into ML development, i.e., ML Engineering.
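The abstract's core idea, prompting a generative model with requirement specifications and tracing output elements back to those requirements, can be illustrated with a small sketch. The paper's actual implementation is not published here, so the requirement format, the stand-in for the generative model's output, and the `build_prompt`/`verify` helpers below are all hypothetical, chosen only to make the generate-then-trace loop concrete:

```python
# Illustrative sketch only: requirement format, example data, and helper names
# are assumptions, not the authors' implementation.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    """A data requirement specification: an identifier, a human-readable
    statement, and a predicate a synthetic record must satisfy."""
    req_id: str
    statement: str
    check: Callable[[dict], bool]


def build_prompt(requirements: list[Requirement]) -> str:
    """Embed the requirement specifications directly in the generation prompt."""
    lines = ["Generate synthetic records satisfying all requirements:"]
    lines += [f"- [{r.req_id}] {r.statement}" for r in requirements]
    return "\n".join(lines)


def verify(records: list[dict], requirements: list[Requirement]) -> dict[str, list[int]]:
    """Trace each generated record back to the requirements it violates.
    Returns a map from requirement id to indices of offending records."""
    violations: dict[str, list[int]] = {r.req_id: [] for r in requirements}
    for i, rec in enumerate(records):
        for r in requirements:
            if not r.check(rec):
                violations[r.req_id].append(i)
    return violations


# Example requirements for a hypothetical sensor dataset.
reqs = [
    Requirement("R1", "temperature is between -40 and 85 C",
                lambda rec: -40 <= rec["temp_c"] <= 85),
    Requirement("R2", "label is one of {'nominal', 'fault'}",
                lambda rec: rec["label"] in {"nominal", "fault"}),
]

# Stand-in for the generative model's output.
synthetic = [
    {"temp_c": 21.5, "label": "nominal"},
    {"temp_c": 120.0, "label": "fault"},  # violates R1
]

print(build_prompt(reqs))
print(verify(synthetic, reqs))  # {'R1': [1], 'R2': []}
```

Because each violation is reported against a requirement identifier that also appears verbatim in the prompt, every accepted or rejected record can be traced to the specification that motivated it, which is the transparency property the abstract argues for.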
Sun 27 Apr. Displayed time zone: Eastern Time (US & Canada).
Session: 14:00 - 15:30

14:00 (15m) Talk: Themes of Building LLM-based Applications for Production: A Practitioner's View
Research and Experience Papers
Alina Mailach (Leipzig University), Sebastian Simon (Leipzig University), Johannes Dorn (Leipzig University), Norbert Siegmund (Leipzig University)

14:15 (15m) Talk: LLM-Based Safety Case Generation for Baidu Apollo: Are We There Yet?
Research and Experience Papers

14:30 (15m) Talk: An AI-driven Requirements Engineering Framework Tailored for Evaluating AI-Based Software
Research and Experience Papers
Hamed Barzamini, Fatemeh Nazaritiji (Northern Illinois University), Annalise Brockmann (Northern Illinois University), Hasan Ferdowsi (Northern Illinois University), Mona Rahimi (Northern Illinois University)

14:46 (14m) Talk: Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps
Research and Experience Papers
Kannan Parthasarathy (MontyCloud), Karthik Vaidhyanathan (IIIT Hyderabad), Rudra Dhar (SERC, IIIT Hyderabad, India), Venkat Krishnamachari (MontyCloud), Adyansh Kakran (International Institute of Information Technology, Hyderabad), Sreemaee Akshathala (IIIT Hyderabad), Shrikara Arun (IIIT Hyderabad), Amey Karan (IIIT Hyderabad), Basil Muhammed (MontyCloud), Sumant Dubey (MontyCloud), Mohan Veerubhotla (MontyCloud)

15:00 (15m) Talk: Generating and Verifying Synthetic Datasets with Requirements Engineering
Research and Experience Papers
Lynn Vonderhaar (Embry-Riddle Aeronautical University), Timothy Elvira (Embry-Riddle Aeronautical University), Omar Ochoa (Embry-Riddle Aeronautical University)

15:15 (15m) Talk: InsightAI: Root Cause Analysis in Large Hierarchical Log Files with Private Data Using Large Language Models
Research and Experience Papers
Maryam Ekhlasi (Polytechnique Montréal), Anurag Prakash (Ciena), Michel Dagenais (Polytechnique Montréal), Maxime Lamothe (Polytechnique Montréal)