
This program is tentative and subject to change.

Fri 2 May 2025 11:45 - 12:00 at 214 - AI for Testing and QA 5

Release engineering has traditionally focused on continuously delivering features and bug fixes to users, but at a certain scale it becomes impossible for a release engineering team to determine what should be released. At Meta’s scale, this responsibility appropriately and necessarily falls back on the engineers who write and review the code. To address this challenge, we developed diff risk score (DRS) models that estimate how likely a diff is to cause a SEV, i.e., a severe fault that impacts end-users. Assuming that SEVs are caused only by diffs, a naive model that randomly gates X% of diffs from landing would, on average, catch X% of SEVs. Our aim was instead to build a model that captures Y% of SEVs by gating X% of diffs, where Y >> X. By training the model on historical data about diffs that caused SEVs in the past, we can predict how likely an outgoing diff is to cause a SEV, and diffs whose risk exceeds a given threshold can then be gated. We use four gating levels: no gating (green), weekend gating (weekend), medium impact on end-users (yellow), and high impact on end-users (red). The input parameter for our models is the gating level, and the outcome measure is the number of captured SEVs, i.e., the number of gated diffs that would have led to a SEV. Our approaches include a logistic regression model, a BERT-based model, and generative LLMs. Our baseline logistic regression model captures 18.7%, 27.9%, and 84.6% of SEVs while gating the top 5% (weekend), 10% (yellow), and 50% (red) of risky diffs, respectively. The BERT-based model, StarBERT, captures only 0.61×, 0.85×, and 0.81× as many SEVs as the logistic regression model for the weekend, yellow, and red gating zones, respectively. The generative LLMs, iCodeLlama-34B and iDiffLlama-13B, when risk-aligned, capture more SEVs than the logistic regression model in production: 1.40×, 1.52×, and 1.05× as many for the weekend, yellow, and red gating zones, respectively.
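The outcome measure described above (the share of SEVs captured when the riskiest X% of diffs are gated) can be sketched as follows. This is a minimal illustration, not Meta's implementation: the function name `capture_rate`, the rank-and-cut gating rule, and the synthetic data in the demo are assumptions. Only the gating fractions for the weekend, yellow, and red zones come from the abstract.

```python
import math
import random

# Gating fractions per zone, as reported in the abstract (top X% of risky diffs).
GATING_ZONES = {"weekend": 0.05, "yellow": 0.10, "red": 0.50}


def capture_rate(risk_scores, caused_sev, gate_fraction):
    """Return the fraction of SEV-causing diffs that land in the gated set.

    Hypothetical helper: diffs are ranked by risk score (highest first) and
    the top `gate_fraction` of them are treated as gated.
    """
    if len(risk_scores) != len(caused_sev):
        raise ValueError("risk_scores and caused_sev must have the same length")
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be between 0 and 1")

    n_gated = round(gate_fraction * len(risk_scores))
    # Indices of diffs ordered from riskiest to least risky.
    order = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    gated = order[:n_gated]

    total_sevs = sum(1 for s in caused_sev if s)
    if total_sevs == 0:
        return 0.0
    captured = sum(1 for i in gated if caused_sev[i])
    return captured / total_sevs


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical synthetic data: higher-scored diffs are more likely to cause a SEV.
    scores = [random.random() for _ in range(1000)]
    sevs = [random.random() < 0.02 * s for s in scores]
    for zone, frac in GATING_ZONES.items():
        y = capture_rate(scores, sevs, frac)
        print(f"{zone}: gate top {frac:.0%} -> capture {y:.1%} of SEVs (Y/X = {y / frac:.2f})")
```

Dividing the captured fraction Y by the gated fraction X gives a lift ratio; a random gate has an expected ratio of about 1, so a useful risk model should score well above 1 in each zone.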


Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
AI for Testing and QA 5 - SE In Practice (SEIP) at 214
11:00
15m
Talk
ASTER: Natural and Multi-language Unit Test Generation with LLMs (Award Winner)
SE In Practice (SEIP)
Rangeet Pan IBM Research, Myeongsoo Kim Georgia Institute of Technology, Rahul Krishna IBM Research, Raju Pavuluri IBM T.J. Watson Research Center, Saurabh Sinha IBM Research
Pre-print
11:15
15m
Talk
Automated Code Review In Practice
SE In Practice (SEIP)
Umut Cihan Bilkent University, Vahid Haratian Bilkent University, Arda İçöz Bilkent University, Mert Kaan Gül Beko, Ömercan Devran Beko, Emircan Furkan Bayendur Beko, Baykal Mehmet Ucar Beko, Eray Tüzün Bilkent University
11:30
15m
Talk
CI at Scale: Lean, Green, and Fast
SE In Practice (SEIP)
Dhruva Juloori Uber Technologies, Inc, Zhongpeng Lin Uber Technologies Inc., Matthew Williams Uber Technologies, Inc, Eddy Shin Uber Technologies, Inc, Sonal Mahajan Uber Technologies Inc.
11:45
15m
Talk
Moving Faster and Reducing Risk: Using LLMs in Release Deployment (Award Winner)
SE In Practice (SEIP)
Rui Abreu Meta, Vijayaraghavan Murali Meta Platforms Inc., Peter C Rigby Meta / Concordia University, Chandra Sekhar Maddila Meta Platforms, Inc., Weiyan Sun Meta Platforms, Inc., Jun Ge Meta Platforms, Inc., Kaavya Chinniah Meta Platforms, Inc., Audris Mockus The University of Tennessee, Megh Mehta Meta Platforms, Inc., Nachiappan Nagappan Meta Platforms, Inc.
12:00
15m
Talk
Prioritizing Large-scale Natural Language Test Cases at OPPO
SE In Practice (SEIP)
Haoran Xu, Chen Zhi Zhejiang University, Tianyu Xiang Guangdong Oppo Mobile Telecommunications Corp., Ltd., Zixuan Wu Zhejiang University, Gaorong Zhang Zhejiang University, Xinkui Zhao Zhejiang University, Jianwei Yin Zhejiang University, Shuiguang Deng Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies
12:15
15m
Talk
Search+LLM-based Testing for ARM Simulators
SE In Practice (SEIP)
Bobby Bruce University of California at Davis, USA, Aidan Dakhama King's College London, Karine Even-Mendoza King’s College London, William B. Langdon University College London, Hector Menendez King’s College London, Justyna Petke University College London