Call for papers

The Software Engineering in Practice (SEIP) track of ICSE is the premier venue for practitioners and researchers to discuss insights, innovations, and solutions to concrete software engineering problems. Practical software development relies on excellent software engineering research.

SEIP provides a unique forum for networking, exchanging ideas, fostering innovations, and forging long-term collaborations to address software engineering research that directly impacts practice. It is one of the most prestigious forums in which to publish work in the software engineering literature. SEIP will include participants and speakers from both industry and academia, drawing together researchers and practitioners eager to communicate and share common interests in software engineering. The track will be composed of invited talks, paper presentations, and interactive sessions, all with a strong focus on software practice.

Please note that the ICSE SEIP track DOES NOT require double-anonymous review. Unlike in some other ICSE tracks, it is often important for SEIP authors and the organizations involved to be visible to the reviewers so that the reviewers can fully understand the industrial relevance and context.

Submissions

Link to submission: https://icse2025-seip.hotcrp.com/

Papers should address industrially relevant problems through systematic investigation. Each paper should describe a problem of practical importance, explain how the problem was investigated and in what context, and present evidence for the paper’s conclusions. Submissions should be technically and empirically sound. Where appropriate, papers should also discuss why the resolution of the problem is innovative, (cost-)effective, or efficient; provide a concise explanation of the approach, techniques, and methodologies employed; and explain the insights or best practices that emerged, the tools developed, and/or the software processes involved.

Papers will appear in the ICSE SEIP Companion proceedings.

Submission Process

  • Submissions must not exceed 10 pages (for full papers) for the main text, inclusive of all figures, tables, and appendices. An additional 2 pages are allowed for references. All submissions must be in PDF format.
  • All submissions must conform to the IEEE conference proceedings template, specified in the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options). Submissions must strictly conform to these formatting instructions; alterations of spacing, font size, and other deviations may result in desk rejection without further review. A minimal, unofficial LaTeX skeleton illustrating these settings appears after this list.
  • By submitting to the ICSE SEIP Track, authors acknowledge that they are aware of and agree to be bound by the ACM Policy and Procedures on Plagiarism and the IEEE Plagiarism FAQ. Papers submitted to ICSE 2025 must not have been published elsewhere and must not be under review or submitted for review elsewhere whilst under consideration for ICSE 2025. Contravention of this concurrent submission policy will be deemed a serious breach of scientific ethics, and appropriate action will be taken in all such cases. To check for double submission and plagiarism issues, the chairs reserve the right to (1) share the list of submissions with the PC Chairs of other conferences with overlapping review periods and (2) use external plagiarism detection software, under contract to the ACM or IEEE, to detect violations of these policies.
  • By submitting to the ICSE SEIP Track, authors acknowledge that they conform to the authorship policy of the ACM and the authorship policy of the IEEE.
  • If the research involves human participants/subjects, the authors must adhere to the ACM Publications Policy on Research Involving Human Participants and Subjects. Upon submitting, authors will declare their compliance with such a policy. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.
  • Please ensure that you and your co-authors obtain an ORCID ID, so you can complete the publishing process for your accepted paper. ACM and IEEE are involved in ORCID and may collect ORCID IDs from all published authors. We are committed to improving author discoverability, ensuring proper attribution, and contributing to ongoing community efforts around name normalization; your ORCID ID will help in these efforts.
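As a convenience, the sketch below shows a minimal IEEEtran skeleton consistent with the formatting rules above. It is illustrative only and not the official template; the title, author names, affiliation, and section text are placeholders, and authors should always start from the IEEE Conference Proceedings Formatting Guidelines.

    % Minimal, unofficial IEEEtran skeleton for a SEIP submission (placeholders only).
    \documentclass[10pt,conference]{IEEEtran}  % do NOT add the compsoc or compsocconf options

    \begin{document}

    \title{Placeholder Title of the SEIP Submission}

    \author{\IEEEauthorblockN{First Author}
    \IEEEauthorblockA{Example Organization, City, Country\\
    first.author@example.org}}

    \maketitle

    \begin{abstract}
    Placeholder abstract text.
    \end{abstract}

    \section{Introduction}
    Placeholder introduction text.

    \end{document}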

Evaluation

All SEIP submissions will be reviewed by members of the SEIP Program Committee. Submissions will be evaluated based on relevance to industrial practice, the significance of contribution, and the quality of presentation.

Important Dates

  • Submission deadline: October 10, 2024
  • Acceptance notification: December 15, 2024
  • Camera-ready: January 10, 2025

Conference Attendance Expectation

If a submission is accepted, at least one author of the paper is required to register for and attend the conference to present the paper. Virtual attendance will be an option.

Contact

If there are queries regarding the CFP, please contact the ICSE SEIP 2025 chairs.

This program is tentative and subject to change.

Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Human and Social Process 1: SE In Practice (SEIP) / New Ideas and Emerging Results (NIER) / Journal-first Papers at 207
Chair(s): Hausi Müller University of Victoria
11:45
15m
Talk
Identifying Factors Contributing to "Bad Days" for Software Developers: A Mixed-Methods Study
SE In Practice (SEIP)
Ike Obi Purdue University, West Lafayette, Jenna L. Butler Microsoft Research, Sankeerti Haniyur Microsoft Corporation, Brian Hassan Microsoft Corporation, Margaret-Anne Storey University of Victoria, Brendan Murphy Microsoft Corporation
12:00
15m
Talk
Time Warp: The Gap Between Developers’ Ideal vs Actual Workweeks in an AI-Driven Era (Award Winner)
SE In Practice (SEIP)
Sukrit Kumar Georgia Institute of Technology, Drishti Goel Microsoft, Thomas Zimmermann University of California, Irvine, Brian Houck Microsoft Research, B. Ashok Microsoft Research. India, Chetan Bansal Microsoft Research
12:15
15m
Talk
Wearables to measure developer experience at work
SE In Practice (SEIP)
Charlotte Brandebusemeyer Hasso Plattner Institute, University of Potsdam, Tobias Schimmer SAP Labs, Bert Arnrich Hasso Plattner Institute, University of Potsdam
11:00 - 12:30
11:15
15m
Talk
KuiTest: Leveraging Knowledge in the Wild as GUI Testing Oracle for Mobile Apps
SE In Practice (SEIP)
Yongxiang Hu Fudan University, Yu Zhang Meituan, Xuan Wang Fudan University, Yingjie Liu School of Computer Science, Fudan University, Shiyu Guo Meituan, Chaoyi Chen Meituan, Xin Wang Fudan University, Yangfan Zhou Fudan University
11:30
15m
Talk
GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts
SE In Practice (SEIP)
Wei Liu Concordia University, Montreal, Canada, Feng Lin Concordia University, Linqiang Guo Concordia University, Tse-Hsun (Peter) Chen Concordia University, Ahmed E. Hassan Queen’s University
12:00
15m
Talk
Agent for User: Testing Multi-User Interactive Features in TikTok
SE In Practice (SEIP)
Sidong Feng Monash University, Changhao Du Jilin University, Huaxiao Liu Jilin University, Qingnan Wang Jilin University, Zhengwei Lv ByteDance, Gang Huo ByteDance, Xu Yang ByteDance, Chunyang Chen TU Munich
11:00 - 12:30
AI for Testing and QA 1: Research Track / SE In Practice (SEIP) at 214
Chair(s): Jieshan Chen CSIRO's Data61
12:15
15m
Talk
DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production
SE In Practice (SEIP)
Xiaoyun Liang ByteDance, Jingyi Ren ByteDance, Jiayi Qi ByteDance, Chao Peng ByteDance, Bo Jiang Bytedance Network Technology
11:00 - 12:30
SE for AI 1: New Ideas and Emerging Results (NIER) / SE In Practice (SEIP) / Research Track at 215
Chair(s): Houari Sahraoui DIRO, Université de Montréal
12:00
15m
Talk
Evaluation of Tools and Frameworks for Machine Learning Model Serving (SE for AI)
SE In Practice (SEIP)
Niklas Beck Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Benny Stein Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Dennis Wegener T-Systems International GmbH, Lennard Helmer Fraunhofer Institute for Intelligent Analysis and Information Systems
12:15
15m
Talk
Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models (SE for AI)
SE In Practice (SEIP)
Kirill Vasilevski Huawei Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Ahmed E. Hassan Queen’s University
15:30 - 16:00
15:30
30m
Poster
FlatD: Protecting Deep Neural Network Program from Reversing Attacks
SE In Practice (SEIP)
Jinquan Zhang The Pennsylvania State University, Zihao Wang Penn State University, Pei Wang Independent Researcher, Rui Zhong Palo Alto Networks, Dinghao Wu Pennsylvania State University
16:00 - 17:30
17:15
15m
Talk
Safe Validation of Pricing Agreements
SE In Practice (SEIP)
John C. Kolesar Yale University, Tancrède Lepoint Amazon Web Services, Martin Schäf Amazon Web Services, Willem Visser Amazon Web Services
16:00 - 17:45
Human and Social 1: SE in Society (SEIS) / SE In Practice (SEIP) at 206 plus 208
Chair(s): Yvonne Dittrich IT University of Copenhagen, Denmark
16:00
15m
Talk
Systematizing Inclusive Design in MOSIP: An Experience Report
SE In Practice (SEIP)
Soumiki Chattopadhyay Oregon State University, Amreeta Chatterjee Oregon State University, Puja Agarwal Oregon State University, Bianca Trinkenreich Colorado State University, Swarathmika Kumar MOSIP-IIIT Bangalore, Rohit Ranjan Rai MOSIP-IIIT Bangalore, Resham Chugani MOSIP-IIIT Bangalore, Pragya Kumari MOSIP-IIIT Bangalore, Margaret Burnett Oregon State University, Anita Sarma Oregon State University
16:00 - 17:30
AI for Testing and QA 2: Research Track / SE In Practice (SEIP) at 214
Chair(s): Michael Pradel University of Stuttgart
17:15
15m
Talk
Improving Code Performance Using LLMs in Zero-Shot: RAPGen
SE In Practice (SEIP)
Spandan Garg Microsoft Corporation, Roshanak Zilouchian Moghaddam Microsoft, Neel Sundaresan Microsoft
16:00 - 17:30
Analysis 1: Research Track / SE In Practice (SEIP) at 215
Chair(s): Antonio Filieri AWS and Imperial College London
17:15
15m
Talk
ArkAnalyzer: The Static Analysis Framework for OpenHarmony
SE In Practice (SEIP)
Haonan Chen Beihang University, Daihang Chen Beihang University, Yizhuo Yang Beihang University, Lingyun Xu Huawei, Liang Gao Huawei, Mingyi Zhou Beihang University, Chunming Hu Beihang University, Li Li Beihang University

Thu 1 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
12:00
15m
Talk
AI-Assisted SQL Authoring at Industry Scale (SE for AI)
SE In Practice (SEIP)
Chandra Sekhar Maddila Meta Platforms, Inc., Negar Ghorbani Meta Platforms Inc., Kosay Jabre Meta Platforms, Inc., Vijayaraghavan Murali Meta Platforms Inc., Edwin Kim Meta Platforms, Inc., Parth Thakkar Meta Platforms, Inc., Nikolay Pavlovich Laptev Meta Platforms, Inc., Olivia Harman Meta Platforms, Inc., Diana Hsu Meta Platforms, Inc., Rui Abreu Meta, Peter C Rigby Meta / Concordia University
12:15
15m
Talk
Automating ML Model Development at Scale (SE for AI)
SE In Practice (SEIP)
Kaiyuan Wang Google, Yang Li Google Inc, Junyang Shen Google Inc, Kaikai Sheng Google Inc, Yiwei You Google Inc, Jiaqi Zhang Google Inc, Srikar Ayyalasomayajula Google Inc, Julian Grady Google Inc, Martin Wicke Google Inc
11:00 - 12:30
Analysis 2: SE In Practice (SEIP) / Journal-first Papers / Demonstrations at 205
Chair(s): Mahmoud Alfadel University of Calgary
11:15
15m
Talk
Towards Better Static Analysis Bug Reports in the Clang Static Analyzer
SE In Practice (SEIP)
Kristóf Umann Eötvös Loránd University, Faculty of Informatics, Dept. of Programming Languages and Compilers, Zoltán Porkoláb Ericsson
11:45
15m
Talk
LLM Driven Smart Assistant for Data Mapping
SE In Practice (SEIP)
Arihant Bedagkar Tata Consultancy Services, Sayandeep Mitra Tata Consultancy Services, Raveendra Kumar Medicherla TCS Research, Tata Consultancy Services, Ravindra Naik TCS Research, TRDDC, India, Samiran Pal Tata Consultancy Services
12:00
15m
Talk
On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Henri Aïdasso École de technologie supérieure (ÉTS), Francis Bordeleau École de Technologie Supérieure (ETS), Ali Tizghadam TELUS
Pre-print
11:00 - 12:30
Security and Analysis 1: Research Track / SE In Practice (SEIP) at 210
11:45
15m
Talk
Detecting Python Malware in the Software Supply Chain with Program Analysis (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
Ridwan Salihin Shariffdeen SonarSource SA, Behnaz Hassanshahi Oracle Labs, Australia, Martin Mirchev National University of Singapore, Ali El Husseini National University of Singapore, Abhik Roychoudhury National University of Singapore
11:00 - 12:30
AI for Design and Architecture: Demonstrations / SE In Practice (SEIP) / Research Track at 211
11:45
15m
Talk
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
SE In Practice (SEIP)
Siyuan Jiang, Jia Li Peking University, He Zong aiXcoder, Huanyu Liu Peking University, Hao Zhu Peking University, Shukai Hu aiXcoder, Erlu Li aiXcoder, Jiazheng Ding aiXcoder, Ge Li Peking University
Pre-print
12:00
15m
Talk
Leveraging MLOps: Developing a Sequential Classification System for RFQ Documents in Electrical Engineering
SE In Practice (SEIP)
Claudio Martens Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Hammam Abdelwahab Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Katharina Beckh Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Birgit Kirsch Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Vishwani Gupta Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dennis Wegener Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Steffen Hoh Schneider Electric
12:15
15m
Talk
On Mitigating Code LLM Hallucinations with API Documentation
SE In Practice (SEIP)
Nihal Jain Amazon Web Services, Robert Kwiatkowski, Baishakhi Ray Columbia University, New York, Murali Krishna Ramanathan AWS AI Labs, Varun Kumar AWS AI Labs
11:00 - 12:30
11:45
15m
Talk
SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases
SE In Practice (SEIP)
Daeha Ryu Innovation Center, Samsung Electronics, Seokjun Ko Samsung Electronics Co., Eunbi Jang Innovation Center, Samsung Electronics, Jinyoung Park Innovation Center, Samsung Electronics, Myunggwan Kim Innovation Center, Samsung Electronics, Changseo Park Innovation Center, Samsung Electronics
12:00
15m
Talk
Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
SE In Practice (SEIP)
Tri Minh-Triet Pham Concordia University, Karthikeyan Premkumar Ericsson, Mohamed Naili Ericsson, Jinqiu Yang Concordia University
12:15
15m
Talk
UML Sequence Diagram Generation: A Multi-Model, Multi-Domain Evaluation
SE In Practice (SEIP)
Chi Xiao Ericsson AB, Daniel Ståhl Ericsson AB, Jan Bosch Chalmers University of Technology
11:00 - 12:30
12:00
15m
Talk
NICE: Non-Functional Requirements Identification, Classification, and Explanation Using Small Language Models (Artifact-Available, Award Winner)
SE In Practice (SEIP)
Gokul Rejithkumar TCS Research, Preethu Rose Anish TCS Research
13:00 - 13:30
13:00
30m
Poster
FlatD: Protecting Deep Neural Network Program from Reversing Attacks
SE In Practice (SEIP)
Jinquan Zhang The Pennsylvania State University, Zihao Wang Penn State University, Pei Wang Independent Researcher, Rui Zhong Palo Alto Networks, Dinghao Wu Pennsylvania State University
13:00
30m
Talk
Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry
SE In Practice (SEIP)
Martin Obaidi Leibniz Universität Hannover, Nicolas Voß Graphmasters GmbH, Hannah Deters Leibniz University Hannover, Jakob Droste Leibniz Universität Hannover, Marc Herrmann Leibniz Universität Hannover, Jannik Fischbach Netlight Consulting GmbH and fortiss GmbH, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

10:30 - 11:00
10:30
30m
Talk
Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry
SE In Practice (SEIP)
Martin Obaidi Leibniz Universität Hannover, Nicolas Voß Graphmasters GmbH, Hannah Deters Leibniz University Hannover, Jakob Droste Leibniz Universität Hannover, Marc Herrmann Leibniz Universität Hannover, Jannik Fischbach Netlight Consulting GmbH and fortiss GmbH, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group
11:00 - 12:30
11:15
15m
Talk
Soft Skills in Software Engineering: Insights from the Trenches
SE In Practice (SEIP)
Sanna Malinen University of Canterbury, Matthias Galster University of Canterbury, Antonija Mitrovic University of Canterbury, New Zealand, Sreedevi Sankara Iyer University of Canterbury, Pasan Peiris University of Canterbury, New Zealand, April Clarke University of Canterbury
11:00 - 12:30
Human and Social using AI 2: Research Track / SE In Practice (SEIP) / Demonstrations at 207
11:00
15m
Talk
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Hao Li Queen's University, Cor-Paul Bezemer University of Alberta, Ahmed E. Hassan Queen’s University
11:30
15m
Talk
Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace
SE In Practice (SEIP)
Jenna L. Butler Microsoft Research, Jina Suh Microsoft Research, Sankeerti Haniyur Microsoft Corporation, Constance Hadley Institute for Work Life
11:45
15m
Talk
Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company
SE In Practice (SEIP)
Guilherme Vaz Pereira School of Technology, PUCRS, Brazil, Victoria Jackson University of California, Irvine, Rafael Prikladnicki School of Technology at PUCRS University, Andre van der Hoek University of California, Irvine, Luciane Fortes Globo, Carolina Araújo Globo, André Coelho Globo, Ligia Chelli Globo, Diego Ramos Globo
Pre-print
12:00
15m
Talk
Human-In-the-Loop Software Development Agents
SE In Practice (SEIP)
Wannita Takerngsaksiri Monash University, Jirat Pasuksmit Atlassian, Patanamon Thongtanunam University of Melbourne, Kla Tantithamthavorn Monash University, Ruixiong Zhang Atlassian, Fan Jiang Atlassian, Jing Li Atlassian, Evan Cook Atlassian, Kun Chen Atlassian, Ming Wu Atlassian
11:00 - 12:30
12:00
15m
Talk
Video Game Procedural Content Generation Through Software Transplantation
SE In Practice (SEIP)
Mar Zamorano López University College London, Daniel Blasco SVIT Research Group, Universidad San Jorge, Carlos Cetina, Federica Sarro University College London
11:00 - 12:30
11:15
15m
Talk
Evaluating Agent-based Program Repair at Google
SE In Practice (SEIP)
Patrick Rondon Google, Renyao Wei Google, José Pablo Cambronero Google, USA, Jürgen Cito TU Wien, Aaron Sun Google, Siddhant Sanyam Google, Michele Tufano Google, Satish Chandra Google, Inc
11:30
15m
Talk
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Mohammad Saiful Islam Toronto Metropolitan University, Toronto, Canada, Mohamed Sami Rakha Toronto Metropolitan University, Toronto, Canada, William Pourmajidi Toronto Metropolitan University, Toronto, Canada, Janakan Sivaloganathan Toronto Metropolitan University, Toronto, Canada, John Steinbacher IBM, Andriy Miranskyy Toronto Metropolitan University (formerly Ryerson University)
Pre-print
11:45
15m
Talk
Crash Report Prioritization for Large-Scale Scheduled Launches
SE In Practice (SEIP)
Nimmi Rashinika Weeraddana University of Waterloo, Sarra Habchi Ubisoft Montréal, Shane McIntosh University of Waterloo
12:00
15m
Talk
LogLM: From Task-based to Instruction-based Automated Log Analysis
SE In Practice (SEIP)
Yilun Liu Huawei co. LTD, Yuhe Ji Huawei co. LTD, Shimin Tao University of Science and Technology of China; Huawei co. LTD, Minggui He Huawei co. LTD, Weibin Meng Huawei co. LTD, Shenglin Zhang Nankai University, Yongqian Sun Nankai University, Yuming Xie Huawei co. LTD, Boxing Chen Huawei Canada, Hao Yang Huawei co. LTD
Pre-print
11:00 - 12:30
AI for Testing and QA 5: SE In Practice (SEIP) at 214
11:00
15m
Talk
ASTER: Natural and Multi-language Unit Test Generation with LLMs (Award Winner)
SE In Practice (SEIP)
Rangeet Pan IBM Research, Myeongsoo Kim Georgia Institute of Technology, Rahul Krishna IBM Research, Raju Pavuluri IBM T.J. Watson Research Center, Saurabh Sinha IBM Research
Pre-print
11:15
15m
Talk
Automated Code Review In Practice
SE In Practice (SEIP)
Umut Cihan Bilkent University, Vahid Haratian Bilkent Univeristy, Arda İçöz Bilkent University, Mert Kaan Gül Beko, Ömercan Devran Beko, Emircan Furkan Bayendur Beko, Baykal Mehmet Ucar Beko, Eray Tüzün Bilkent University
11:30
15m
Talk
CI at Scale: Lean, Green, and Fast
SE In Practice (SEIP)
Dhruva Juloori Uber Technologies, Inc, Zhongpeng Lin Uber Technologies Inc., Matthew Williams Uber Technologies, Inc, Eddy Shin Uber Technologies, Inc, Sonal Mahajan Uber Technologies Inc.
11:45
15m
Talk
Moving Faster and Reducing Risk: Using LLMs in Release Deployment (Award Winner)
SE In Practice (SEIP)
Rui Abreu Meta, Vijayaraghavan Murali Meta Platforms Inc., Peter C Rigby Meta / Concordia University, Chandra Sekhar Maddila Meta Platforms, Inc., Weiyan Sun Meta Platforms, Inc., Jun Ge Meta Platforms, Inc., Kaavya Chinniah Meta Platforms, Inc., Audris Mockus The University of Tennessee, Megh Mehta Meta Platforms, Inc., Nachiappan Nagappan Meta Platforms, Inc.
12:00
15m
Talk
Prioritizing Large-scale Natural Language Test Cases at OPPO
SE In Practice (SEIP)
Haoran Xu, Chen Zhi Zhejiang University, Tianyu Xiang Guangdong Oppo Mobile Telecommunications Corp., Ltd., Zixuan Wu Zhejiang University, Gaorong Zhang Zhejiang University, Xinkui Zhao Zhejiang University, Jianwei Yin Zhejiang University, Shuiguang Deng Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies
12:15
15m
Talk
Search+LLM-based Testing for ARM Simulators (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Bobby Bruce University of California at Davis, USA, Aidan Dakhama King's College London, Karine Even-Mendoza King’s College London, William B. Langdon University College London, Hector Menendez King’s College London, Justyna Petke University College London
11:00 - 12:30
12:00
15m
Talk
How is Google using AI for internal code migrations?
SE In Practice (SEIP)
Stoyan Nikolov Google, Inc., Daniele Codecasa Google, Inc., Anna Sjovall Google, Inc., Maxim Tabachnyk Google, Siddharth Taneja Google, Inc., Celal Ziftci Google, Satish Chandra Google, Inc
14:00 - 15:30
15:00
15m
Talk
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
SE In Practice (SEIP)
Tao Duan Xi'an Jiaotong University, Runqing Chen Alibaba, Pinghui Wang Xi'an Jiaotong University, Junzhou Zhao Xi'an Jiaotong University, Jiongzhou Liu Alibaba, Shujie Han Northwestern Polytechnical University, Yi Liu Alibaba, Fan Xu Alibaba
15:15
15m
Talk
On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet? (Artifact-Available)
SE In Practice (SEIP)
Matteo Esposito University of Oulu, Francesco Palagiano Multitel di Lerede Alessandro & C. s.a.s., Valentina Lenarduzzi University of Oulu, Davide Taibi University of Oulu
DOI Pre-print
14:00 - 15:30
14:00
15m
Talk
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
SE In Practice (SEIP)
Nadia Nahar Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Jenna L. Butler Microsoft Research, Chris Parnin Microsoft, Thomas Zimmermann University of California, Irvine, Christian Bird Microsoft Research
14:00 - 15:30
15:00
15m
Talk
Testing False Recalls in E-commerce Apps: a User-perspective Blackbox Approach
SE In Practice (SEIP)
Shengnan Wu School of Computer Science, Fudan University, Yongxiang Hu Fudan University, Jiazhen Gu Fudan University, China, Penglei Mao School of Computer Science, Fudan University, Jin Meng Meituan Inc., Liujie Fan Meituan Inc., Zhongshi Luan Meituan Inc., Xin Wang Fudan University, Yangfan Zhou Fudan University
14:00 - 15:30
Security and Analysis 3: Research Track / SE In Practice (SEIP) at 210
15:00
15m
Talk
A scalable, effective and simple Vulnerability Tracking approach for heterogeneous SAST setups based on Scope+Offset (Security)
SE In Practice (SEIP)
James Johnson, Julian Thome GitLab Inc., Lucas Charles GitLab Inc., Hua Yan GitLab Inc., Jason Leasure GitLab Inc.
Pre-print
15:15
15m
Talk
"ImmediateShortTerm3MthsAfterThatLOL": Developer Secure-Coding Sentiment, Practice and Culture in Organisations (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
Ita Ryan University College Cork, Utz Roedig University College Cork, Klaas-Jan Stol Lero; University College Cork; SINTEF Digital
16:00 - 17:30
16:00
15m
Talk
Full Line Code Completion: Bringing AI to Desktop
SE In Practice (SEIP)
Anton Semenkin JetBrains, Vitaliy Bibaev JetBrains, Yaroslav Sokolov JetBrains, Kirill Krylov JetBrains, Alexey Kalina JetBrains, Anna Khannanova JetBrains, Danila Savenkov JetBrains, Darya Rovdo JetBrains, Igor Davidenko JetBrains, Kirill Karnaukhov JetBrains, Maxim Vakhrushev JetBrains, Mikhail Kostyukov JetBrains, Mikhail Podvitskii JetBrains, Petr Surkov JetBrains, Yaroslav Golubev JetBrains Research, Nikita Povarov JetBrains, Timofey Bryksin JetBrains Research
Pre-print
16:00 - 17:30
16:30
15m
Talk
An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI (Artifact-Available)
SE In Practice (SEIP)
Lekshmi Murali Rani Chalmers University of Technology and University of Gothenburg, Sweden, Faezeh Mohammadi Chalmers University of Technology and University of Gothenburg, Sweden, Robert Feldt Chalmers University of Technology, Sweden, Richard Berntsson Svensson Chalmers | University of Gothenburg
16:00 - 17:30
17:00
15m
Talk
Sunflower: Enhancing Linux Kernel Fuzzing via Exploit-Driven Seed Generation (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
Qiang Zhang Hunan University, Yuheng Shen Tsinghua University, Jianzhong Liu Tsinghua University, Yiru Xu Tsinghua University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Wanli Chang College of Computer Science and Electronic Engineering, Hunan University
16:00 - 17:30
16:30
15m
Talk
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings
SE In Practice (SEIP)
Petr Tsvetkov JetBrains Research, Aleksandra Eliseeva JetBrains Research, Danny Dig University of Colorado Boulder, JetBrains Research, Alexander Bezzubov JetBrains, Yaroslav Golubev JetBrains Research, Timofey Bryksin JetBrains Research, Yaroslav Zharov JetBrains Research
Pre-print
16:45
15m
Talk
Enhancing Differential Testing: LLM-Powered Automation in Release Engineering
SE In Practice (SEIP)
Ajay Krishna Vajjala George Mason University, Arun Krishna Vajjala George Mason University, Carmen Badea Microsoft Research, Christian Bird Microsoft Research, Robert DeLine Microsoft Research, Jason Entenmann Microsoft Research, Nicole Forsgren Microsoft Research, Aliaksandr Hramadski Microsoft, Sandeepan Sanyal Microsoft, Oleg Surmachev Microsoft, Thomas Zimmermann University of California, Irvine, Haris Mohammad Microsoft, Jade D'Souza Microsoft, Mikhail Demyanyuk Microsoft
17:00
15m
Talk
How much does AI impact development speed? An enterprise-based randomized controlled trial
SE In Practice (SEIP)
Elise Paradis Google, Inc, Kate Grey Google, Quinn Madison Google, Daye Nam Google, Andrew Macvean Google, Inc., Nan Zhang Google, Ben Ferrari-Church Google, Satish Chandra Google, Inc
16:00 - 17:30
SE for AI with Quality 3: Research Track / SE In Practice (SEIP) at 215
17:00
15m
Talk
Test Input Validation for Vision-based DL Systems: An Active Learning Approach (Artifact-Available, Artifact-Functional, Artifact-Reusable, SE for AI)
SE In Practice (SEIP)
Delaram Ghobari University of Ottawa, Mohammad Hossein Amini University of Ottawa, Dai Quoc Tran SmartInsideAI Company Ltd. and Sungkyunkwan University, Seunghee Park SmartInsideAI Company Ltd. and Sungkyunkwan University, Shiva Nejati University of Ottawa, Mehrdad Sabetzadeh University of Ottawa
Pre-print

Accepted Papers

Agent for User: Testing Multi-User Interactive Features in TikTok
SE In Practice (SEIP)
AI-Assisted SQL Authoring at Industry Scale (SE for AI)
SE In Practice (SEIP)
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
SE In Practice (SEIP)
Pre-print
An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI (Artifact-Available)
SE In Practice (SEIP)
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Pre-print
ArkAnalyzer: The Static Analysis Framework for OpenHarmony
SE In Practice (SEIP)
A scalable, effective and simple Vulnerability Tracking approach for heterogeneous SAST setups based on Scope+Offset (Security)
SE In Practice (SEIP)
Pre-print
ASTER: Natural and Multi-language Unit Test Generation with LLMs (Award Winner)
SE In Practice (SEIP)
Pre-print
Automated Code Review In Practice
SE In Practice (SEIP)
Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry
SE In Practice (SEIP)
Automating ML Model Development at Scale (SE for AI)
SE In Practice (SEIP)
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
SE In Practice (SEIP)
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
SE In Practice (SEIP)
CI at Scale: Lean, Green, and Fast
SE In Practice (SEIP)
Crash Report Prioritization for Large-Scale Scheduled Launches
SE In Practice (SEIP)
Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace
SE In Practice (SEIP)
Detecting Python Malware in the Software Supply Chain with Program Analysis (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production
SE In Practice (SEIP)
Enhancing Differential Testing: LLM-Powered Automation in Release Engineering
SE In Practice (SEIP)
Evaluating Agent-based Program Repair at Google
SE In Practice (SEIP)
Evaluation of Tools and Frameworks for Machine Learning Model Serving (SE for AI)
SE In Practice (SEIP)
Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company
SE In Practice (SEIP)
Pre-print
FlatD: Protecting Deep Neural Network Program from Reversing Attacks
SE In Practice (SEIP)
Full Line Code Completion: Bringing AI to Desktop
SE In Practice (SEIP)
Pre-print
GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts
SE In Practice (SEIP)
How is Google using AI for internal code migrations?
SE In Practice (SEIP)
How much does AI impact development speed? An enterprise-based randomized controlled trial
SE In Practice (SEIP)
Human-In-the-Loop Software Development Agents
SE In Practice (SEIP)
Identifying Factors Contributing to "Bad Days" for Software Developers: A Mixed-Methods Study
SE In Practice (SEIP)
"ImmediateShortTerm3MthsAfterThatLOL": Developer Secure-Coding Sentiment, Practice and Culture in Organisations (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
Improving Code Performance Using LLMs in Zero-Shot: RAPGen
SE In Practice (SEIP)
KuiTest: Leveraging Knowledge in the Wild as GUI Testing Oracle for Mobile Apps
SE In Practice (SEIP)
Leveraging MLOps: Developing a Sequential Classification System for RFQ Documents in Electrical Engineering
SE In Practice (SEIP)
LLM Driven Smart Assistant for Data Mapping
SE In Practice (SEIP)
LogLM: From Task-based to Instruction-based Automated Log Analysis
SE In Practice (SEIP)
Pre-print
Moving Faster and Reducing Risk: Using LLMs in Release Deployment (Award Winner)
SE In Practice (SEIP)
NICE: Non-Functional Requirements Identification, Classification, and Explanation Using Small Language Models (Artifact-Available, Award Winner)
SE In Practice (SEIP)
On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet? (Artifact-Available)
SE In Practice (SEIP)
DOI Pre-print
On Mitigating Code LLM Hallucinations with API Documentation
SE In Practice (SEIP)
On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Pre-print
Prioritizing Large-scale Natural Language Test Cases at OPPO
SE In Practice (SEIP)
Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models (SE for AI)
SE In Practice (SEIP)
Safe Validation of Pricing Agreements
SE In Practice (SEIP)
Search+LLM-based Testing for ARM Simulators (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases
SE In Practice (SEIP)
Soft Skills in Software Engineering: Insights from the Trenches
SE In Practice (SEIP)
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models (Artifact-Available, Artifact-Functional, Artifact-Reusable)
SE In Practice (SEIP)
Sunflower: Enhancing Linux Kernel Fuzzing via Exploit-Driven Seed Generation (Artifact-Available, Artifact-Functional, Artifact-Reusable, Security)
SE In Practice (SEIP)
Systematizing Inclusive Design in MOSIP: An Experience Report
SE In Practice (SEIP)
Testing False Recalls in E-commerce Apps: a User-perspective Blackbox Approach
SE In Practice (SEIP)
Test Input Validation for Vision-based DL Systems: An Active Learning Approach (Artifact-Available, Artifact-Functional, Artifact-Reusable, SE for AI)
SE In Practice (SEIP)
Pre-print
Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
SE In Practice (SEIP)
Time Warp: The Gap Between Developers’ Ideal vs Actual Workweeks in an AI-Driven Era (Award Winner)
SE In Practice (SEIP)
Towards Better Static Analysis Bug Reports in the Clang Static Analyzer
SE In Practice (SEIP)
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings
SE In Practice (SEIP)
Pre-print
UML Sequence Diagram Generation: A Multi-Model, Multi-Domain Evaluation
SE In Practice (SEIP)
Video Game Procedural Content Generation Through Software Transplantation
SE In Practice (SEIP)
Wearables to measure developer experience at work
SE In Practice (SEIP)

The following papers have been accepted to the ICSE 2025 SEIP Track. The papers will be published by IEEE and will appear in the IEEE and ACM digital libraries, subject to an author submitting the camera-ready and copyright forms and registering to attend the conference. (Authors are required to present their papers at the conference; otherwise the papers will be withdrawn.)

John Kolesar, Tancrède Lepoint, Martin Schäf, Willem Visser, "Safe Validation of Pricing Agreements"

Abstract: Pricing agreements at AWS define how customers are billed for usage of services and resources. A pricing agreement consists of a complex sequence of terms that can include free tiers, savings plans, credits, volume discounts, and other similar features. To ensure that pricing agreements reflect the customers' intentions, we employ a protocol that runs a set of validations that check all pricing agreements for irregularities. Since pricing agreements are sensitive, we want to limit the amount of information available to actors involved in the validation protocol. Actors who repair validation errors should not be able to obtain information that is not necessary for the repair. Personal information for individual customers should remain private, and we also want to hide statistical properties of the set of pricing agreements, such as the average number of credits awarded to customers. We introduce Parrot, a protocol for validation of pricing agreements that prevents information leakage. Parrot involves three categories of actors: the auditor who writes a query to check a validity condition, the fixers who repair anonymized invalid pricing agreements, and the owners who have permission to view non-anonymized pricing agreements. The protocol includes safeguards to prohibit parties other than the owners from learning sensitive information about individual customers or statistical properties of the full set of pricing agreements.

 Tags: "Business", "Analysis", "Security"  
 
Qiang Zhang, Yuheng Shen, Jianzhong Liu, Yiru Xu, Heyuan Shi, Yu Jiang, Wanli Chang, "Sunflower: Enhancing Linux Kernel Fuzzing via Exploit-Driven Seed Generation"

Abstract: The Linux kernel is the foundation of billions of contemporary computing systems, and ensuring its security and integrity is a necessity. Despite the Linux kernel’s pivotal role, guaranteeing its security is a difficult task due to its complex code logic. This leads to new vulnerabilities being frequently introduced, and malicious exploits can result in severe consequences like Denial of Service (DoS) or Remote Code Execution (RCE). Fuzz testing (fuzzing), particularly Syzkaller, has been instrumental in detecting vulnerabilities within the kernel. However, Syzkaller’s effectiveness is hindered due to limitations in system call descriptions and initial seeds. In this paper, we propose SUNFLOWER, an initial corpus generator that leverages existing exploits and proof-of-concept examples. SUNFLOWER is specifically designed to meet the critical requirements of industry deployments by facilitating the construction of a high-quality seed corpus based on bugs found in the wild. By collecting and analyzing numerous real-world exploits responsible for kernel vulnerabilities, the tool extracts essential system call sequences while also rectifying execution dependency errors. This approach addresses a pressing industry need for more effective vulnerability assessment and exploit development, making it an invaluable asset for cybersecurity professionals. The evaluation shows that, with the help of SUNFLOWER, we find a total of 25 previously unknown vulnerabilities within the extensively tested Linux kernel, while by augmenting Syzkaller with SUNFLOWER, we achieve a 9.5% and 10.8% improvement in code coverage compared with Syzkaller and Moonshine, respectively.

 Tags: "Security", "Testing and Quality"  
 
Ridwan Shariffdeen, Behnaz Hassanshahi, Martin Mirchev, Ali El Husseini, Abhik Roychoudhury, "Detecting Python Malware in the Software Supply Chain with Program Analysis"

Abstract: The frequency of supply-chain attacks has reached unprecedented levels, amounting to a growing concern about the security of open-source software. Existing state-of-the-art techniques often generate a high number of false positives and false negatives. For an effective detection tool, it is crucial to strike a balance between these results. In this paper, we address the problem of software supply chain protection through program analysis. We present HERCULE, an inter-package analysis tool to detect malicious packages in the Python ecosystem. We enhance state-of-the-art approaches with the primary goal of reducing false positives. Key technical contributions include improving the accuracy of pattern-based malware detection and employing program dependency analysis to identify malicious packages in the development environment. Extensive evaluation against multiple benchmarks including Backstabber’s Knife Collection and MalOSS demonstrates that HERCULE outperforms existing state-of-the-art techniques with 0.866 f1-score. Additionally, HERCULE detected new malicious packages which the PyPI security team removed, showing its practical value.

 Tags: "Security", "Analysis"  
 
Dhruva Juloori, Zhongpeng Lin, Matthew Williams, Eddy Shin, Sonal Mahajan, "CI at Scale: Lean, Green, and Fast"

Abstract: Maintaining a "green" mainline branch—where all builds pass successfully—is crucial but challenging in fast-paced, large-scale software development environments, particularly with concurrent code changes in large monorepos. SubmitQueue, a system designed to address these challenges, speculatively executes builds and only lands changes with successful outcomes. However, despite its effectiveness, the system faces inefficiencies in resource utilization, leading to a high rate of premature build aborts and delays in landing smaller changes blocked by larger conflicting ones. This paper introduces enhancements to SubmitQueue, focusing on optimizing resource usage and improving build prioritization. Central to this is our innovative probabilistic model, which distinguishes between changes with shorter and longer build times to prioritize builds for more efficient scheduling. By leveraging a machine learning model to predict build times and incorporating this into the probabilistic framework, we expedite the landing of smaller changes blocked by conflicting larger time-consuming changes. Additionally, introducing a concept of speculation threshold ensures that only the most likely builds are executed, reducing unnecessary resource consumption. After implementing these enhancements across Uber's major monorepos (Go, iOS, and Android), we observed a reduction in Continuous Integration (CI) resource usage by approximately 53%, CPU usage by 44%, and P95 waiting times by 37%. These improvements highlight the enhanced efficiency of SubmitQueue in managing large-scale software changes while maintaining a green mainline.

 Tags: "Testing and Quality", "AI for SE", "Process"  
 
Shengnan Wu, Yongxiang Hu, Jiazhen Gu, Penglei Mao, Jin Meng, Liujie Fan, Zhongshi Luan, Xin Wang, Yangfan Zhou, "Testing False Recalls in E-commerce Apps: a User-perspective Blackbox Approach"

Abstract: Search components are essential in e-commerce apps, allowing users to find products and services. However, they often suffer from bugs, leading to false recalls, i.e., irrelevant search results. Detecting false recalls automatically is challenging. As users and shop owners adopt ambiguous natural language to describe their purchasing intentions and products, precise relevance determination becomes difficult. We propose false recall Hound (frHound), a black box testing approach targeting false recalls. The core idea of frHound is to mimic users' online purchasing behavior. Specifically, frHound first designs 37 features to align with how users process information during online shopping, explored by a comprehensive user study. Then, frHound uses an outlier detection technique to identify the most divergent search results, similar to how general users make purchasing decisions during online shopping. Those divergent search results are likely false recalls, as most search results are relevant during e-commerce searches. Experiments with real industry data show frHound reduces human labor, time, and financial costs associated with discovering false recalls by 36.74 times. In a seven-month trial with M-app, a popular Chinese e-commerce platform, frHound identified 1282 false recalls, improving user satisfaction and reducing false recall discovery costs.

 Tags: "User experience", "Business"  
 
Daeha Ryu, Seokjun Ko, Eunbi Jang, Jinyoung Park, Myunggwan Kim, Changseo Park, "SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases"

Abstract: We present SEMANTIC CODE FINDER, a framework for semantic code search that delivers high-level search performance and supports multiple programming languages. Leveraging code summaries, it enables meaningful semantic code search by extracting the core content of code methods and using this information for search queries. Evaluated on large-scale codebases, SEMANTIC CODE FINDER demonstrates its effectiveness in outperforming existing open-source code search tools, achieving higher recall and precision rates. It delivers superior search performance across Java, Python, and C++. Notably, SEMANTIC CODE FINDER outperforms CodeMatcher, a previously successful semantic code search tool, by approximately 41% in terms of MRR. Moreover, it shows consistent performance across Java, Python, and C++ languages, highlighting its robustness and effectiveness. Currently, it is being used as a code search service for a significant amount of source code within Samsung Electronics, meeting the needs of its developers.

 Tags: "Analysis", "Prog Comprehension/Reeng/Maint", "AI for SE"  
 
Yongxiang Hu, Yu Zhang, Xuan Wang, Yingjie Liu, Shiyu Guo, Chaoyi Chen, Xin Wang, Yangfan Zhou, "KuiTest: Leveraging Knowledge in the Wild as GUI Testing Oracle for Mobile Apps"

Abstract: In industrial practice, UI (User Interface) functional bugs typically manifest as inconsistent UI input and corresponding response. Such bugs can deteriorate user experiences and are, therefore, a major target of industrial testing practice. For a long time, testing for UI functional bugs has relied on rule-based methods, which are labor-intensive for rule development and maintenance. Given that the UI functional bugs typically manifest where an app’s response deviates from the user’s expectations, we proposed the key point of reducing human efforts lies in simulating human expectations. Due to the vast in-the-wild knowledge of large language models (LLMs), they are well-suited for this simulation. By leveraging LLMs as UI testing oracle, we proposed KuiTest, the first rule-free UI functional testing tool we designed for Company M, one of the largest E-commerce app providers serving over 600 million users. KuiTest can automatically predict the effect of UI inputs and verify the post-interaction UI response. We evaluate the design of KuiTest via a set of ablation experiments. Moreover, real-world deployments demonstrate that KuiTest can effectively detect previously unknown UI functional bugs and significantly improve the efficiency of GUI testing.

 Tags: "User experience", "AI for SE", "Testing and Quality"  
 
Umut Cihan, Vahid Haratian, Arda İçöz, Mert Kaan Gül, Ömercan Devran, Emircan Furkan Bayendur, Baykal Mehmet Uçar, Eray Tuzun, "Automated Code Review In Practice"

Abstract: Context: Code review is a widespread practice among practitioners to improve software quality and transfer knowledge. It is often perceived as time-consuming due to the need for manual effort and potential delays in the development process. Several AI-assisted code review tools (Qodo, GitHub Copilot, Coderabbit, etc.) provide automated code reviews using large language models (LLMs). The overall effects of such tools in the industry setting are yet to be examined. Objective: This study examines the impact of LLM-based automated code review tools in an industry setting. Method: The study was conducted within an industrial software development environment that adopted an AI-assisted code review tool (based on open-source Qodo PR Agent). Approximately 238 practitioners across ten projects had access to the tool. We focused our analysis on three projects, encompassing 4,335 pull requests, of which 1,568 underwent automated reviews. Our data collection comprised three primary sources: (1) a quantitative analysis of pull request data, including comment labels indicating whether developers acted on the automated comments, (2) surveys sent to developers regarding their experience with the reviews on individual pull requests, and (3) a broader survey of 22 practitioners capturing their general opinions on automated code reviews. Results: 73.8% of automated code review comments were labeled as resolved. However, the overall average pull request closure duration increased from five hours 52 minutes to eight hours 20 minutes, with varying trends observed across different projects. According to survey responses, most practitioners observed a minor improvement in code quality as a result of automated code reviews. Conclusion: Our findings indicate that the tool was useful for software development. Additionally, developers highlighted several advantages, such as accelerated bug detection, increased awareness of code quality, and the promotion of best practices. However, it also led to longer pull request closure times. Developers noted several disadvantages, including faulty reviews, unnecessary corrections, and overly frequent or irrelevant comments. Based on our findings, we discussed how practitioners can more effectively utilize automated code review technologies.

 Tags: "Testing and Quality", "AI for SE"  
 
Anton Semenkin, Vitaliy Bibaev, Yaroslav Sokolov, Kirill Krylov, Alexey Kalina, Anna Khannanova, Danila Savenkov, Darya Rovdo, Igor Davidenko, Kirill Karnaukhov, Maxim Vakhrushev, Mikhail Kostyukov, Mikhail Podvitskii, Petr Surkov, Yaroslav Golubev, Nikita Povarov, Timofey Bryksin, "Full Line Code Completion: Bringing AI to Desktop"

Abstract: In recent years, several industrial solutions for the problem of multi-token code completion appeared, each making a great advance in the area but mostly focusing on cloud-based runtime and avoiding working on the end user's device. In this work, we describe our approach for building a multi-token code completion feature for JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happens on the end user's machine. We share important time and memory-consumption restrictions, as well as design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches user experience while being not only fast and compact but also secure. We share a number of useful techniques to meet the stated development constraints and also describe offline and online evaluation pipelines that allowed us to make better decisions. Our online evaluation shows that the usage of the tool leads to 1.3 times more Python code in the IDE being produced by code completion. The described solution was initially started with the help of researchers and was then bundled into all JetBrains IDEs, where it is now used by millions of users. Thus, we believe that this work is useful for bridging academia and industry, providing researchers with the knowledge of what happens when complex research-based solutions are integrated into real products.

 Tags: "IDEs"  
 
Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi, Davide Taibi, "On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet?"

Abstract: Context. The security of critical infrastructure has been a fundamental concern since the advent of computers, and this concern has only intensified in today’s cyber warfare landscape. Protecting mission-critical systems (MCSs), including essential assets like healthcare, telecommunications, and military coordination, is vital for national security. These systems require prompt and comprehensive governance to ensure their resilience, yet recent events have shown that meeting these demands is increasingly challenging. Aim. Building on prior research that demonstrated the potential of GAI, particularly Large Language Models (LLMs), in improving risk analysis tasks, we aim to explore practitioners' perspectives, specifically developers and security personnel, on using generative AI (GAI) in the governance of IT MCSs, seeking to provide insights and recommendations for various stakeholders, including researchers, practitioners, and policymakers. Method. We designed a survey to collect practical experiences, concerns, and expectations of practitioners who develop and implement security solutions in the context of MCSs. Analyzing this data will help identify key trends, challenges, and opportunities for introducing GAIs in this niche domain. Conclusions and Future Works. Our findings highlight that the safe use of LLMs in MCS governance requires interdisciplinary collaboration. Researchers should focus on designing regulation-oriented models and focus on accountability; practitioners emphasize data protection and transparency, while policymakers must establish a unified AI framework with global benchmarks to ensure ethical and secure LLMs-based MCS governance.

 Tags: "SE for AI", "Security", "Real-Time"  
 
Gokul Rejithkumar, Preethu Rose Anish, "NICE: Non-Functional Requirements Identification, Classification, and Explanation Using Small Language Models"

Abstract: Accurate identification and classification of Non-Functional Requirements (NFRs) is essential for informed architectural decision-making and maintaining software quality. Numerous language model-based techniques have been proposed for NFR identification and classification. However, understanding the reasoning behind the classification outputs of these techniques remains challenging. Rationales for the classification outputs of language models enhance comprehension, aid in debugging the models, and build confidence in the classification outputs. In this paper, we present NICE, a tool for NFR Identification, Classification, and Explanation. Using an industrial requirements dataset, we generated explanations in natural language using the GPT-4o large language model (LLM). We then fine-tuned small language models (SLMs), including T5, Llama 3.1, and Phi 3, with these LLM-generated explanations to identify and classify NFRs and to explain their classification outputs. We evaluated NICE using standard evaluation metrics such as F1-score and human evaluation to assess the quality of the generated explanations. Among the models tested, T5 produced explanations of quality comparable to Llama 3.1 and Phi 3 but achieved the highest average F1-score of 0.90 in multi-label NFR classification on the industrial requirements dataset. Furthermore, to evaluate the effectiveness of NICE, a survey was conducted with 20 requirements analysts and software developers. NICE is currently deployed as a part of the Knowledge-assisted Requirements Evolution (K-RE) framework developed by a large IT vendor organization.

 Tags: "Requirements", "AI for SE"  
 
Jinquan Zhang, Zihao Wang, Pei Wang, Rui Zhong, Dinghao Wu, "FlatD: Protecting Deep Neural Network Program from Reversing Attacks"

Abstract: The emergence of deep learning (DL) compilers provides automated optimization and compilation across DL frameworks and hardware platforms, which enhances the performance of AI services and primarily benefits deployment to edge devices and low-power processors. However, DNN programs generated by DL compilers introduce a new attack interface. They are targeted by new model extraction attacks that can fully or partially rebuild the DNN model by reversing the DNN programs. Unfortunately, no defense countermeasure has been designed to hinder this kind of attack. To address the issue, we investigate all the state-of-the-art reversing-based model extraction attacks and identify an essential component shared across the frameworks. Based on this observation, we propose FlatD, the first defense framework for DNN programs against reversing-based model extraction attacks. FlatD manipulates and conceals the original control flow graph (CFG) of DNN programs based on control flow flattening (CFF). Unlike traditional CFF, FlatD ensures that it is challenging for attackers to statically recover the CFG of DNN programs and gain the necessary information. Our evaluation shows that, compared to traditional CFF (O-LLVM), FlatD provides more effective and stealthy protection for DNN programs with similar performance and smaller scale.
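
Control flow flattening, the classical obfuscation FlatD builds on, can be illustrated with a toy sketch (for intuition only; this is not FlatD's transformation): a dispatcher loop drives execution through numbered blocks via a state variable, so the statically visible CFG collapses into a single loop and hides the original block ordering.

```python
# Toy illustration of control flow flattening (CFF); NOT FlatD's actual transformation.

def original(x):
    # Straight-line CFG: entry -> double -> add -> exit
    y = x * 2
    y = y + 3
    return y

def flattened(x):
    # The same computation driven by a dispatcher loop: every block returns
    # control to the dispatcher, so the static CFG becomes a single loop and
    # the real block order is hidden in the `state` variable.
    state, y = 0, None
    while True:
        if state == 0:          # entry block
            y, state = x * 2, 1
        elif state == 1:        # add block
            y, state = y + 3, 2
        elif state == 2:        # exit block
            return y

assert original(5) == flattened(5) == 13
```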

 Tags: "SE for AI", "Security", "Analysis"  
 
Kristóf Umann, Zoltán Porkoláb, "Towards Better Static Analysis Bug Reports in the Clang Static Analyzer"

Abstract: Static analysis is a method increasingly used for finding bugs and other deviations in software systems. While it fits well into the modern development environment and is capable of catching issues earlier than manual code review or the various testing methodologies, human experts are still deeply involved in evaluating whether the tool reported a true finding or a false positive. This creates a serious bottleneck in the development flow. While many efforts have been made to improve the accuracy of the tools, little attention has been paid to the quality of the reports. Improving report quality could improve decisions on possible false positives, shorten bug fixing time, and enhance trust in static analysis tools. In this paper, we report on our research to find the most important attributes for generating clear and concise bug reports for the Clang Static Analyzer tool. With the help of experienced developers, we evaluated several test cases synthesized from real-world examples and analyzed how they rated the report elements according to aspects of understanding. The results show that current reports from the Clang Static Analyzer, one of the most popular static analysis tools, can flood developers with unimportant information, while some of the report generation steps may eliminate relevant code parts. Our measurement methodology and results could be used to improve bug report quality and therefore enhance the application of these tools. Although our study focuses on one specific tool, the lessons learned could be used for research targeting a wider range of static analyzers. Based on our findings, we made suggestions to the developers of the Clang Static Analyzer, and an improvement to bug report generation was made and is already available in version 19.0.0.

 Tags: "Analysis", "Testing and Quality", "IDEs"  
 
Chi Xiao, Daniel Ståhl, Jan Bosch, "UML Sequence Diagram Generation: A Multi-Model, Multi-Domain Evaluation"

Abstract: The automation of UML sequence diagram generation has posed a persistent challenge in software engineering, with existing approaches relying heavily on manual processes. Recent advancements in natural language processing, particularly through large language models, offer promising solutions for automating this task. This paper investigates the use of large language models in automating the generation of UML sequence diagrams from natural language requirements. We evaluate three state-of-the-art large language models, GPT-4o, Mixtral 8x7B, and Llama 3.1 8B, across multiple datasets, including both public and proprietary requirements, to assess their performance in terms of correctness, completeness, clarity, and readability. The results indicate that GPT-4o consistently outperforms the other models on most metrics. Our findings highlight the potential of large language models to streamline requirements engineering by reducing manual effort, although further refinement is needed to enhance their performance in complex scenarios. This study provides key insights into the strengths and limitations of these models, and offers practical guidance for their application, advancing the understanding of how large language models can support automation in software engineering tasks.

 Tags: "AI for SE", "Analysis", "Design/Architecture"  
 
Claudio Martens, Hammam Abdelwahab, Katharina Beckh, Birgit Kirsch, Vishwani Gupta, Dennis Wegener, Steffen Hoh, "Leveraging MLOps: Developing a Sequential Classification System for RFQ Documents in Electrical Engineering"

Abstract: Vendors participating in tenders face significant challenges in creating timely and accurate order quotations from Request for Quotation (RFQ) documents. The success of their bids is heavily dependent on the speed and precision of these quotations. A key bottleneck in this process is the time-consuming task of identifying relevant products from the product catalogue that align with the RFQ descriptions. We propose the implementation of an automatic classification system that utilizes a context-aware language model specifically designed for the electrical engineering domain. Our approach aims to streamline the identification of relevant products, thereby enhancing the efficiency and accuracy of the quotation process. However, an effective solution must be scalable and easily adjustable. Thus, we present a machine learning operations (MLOps) architecture that facilitates automated training and deployment. We pay particular attention to automated pipelines, which are essential for the operation of a maintainable ML solution. Furthermore, we outline best practices for creating production-ready pipelines and encapsulating data science efforts. Schneider Electric currently operates the solution presented in this paper.

 Tags: "Business", "AI for SE", "Design/Architecture"  
 
Rangeet Pan, Myeongsoo Kim, Rahul Krishna, Raju Pavuluri, Saurabh Sinha, "ASTER: Natural and Multi-language Unit Test Generation with LLMs"

Abstract: Implementing automated unit tests is an important but time-consuming activity in software development. To support developers in this task, software engineering research over the past few decades has developed many techniques for automating unit test generation. However, despite this effort, usable tools exist for very few programming languages. Moreover, studies have found that automatically generated tests suffer from poor readability and often do not resemble developer-written tests. In this work, we present a rigorous investigation of how large language models (LLMs) can help bridge the gap. We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We illustrate how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking. We conducted an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness---evaluating them on standard as well as enterprise Java applications and a large Python benchmark. Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved while also producing considerably more natural test cases that developers find easy to read and understand. We also present the results of a user study, conducted with 161 professional developers, that highlights the naturalness characteristics of the tests generated by our approach.

 Tags: "Testing and Quality", "AI for SE"  
 
Chandra Maddila, Negar Ghorbani, Kosay Jabre, Vijayaraghavan Murali, Edwin Kim, Parth Thakkar, Nikolay Pavlovich Laptev, Olivia Harman, Diana Hsu, Rui Abreu, Peter Rigby, "AI-Assisted SQL Authoring at Industry Scale"

Abstract: SqlCompose brings generative AI into the data analytics domain. SQL is declarative, has formal table schemas, and is often written in a non-linear manner. We address each of these challenges and develop a set of models that shows the importance of each problem. We first develop an internal SQL benchmark to perform offline tests at Meta. We evaluate how well the public Llama model performs. We attain a BLEU score of 53% and 24% for single- and multi-line predictions, respectively. This performance is consistent with prior work on imperative languages. We then fine-tune Llama on our internal data and database schemas. SC-Schema substantially outperforms Llama by 16 percentage points on BLEU score. SQL is often written with multiple subqueries and in a non-sequential manner. We develop SC-FIM, which is aware of the context before and after the line(s) that need to be completed. This fill-in-the-middle model outperforms SC-Schema by 35 percentage points. We also measure how often the models get the correct table names, and SC-FIM is able to do this 75% of the time, a major improvement over the other two models. Aside from our scientific research, we also roll out SC-FIM at Meta. SqlCompose is used on a weekly basis by over 10k users, including data scientists and software engineers; less than 1% of users have disabled SqlCompose. We use the feedback from users to improve SqlCompose. Interesting positive themes include completing tedious or repetitive SQL clauses, suggesting boilerplate code, and helping eliminate the need to remember difficult SQL syntax. The most significant negative theme was table and column name hallucinations, which have been reduced with the release of SC-FIM. The SqlCompose models consistently outperform public and internal LLMs despite their smaller size (7B and 13B parameters), which provides early indications that smaller specialist models can outperform larger general-purpose models.
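
Fill-in-the-middle completion of the kind SC-FIM performs can be sketched as follows; the sentinel tokens, the prompt layout, and the hypothetical `complete` call are illustrative assumptions, not SqlCompose's internal format.

```python
# Illustrative sketch of fill-in-the-middle (FIM) prompting for SQL completion;
# the sentinel tokens and the hypothetical `complete` call are assumptions,
# not SqlCompose's internal format.
PREFIX, SUFFIX, MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(before_cursor: str, after_cursor: str) -> str:
    # The model sees the query text both before and after the cursor, so it
    # can complete a clause in the middle of a non-linearly written script.
    return f"{PREFIX}{before_cursor}{SUFFIX}{after_cursor}{MIDDLE}"

before = "SELECT user_id, COUNT(*) AS n_events\nFROM events\nWHERE "
after = "\nGROUP BY user_id\nORDER BY n_events DESC"
prompt = build_fim_prompt(before, after)
# A schema-aware model would fill the WHERE clause, e.g.:
# completion = complete(prompt)   # hypothetical call -> "event_date >= '2024-01-01'"
print(prompt)
```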

 Tags: "Databases", "AI for SE"  
 
Rui Abreu, Vijayaraghavan Murali, Peter Rigby, Chandra Maddila, Weiyan Sun, Jun Ge, Kaavya Chinniah, Audris Mockus, Megh Mehta, Nachiappan Nagappan, "Moving Faster and Reducing Risk: Using LLMs in Release Deployment"

Abstract: Release engineering has traditionally focused on continuously delivering features and bug fixes to users, but at a certain scale, it becomes impossible for a release engineering team to determine what should be released. At Meta’s scale, the responsibility appropriately and necessarily falls back on the engineer writing and reviewing the code. To address this challenge, we developed models of diff risk scores (DRS) to determine how likely a diff is to cause a SEV, i.e., a severe fault that impacts end-users. Assuming that SEVs are only caused by diffs, a naive model could randomly gate X% of diffs from landing, which would automatically catch X% of SEVs on average. However, we aimed to build a model that can capture Y% of SEVs by gating X% of diffs, where Y >> X. By training the model on historical data on diffs that have caused SEVs in the past, we can predict the riskiness of an outgoing diff to cause a SEV. Diffs that are beyond a particular threshold of risk can then be gated. We have four types of gating: no gating (green), weekend gating (weekend), medium impact on end-users (yellow), and high impact on end-users (red). The input parameter for our models is the level of gating, and the outcome measure is the number of captured SEVs, i.e., the number of gated diffs that would have led to a SEV. Our research approaches include a logistic regression model, a BERT-based model, and generative LLMs. Our baseline regression model captures 18.7%, 27.9%, and 84.6% of SEVs while respectively gating the top 5% (weekend), 10% (yellow), and 50% (red) of risky diffs. The BERT-based model, StarBERT, only captures 0.61×, 0.85×, and 0.81× as many SEVs as the logistic regression for the weekend, yellow, and red gating zones, respectively. The generative LLMs, iCodeLlama-34B and iDiffLlama-13B, when risk-aligned, capture more SEVs than the logistic regression model in production: 1.40×, 1.52×, 1.05×, respectively.
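
The paper's outcome measure, the share of SEVs captured when gating the top X% of risky diffs, can be reproduced with a short sketch; the risk scores would come from any of the models above, and the toy data here is purely illustrative.

```python
# Minimal sketch of the outcome measure behind diff risk gating: given per-diff
# risk scores (from logistic regression, StarBERT, an LLM, ...) and the set of
# diffs that later caused SEVs, gating the top X% of risky diffs captures Y% of
# SEVs. The toy scores below are purely illustrative.
def sev_capture_rate(risk: dict, sev_diffs: set, gate_fraction: float) -> float:
    ranked = sorted(risk, key=risk.get, reverse=True)
    gated = set(ranked[:max(1, int(len(ranked) * gate_fraction))])
    return len(gated & sev_diffs) / len(sev_diffs)

risk = {"D1": 0.91, "D2": 0.42, "D3": 0.10, "D4": 0.77, "D5": 0.05}
sevs = {"D1", "D3"}                                   # diffs that caused SEVs
print(sev_capture_rate(risk, sevs, 0.20))             # gate top 20% -> captures 0.5 of SEVs
```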

 Tags: "Testing and Quality", "AI for SE"  
 
James Johnson, Julian Thome, Lucas Charles, Hua Yan, Jason Leasure, "A scalable, effective and simple Vulnerability Tracking approach for heterogeneous SAST setups based on Scope+Offset"

Abstract: Managing software projects using Source Control Management (SCM) systems like Git, combined with automated security testing in Continuous Integration and Continuous Delivery (CI/CD) processes, is a best practice in today’s software industry. These processes continuously monitor code changes to detect security vulnerabilities as early as possible. Security testing often involves multiple Static Application Security Testing (SAST) tools, each specialized in detecting specific vulnerabilities, such as hardcoded passwords or insecure data flows. A heterogeneous SAST setup, using multiple tools, helps minimize the software’s attack surface. The security findings from these tools undergo Vulnerability Management, a semi-manual process of understanding, categorizing, storing, and acting on them. Code volatility, i.e., the constant change of the project’s source code, as well as double reporting, i.e., the overlap of findings reported by multiple tools, are potential sources of duplication imposing futile auditing effort on the analyst. Vulnerability Tracking is an automated process that helps deduplicate and track vulnerabilities throughout the lifetime of a software project. We propose a scalable Vulnerability Tracking approach called Scope+Offset for heterogeneous SAST setups that reduces the noise introduced by code volatility as well as code duplication. Our proposed, fully automated method proved to be highly effective in an industrial setting, reducing the negative effect of duplication by approximately 30%, which directly translates to a reduction in futile auditing time while inducing a negligible performance overhead. Since its product integration into GitLab in 2022, Scope+Offset has provided vulnerability tracking for the thousands of security scans running every day on the GitLab DevSecOps platform, which can itself be considered a heterogeneous SAST setup as it includes a variety of different SAST tools.
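
The core idea of tracking a finding by its enclosing scope plus an offset within that scope, rather than by absolute line number, can be sketched as follows; the fingerprinting details are assumptions for illustration, not GitLab's actual Scope+Offset implementation.

```python
# Illustrative sketch of tracking a finding by (scope, offset) rather than by
# absolute line number, so unrelated edits elsewhere in the file do not create
# "new" vulnerabilities. Details are assumptions, not GitLab's implementation.
import hashlib

def scope_offset_fingerprint(file_path: str, scope_name: str,
                             scope_start_line: int, finding_line: int,
                             rule_id: str) -> str:
    offset = finding_line - scope_start_line  # position relative to the enclosing scope
    key = f"{file_path}:{scope_name}:{offset}:{rule_id}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# The same finding before and after code is inserted *above* the function keeps
# its fingerprint, because only absolute line numbers changed.
before = scope_offset_fingerprint("auth.py", "def login", 40, 45, "hardcoded-password")
after = scope_offset_fingerprint("auth.py", "def login", 52, 57, "hardcoded-password")
assert before == after
```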

 Tags: "Security", "Process", "MSR"  
 
Soumiki Chattopadhyay, Amreeta Chatterjee, Puja Agarwal, Bianca Trinkenreich, Swarathmika Kumar, Rohit Ranjan Rai, Resham Chugani, Pragya Kumari, Margaret Burnett, Anita Sarma, "Systematizing Inclusive Design in MOSIP: An Experience Report"

Abstract: The GenderMag method has been successfully used by software teams to improve inclusivity in their software products across various domains. Given the success of this method, here we investigate how GenderMag can be systematically adopted in an organization. This is a conceptual replication of our prior work that identified a set of practices and pitfalls synthesized across different USA-based teams. Through Action Research, we trace the 3+ year-long journey of GenderMag adoption in the MOSIP organization, starting from the initial ‘unfreeze’ stage to the institutionalization (‘re-freeze’) of GenderMag in the organization’s processes. Our findings identify: (1) which practices from the prior work could be generalized and how some of them had to be modified to fit the MOSIP organization’s context (Digital Public Goods, open-source product, and fully remote work environment), and (2) the pitfalls that occurred.

 Tags: "Human/Social", "User experience"  
 
Ita Ryan, Utz Roedig, Klaas-Jan Stol, "'ImmediateShortTerm3MthsAfterThatLOL': Developer Secure-Coding Sentiment, Practice and Culture in Organisations"

Abstract: As almost all areas of human endeavour undergo rapid digital transformation, secure coding is increasingly important to personal, commercial and national security. Yet studies have shown that software developers do not always prioritise or even understand security. Our large survey of organically sourced coders (n=863) examines how software developers currently experience secure coding in the workplace. We found that developers express an interest in secure coding, display basic security knowledge, and turn to their managers and teams first for help with security concerns. We found that developer secure-coding traits and security practice do not correlate with organisational statistics such as size, but do correlate weakly with measures of security culture, and in some cases with practice, indicating that organisational security support goes hand-in-hand with secure development. Investigating the effects of code breaches, we found that in almost half of cases, code security does not increase, or increases only for a short time.

 Tags: "Security", "Human/Social", "Process"  
 
Yilun Liu, Yuhe Ji, Shimin Tao, Minggui He, Weibin Meng, Shenglin Zhang, Yongqian Sun, Yuming Xie, Boxing Chen, Hao Yang, "LogLM: From Task-based to Instruction-based Automated Log Analysis"

Abstract: Automatic log analysis is essential for the efficient Operation and Maintenance (O&M) of software systems, providing critical insights into system behaviors. However, existing approaches mostly treat log analysis as training a model to perform an isolated task, using task-specific log-label pairs. These task-based approaches are inflexible in generalizing to complex scenarios, depend on task-specific training data, and incur significant cost when multiple models are deployed. In this paper, we propose an instruction-based training approach that transforms log-label pairs from multiple tasks and domains into a unified format of instruction-response pairs. Our trained model, LogLM, can follow complex user instructions and generalize better across different tasks, thereby increasing flexibility and reducing the dependence on task-specific training data. By integrating major log analysis tasks into a single model, our approach also relieves the model deployment burden. Experimentally, LogLM outperforms existing approaches across five log analysis capabilities, and exhibits strong generalization abilities on complex instructions and unseen tasks. In addition, LogLM has been deployed on an O&M platform of Huawei, managing 11,165 devices and processing an average of 209 user instructions per day.
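
The conversion of task-specific log-label pairs into a unified instruction-response format can be illustrated with a small sketch; the templates below are assumptions, not the exact prompts used to train LogLM.

```python
# Illustrative conversion of task-specific log-label pairs into a unified
# instruction-response format; the templates are assumptions, not the exact
# prompts used to train LogLM.
TEMPLATES = {
    "anomaly_detection": "Decide whether the following log line indicates an anomaly:\n{log}",
    "log_parsing": "Extract the template of the following log line:\n{log}",
}

def to_instruction_pair(task: str, log: str, label: str) -> dict:
    return {"instruction": TEMPLATES[task].format(log=log), "response": label}

samples = [
    ("anomaly_detection", "kernel: EXT4-fs error (device sda1): htree_dirblock_to_tree", "anomaly"),
    ("log_parsing", "Accepted password for root from 10.0.0.7 port 22",
     "Accepted password for <*> from <*> port <*>"),
]
dataset = [to_instruction_pair(*sample) for sample in samples]
print(dataset[0]["instruction"])
```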

 Tags: "Analysis", "AI for SE"  
 
Niklas Beck, Benny Stein, Dennis Wegener, Lennard Helmer, "Evaluation of Tools and Frameworks for Machine Learning Model Serving"

Abstract: Machine learning (ML) models are ubiquitous, as ML expands into numerous application domains. Despite this growth, the software engineering of ML systems remains complex, particularly in production environments. Serving ML models for inference is a critical part of such systems, as it provides the interface between the model and the surrounding components. Today, a variety of open source tools and frameworks for model serving exist which promise ease of use and performance. However, they differ in terms of usability, flexibility, scalability, and their overall performance. In this work, we systematically evaluate several popular model serving tools and frameworks in the context of a natural language processing scenario. In detail, we analyze their features and capabilities, conduct runtime experiments, and report on our experiences from various real-world ML projects. Our evaluation results provide valuable insights and considerations for ML engineers and other practitioners seeking effective serving environments that seamlessly integrate with the existing ML tech stack, simplifying and accelerating the process of serving ML models in production.

 Tags: "SE for AI"  
 
Mar Zamorano, Daniel Blasco, Carlos Cetina, Federica Sarro, "Video Game Procedural Content Generation Through Software Transplantation"

Abstract: Software transplantation generates a new piece of software by reusing existing parts of one piece of software (i.e., the donor) to enhance other parts of the same or a different piece of software (i.e., the host). To date, its use has been proven beneficial for traditional software systems. In this paper, we argue that software transplantation can be used for automatically producing video game procedural content. We assessed the feasibility of our idea by realising the first search-based algorithm for procedural content transplantation and empirically evaluating it in an industrial case study in collaboration with the developers of the commercial video game Kromaia. Specifically, our proposed approach, dubbed IMHOTEP, enables developers to choose what video-game content to transplant and where, and automatically searches for an appropriate solution to integrate the organ into the host. Such a search is performed by using an evolutionary algorithm guided by a simulation-based fitness function, which is novel w.r.t. previous transplantation work generally guided by test-suite compliance. We empirically evaluate the effectiveness of IMHOTEP to transplant procedural content, specifically non-playable characters, for the commercial video game Kromaia and benchmarked it against a state-of-the-art approach in search-based procedural content generation, as well as a variant of IMHOTEP itself guided by a test-suite-based fitness function. We found that using IMHOTEP, Kromaia developers were able to transplant 129 distinct organs taken from the game’s scenarios into five different hosts, thus generating a total of 645 new transplanted non-playable characters for this game. Moreover, we found that the game content generated by using IMHOTEP was 1.5 times superior to that obtained by using its test-suite-based variant, and 2.5 times superior to that generated by the state-of-the-art benchmark. Furthermore, the transplants generated by IMHOTEP have also unveiled organ interactions that had not been previously identified in the literature. Finally, a focus group with game developers indicated their satisfaction with the content generated by IMHOTEP and willingness to use it in their game development activity. The positive results obtained by IMHOTEP prove the viability of procedural content transplantation and open up new research avenues for automated video-game content generation as well as for software transplantation.

 Tags: "Design/Architecture", "Games", "Analysis"  
 
Lekshmi Murali Rani, Faezeh Mohammadi, Robert Feldt, Richard Berntsson Svensson, "An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI"

Abstract: Incorporating responsible practices into Software Engineering (SE) for Artificial Intelligence (AI) is essential to ensure ethical principles, societal impact, and accountability remain at the forefront of AI system design and deployment. This study investigates the ethical challenges and complexities inherent in Responsible SE (RSE) for AI, underscoring the need for practical, scenario-driven operational guidelines. Given the complexity of AI and the relative inexperience of professionals in this rapidly evolving field, continuous learning and market adaptation are crucial. Through qualitative interviews with seven practitioners (conducted until saturation), quantitative surveys of 51 practitioners, and static validation of the results with four industry experts in AI, this study explores how personal values, emerging roles, and awareness of AI’s societal impact influence responsible decision-making in RSE for AI. A key finding is the gap between the current state of the art and actual practice in RSE for AI, particularly the failure to operationalize ethical and responsible decision-making within the SE lifecycle for AI. While ethical issues in RSE for AI largely mirror those found in the broader SE process, the study highlights a distinct lack of operational frameworks and resources to guide RSE practices for AI effectively. The results reveal that current ethical guidelines are insufficiently implemented at the operational level, reinforcing the complexity of embedding ethics throughout the software engineering lifecycle. The study concludes that interdisciplinary collaboration, H-shaped competencies (ethical-technical dual competence), and a strong organizational culture of ethics are critical for fostering RSE practices for AI, with a particular focus on transparency and accountability.

 Tags: "SE for AI", "Human/Social", "Resp SE and Ethics"  
 
Elise Paradis, Kate Grey, Quinn Madison, Daye Nam, Andrew Macvean, Nan Zhang, Ben Ferrari-Church, Satish Chandra, "How much does AI impact development speed? An enterprise-based randomized controlled trial"

Abstract: How much does AI assistance impact developer productivity? To date, the software engineering literature has provided a range of answers, targeting a diversity of outcomes: from perceived productivity to speed on task and developer throughput. Our randomized controlled trial with 96 full-time Google software engineers contributes to this literature by sharing an estimate of the impact of three AI features on the time developers spent on a complex, enterprise-grade task. We found that AI significantly shortened the time developers spent on task. Our best estimate of the size of this effect, controlling for factors known to influence developer time on task, stands at about 21%, although our confidence interval is large. We also found an interesting effect whereby developers who spend more hours on code-related activities per day were faster with AI. Product and future research considerations are discussed. In particular, we invite further research that explores the impact of AI at the ecosystem level and across multiple suites of AI-enhanced tools, since we cannot assume that the effect size obtained in our lab study will necessarily apply more broadly, or that the effect of AI found using internal Google tooling in the summer of 2024 will translate across tools and over time.

 Tags: "AI for SE", "Process"  
 
Behrooz Omidvar Tehrani, Ingo Gühring, Gauthier Guinet, Linbo Liu, Hoan Nguyen, Talha Oz, Qiang Zhou, Laurent Callot, Jun Huan, Omer Tripp, Anoop Deoras, "Migrating Java Applications with Amazon Q Developer Agent for Code Transformation"

Abstract: Software maintenance, particularly migrating Java applications from older JVM versions to newer ones, presents significant challenges for developers due to the rapid evolution of programming languages and increasing reliance on third-party libraries. In this paper we introduce Amazon Q Developer Agent, a novel system for fully automated Java migrations. The system combines rule-based approaches, web-mined knowledge, and large language models (LLMs) to offer a comprehensive solution for Java upgrades. Amazon Q Developer Agent addresses key challenges in the migration process, including dependency updates and deprecated API replacements, while seamlessly integrating with popular integrated development environments (IDEs). The system significantly reduces the time and effort required for Java migrations, with case studies showing upgrades from Java 8 and 11 to 17 being completed in hours rather than the typical 50 developer-days. In a large-scale deployment, Amazon engineers successfully upgraded 30,000 internal Java 8 applications to Java 17 in less than a year using an internal version of this tooling. This paper details the system's architecture and methodology and presents experimental results demonstrating its effectiveness in revolutionizing the software migration process, saving thousands of developer-years in maintenance efforts.

 Tags: "Prog Comprehension/Reeng/Maint", "AI for SE", "Design/Architecture"  
 
Haonan Chen, Daihang Chen, Yizhuo Yang, Lingyun Xu, Liang Gao, Mingyi Zhou, Chunming Hu, Li Li, "ArkAnalyzer: The Static Analysis Framework for OpenHarmony"

Abstract: ArkTS is a new programming language dedicated to developing apps for the emerging OpenHarmony mobile operating system. Like other programming languages (e.g., TypeScript) that constantly suffer from performance-related code smells or vulnerabilities, the ArkTS programming language will likely encounter the same problems. The solution given by our research community is to invent static analyzers, which are often implemented on top of a common static analysis framework, to detect and subsequently repair those issues automatically. Unfortunately, such an essential framework is not yet available for the OpenHarmony community. Existing program analysis methods have several problems when handling ArkTS code. To bridge the gap, we design and implement a framework named ArkAnalyzer and make it publicly available as an open-source project. ArkAnalyzer addresses the aforementioned problems and has already integrated a number of fundamental static analysis functions (e.g., control-flow graph construction and call graph construction) that are ready to be reused by developers to implement OpenHarmony app analyzers focusing on statically resolving dedicated issues such as performance bug detection, privacy leak detection, and compatibility issue detection. Experimental results show that ArkAnalyzer achieves both high analysis efficiency and high effectiveness. In addition, we open-sourced a dataset of numerous real-world ArkTS apps.

 Tags: "Analysis"  
 
Haoran Xu, Chen Zhi, Tianyu Xiang, Zixuan Wu, Gaorong Zhang, Xinkui Zhao, Jianwei Yin, Shuiguang Deng, "Prioritizing Large-scale Natural Language Test Cases at OPPO"

Abstract: Regression testing is a crucial process for ensuring system stability following software updates. As a global leader in smart device manufacturing, OPPO releases a new version of its customized Android firmware, ColorOS, on a weekly basis. Testers must select test cases from a vast repository of manual test cases for regression testing. The tight schedule makes it difficult for testers to select the correct test cases from this extensive pool. Since these test cases are described in natural language, testers must manually execute them according to the operational steps, making the process labor-intensive and error-prone. Therefore, an effective test case recommendation system is needed to suggest appropriate test cases, reducing unnecessary human effort during weekly regression tests. To address these challenges, we propose a two-phase manual test case recommendation system. Our system first uses the BERT model to classify commit messages, determining the most relevant test labels. Then, it employs the BGE embedding model to compute the semantic similarity between the commit message and the test cases, recommending the most suitable test cases. This approach has been practically deployed within OPPO, and feedback from several months of use shows that our test case recommendation accuracy reaches 91%. The time testers spend selecting test cases has decreased by 61%, the number of test cases executed per code change has dropped by 87%, and the defect detection rate of the recommended test cases has increased by 182.35%. Our method achieves high accuracy, low human effort, and a high defect detection rate. This paper introduces the integration of the BERT classification model and the BGE semantic similarity model in the context of manual test case recommendation, significantly improving the accuracy and efficiency of test case recommendations and providing valuable insights for regression testing in complex software systems.
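
The two-phase flow can be sketched with generic components: a classifier narrows candidates to the predicted test labels, and an embedding model ranks the remaining test cases by similarity to the commit message. The `classify_labels` and `embed` functions below are toy stand-ins for the fine-tuned BERT and BGE models.

```python
# Sketch of the two-phase recommendation flow: (1) predict relevant test labels
# from the commit message, (2) rank test cases within those labels by embedding
# similarity. The toy stand-ins below replace the real BERT and BGE models.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(commit_msg, test_cases, classify_labels, embed, top_k=5):
    labels = set(classify_labels(commit_msg))             # phase 1: label classification
    candidates = [t for t in test_cases if t["label"] in labels]
    query = embed(commit_msg)                              # phase 2: semantic ranking
    candidates.sort(key=lambda t: cosine(query, embed(t["steps"])), reverse=True)
    return candidates[:top_k]

# Toy stand-ins so the sketch runs end-to-end; real models replace these.
def classify_labels(msg):
    return ["camera"] if "camera" in msg else ["misc"]

def embed(text):
    return [float(text.count(w)) for w in ("camera", "crash", "bluetooth")]

tests = [
    {"id": "TC-1", "label": "camera", "steps": "open the camera and take a photo"},
    {"id": "TC-2", "label": "camera", "steps": "record a video after a camera crash recovery"},
    {"id": "TC-3", "label": "bluetooth", "steps": "pair a bluetooth headset"},
]
print([t["id"] for t in recommend("fix camera crash on zoom", tests, classify_labels, embed)])
```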

 Tags: "Testing and Quality", "AI for SE"  
 
Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan, John Steinbacher, Andriy Miranskyy, "Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset"

Abstract: As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
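
As an illustration of the kind of unsupervised baseline such a telemetry dataset enables, the sketch below runs scikit-learn's IsolationForest on synthetic telemetry windows; the model choice and the features are assumptions, not necessarily the models or columns evaluated in the paper.

```python
# Illustrative unsupervised baseline on synthetic telemetry windows using
# scikit-learn's IsolationForest; model choice and features are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=100.0, scale=5.0, size=(1000, 4))   # e.g. latency, CPU, memory, error counts
spikes = rng.normal(loc=160.0, scale=5.0, size=(10, 4))     # injected anomalous windows
telemetry = np.vstack([normal, spikes])

detector = IsolationForest(contamination=0.01, random_state=0).fit(telemetry)
flags = detector.predict(telemetry)                         # -1 = anomaly, 1 = normal
print(int((flags[-10:] == -1).sum()), "of 10 injected anomalies flagged")
```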

 Tags: "Analysis", "AI for SE", "Testing and Quality"  
 
Tri Minh-Triet Pham, Karthikeyan Premkumar, Mohamed Naili, Jinqiu Yang, "Time to Retrain? Detecting Concept Drifts in Machine Learning Systems"

Abstract: With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and production. It is essential to use concept drift detection to monitor the deployed ML models and re-train the ML models when needed. In this work, we explore applying state-of-the-art (SOTA) concept drift detection techniques on synthetic and real-world datasets in an industrial setting. Such an industrial setting requires minimal manual effort in labeling and maximal generality in ML model architecture. We find that current SOTA semi-supervised methods not only require significant labeling effort but also only work for certain types of ML models. To overcome such limitations, we propose a novel model-agnostic technique (CDSeer) for detecting concept drift. Our evaluation shows that CDSeer has better precision and recall compared to the state-of-the-art while requiring significantly less manual labeling. We demonstrate the effectiveness of CDSeer at concept drift detection by evaluating it on eight datasets from different domains and use cases. Results from internal deployment of CDSeer on an industrial proprietary dataset show a 57.1% improvement in precision while using 99% fewer labels compared to the SOTA concept drift detection method. The performance is also comparable to the supervised concept drift detection method, which requires 100% of the data to be labeled. The improved performance and ease of adoption of CDSeer are valuable in making ML systems more reliable.
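
A generic, model-agnostic drift check (for illustration only; this is not CDSeer's algorithm) compares the distribution of recent prediction scores against a reference window, for example with a two-sample Kolmogorov-Smirnov test:

```python
# Generic, model-agnostic drift check for illustration only (not CDSeer's
# algorithm): compare recent prediction scores against a reference window with
# a two-sample Kolmogorov-Smirnov test and retrain when the test rejects.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference_scores, recent_scores, alpha: float = 0.01) -> bool:
    _, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < alpha          # distributions differ -> consider retraining

rng = np.random.default_rng(1)
reference = rng.beta(2, 5, size=2000)   # score distribution at training time
stable = rng.beta(2, 5, size=500)       # production scores, same concept
drifted = rng.beta(5, 2, size=500)      # production scores after the concept changed
print(drift_detected(reference, stable), drift_detected(reference, drifted))
```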

 Tags: "AI for SE", "Analysis"  
 
Delaram Ghobari, Mohammad Hossein Amini, Dai Quoc Tran, Seunghee Park, Shiva Nejati, Mehrdad Sabetzadeh, "Test Input Validation for Vision-based DL Systems: An Active Learning Approach"

Abstract: Testing deep learning (DL) systems requires extensive and diverse, yet valid, test inputs. While synthetic test input generation methods, such as metamorphic testing, are widely used for DL testing, they risk introducing invalid inputs that do not accurately reflect real-world scenarios. Invalid test inputs can lead to misleading results. Hence, there is a need for automated validation of test inputs to ensure effective assessment of DL systems. In this paper, we propose a test input validation approach for vision-based DL systems. Our approach uses active learning to balance the trade-off between accuracy and the manual effort required for test input validation. Further, by employing multiple image-comparison metrics, it achieves better results in classifying valid and invalid test inputs compared to methods that rely on single metrics. We evaluate our approach using an industrial and a public-domain dataset. Our evaluation shows that our multi-metric, active learning-based approach produces several optimal accuracy-effort trade-offs, including those deemed practical and desirable by our industry partner. Furthermore, provided with the same level of manual effort, our approach is significantly more accurate than two state-of-the-art test input validation methods, achieving an average accuracy of 97%. Specifically, the use of multiple metrics, rather than a single metric, results in an average improvement of at least 5.4% in overall accuracy compared to the state-of-the-art baselines. Incorporating an active learning loop for test input validation yields an additional 7.5% improvement in average accuracy, bringing the overall average improvement of our approach to at least 12.9% compared to the baselines.
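
The multi-metric, active-learning idea can be sketched generically: represent each (original, transformed) test input by several image-comparison metrics, train a classifier on the labeled pool, and repeatedly ask humans to validate only the inputs the classifier is least certain about. The features, model, and loop sizes below are illustrative assumptions, not the paper's exact setup.

```python
# Generic sketch of multi-metric, uncertainty-based active learning for test
# input validation; features, model, and loop sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty(proba: np.ndarray) -> np.ndarray:
    # 1.0 = the classifier is maximally unsure whether the input is valid.
    return 1.0 - np.abs(proba[:, 1] - 0.5) * 2

def active_learning_loop(features, labels, labeled_idx, budget_per_round=10, rounds=5):
    labeled = set(labeled_idx)
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(features[list(labeled)], labels[list(labeled)])
        unlabeled = np.array([i for i in range(len(features)) if i not in labeled])
        scores = uncertainty(clf.predict_proba(features[unlabeled]))
        # Ask the human oracle to validate only the most uncertain inputs.
        labeled.update(unlabeled[np.argsort(scores)[-budget_per_round:]])
    return clf, labeled

# Each row holds several image-comparison metrics (e.g. SSIM, histogram
# distance, pixel-wise MSE) for one (original, transformed) test input pair.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 3))
labels = (features @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=300) > 0).astype(int)
clf, labeled = active_learning_loop(features, labels, labeled_idx=range(20))
print(f"manually validated {len(labeled)} of {len(features)} inputs")
```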

 Tags: "Testing and Quality", "SE for AI"  
 
Nihal Jain, Robert Kwiatkowski, Baishakhi Ray, Murali Krishna Ramanathan, Varun Kumar, "On Mitigating Code LLM Hallucinations with API Documentation"

Abstract: In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. CloudAPIBench also provides annotations for frequencies of API occurrences in the public domain, allowing us to study API hallucinations at various frequency levels. Our findings reveal that Code LLMs struggle with low-frequency APIs: e.g., GPT-4o achieves only 38.58% valid low-frequency API invocations. We demonstrate that Documentation Augmented Generation (DAG) significantly improves performance for low-frequency APIs (an increase to 47.94% with DAG) but negatively impacts high-frequency APIs when using sub-optimal retrievers (a 39.02% absolute drop). To mitigate this, we propose to intelligently trigger DAG: we check against an API index or leverage Code LLMs' confidence scores to retrieve only when needed. We demonstrate that our proposed methods enhance the balance between low- and high-frequency API performance, resulting in more reliable API invocations (an 8.20% absolute improvement on CloudAPIBench for GPT-4o).
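
Selective triggering of DAG can be sketched as follows; the API index, documentation store, and stub model are illustrative placeholders, not the paper's implementation.

```python
# Sketch of selectively triggering Documentation Augmented Generation (DAG):
# retrieve docs only when a generated API call is absent from a local index or
# the model's confidence is low. The index, doc store, and stub LLM are placeholders.
API_INDEX = {"s3.get_object", "s3.put_object"}             # known high-frequency APIs
DOC_STORE = {"s3.restore_object": "restore_object(Bucket, Key, RestoreRequest) ..."}

def should_retrieve(api_calls, confidence, threshold=0.8):
    return confidence < threshold or any(call not in API_INDEX for call in api_calls)

def generate(prompt, llm):
    draft, api_calls, confidence = llm(prompt)
    if not should_retrieve(api_calls, confidence):
        return draft                                        # well-known APIs: skip retrieval
    docs = "\n".join(DOC_STORE.get(call, "") for call in api_calls)
    return llm(f"API documentation:\n{docs}\n\n{prompt}")[0]

# Stub LLM: returns (completion, APIs it used, self-reported confidence).
def stub_llm(prompt):
    used_docs = prompt.startswith("API documentation")
    return ("s3.restore_object(Bucket=b, Key=k)", ["s3.restore_object"], 0.9 if used_docs else 0.4)

print(generate("Restore an archived S3 object", stub_llm))
```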

 Tags: "AI for SE", "Design/Architecture", "Analysis"  
 
Xiaoyun Liang, Jingyi Ren, Jiayi Qi, Chao Peng, Bo Jiang, Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Ge Li, "aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing"

Abstract: Large Language Models (LLMs) have been widely used in code completion, and researchers are focusing on scaling up LLMs to improve their accuracy. However, larger LLMs increase the response time of code completion and decrease developers' productivity. In this paper, we propose a lightweight and effective LLM for code completion named aiXcoder-7B. Compared to existing LLMs, aiXcoder-7B achieves higher code completion accuracy while having a smaller scale (i.e., 7 billion parameters). We attribute the superiority of aiXcoder-7B to three key factors: (1) Multi-objective training. We employ three training objectives, one of which is our proposed Structured Fill-In-the-Middle (SFIM). SFIM considers the syntax structures in code and effectively improves the performance of LLMs for code. (2) Diverse data sampling strategies. They consider inter-file relationships and enhance the capability of LLMs in understanding cross-file contexts. (3) Extensive high-quality data. We establish a rigorous data collection pipeline and consume a total of 1.2 trillion unique tokens for training aiXcoder-7B. This vast volume of data enables aiXcoder-7B to learn a broad distribution of code. We evaluate aiXcoder-7B on five popular code completion benchmarks and a new benchmark collected for this paper. The results show that aiXcoder-7B outperforms the six latest LLMs of similar size and even surpasses four larger LLMs (e.g., StarCoder2-15B and CodeLLaMa-34B), positioning aiXcoder-7B as a lightweight and effective LLM for academia and industry. Finally, we summarize three valuable insights for helping practitioners train the next generations of LLMs for code. aiXcoder-7B has been open-sourced and has gained significant attention. As of the submission date, aiXcoder-7B has received 2,193 GitHub stars.

 Tags: "AI for SE", "Design/Architecture"  
 
Petr Tsvetkov, Aleksandra Eliseeva, Danny Dig, Alexander Bezzubov, Yaroslav Golubev, Timofey Bryksin, Yaroslav Zharov, "Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings"

Abstract: Commit message generation (CMG) is a crucial task in software engineering that is challenging to evaluate correctly. When a CMG system is integrated into the IDEs and other products at JetBrains, we perform online evaluation based on user acceptance of the generated messages. However, performing online experiments with every change to a CMG system is troublesome, as each iteration affects users and requires time to collect enough statistics. On the other hand, offline evaluation, a prevalent approach in the research literature, facilitates fast experiments but employs automatic metrics that are not guaranteed to represent the preferences of real users. In this work, we describe a novel approach we employed to deal with this problem at JetBrains: leveraging an online metric---the number of edits users introduce before committing the generated messages to the VCS---to select metrics for offline experiments. To support this new type of evaluation, we develop a novel markup collection tool mimicking the real workflow with a CMG system, collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts, and design and verify a way to synthetically extend such a dataset. Then, we use the final dataset of 656 pairs to study how the widely used similarity metrics correlate with the online metric reflecting the real users' experience. Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation. This contradicts previous studies on similarity metrics for CMG, suggesting that user interactions with a CMG system in real-world settings differ significantly from the responses of human labelers operating within controlled research environments. While our findings are tied to the CMG model we used, and the results may vary for models with radically different outputs, our proposed framework is relatively lightweight in terms of required human effort, and we release all the code and the dataset to support future research in the field of CMG: https://jb.gg/cmg-evaluation.
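
The offline metric the study finds most aligned with the online signal, edit distance, is straightforward to compute; the sketch below pairs a plain Levenshtein implementation with a rank correlation against hypothetical per-message edit counts, purely as an illustration of the evaluation idea.

```python
# Generic illustration of the offline-vs-online comparison: compute a normalized
# edit distance between generated and human-edited commit messages and
# rank-correlate it with an online signal (here, hypothetical edit counts).
from scipy.stats import spearmanr

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def normalized_edit_distance(generated: str, edited: str) -> float:
    return levenshtein(generated, edited) / max(len(generated), len(edited), 1)

pairs = [("Fix NPE in parser", "Fix NPE in JSON parser"),
         ("Update deps", "Bump kotlin to 1.9.23 and update lockfile"),
         ("Add login tests", "Add login tests")]
offline = [normalized_edit_distance(g, e) for g, e in pairs]
online_edits = [2, 9, 0]                     # hypothetical per-message user edit counts
rho, _ = spearmanr(offline, online_edits)
print(rho)
```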

 Tags: "Analysis", "Process", "AI for SE"  
 
Spandan Garg, Roshanak Moghaddam, Neel Sundaresan, "Improving Code Performance Using LLMs in Zero-Shot: RAPGen"

Abstract: Performance bugs are non-functional bugs that can manifest even in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing. Our empirical evaluation shows that RAPGen can generate performance improvement suggestions equivalent to or better than a developer's in ~60% of the cases, getting ~42% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers. Furthermore, we conduct an in-the-wild evaluation to verify the model's effectiveness in practice by suggesting fixes to developers in a large software company. So far, we have shared fixes on 10 codebases that represent production services running in the cloud, and 7 of them have been accepted by the developers.
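
The retrieve-then-prompt idea can be sketched generically: look up the instruction associated with the most similar past performance fix and prepend it to a zero-shot prompt. The keyword-overlap retriever and the tiny knowledge base below are illustrative stand-ins, not the paper's implementation.

```python
# Sketch of retrieval-augmented prompt generation for performance fixes; the
# keyword-overlap retriever and knowledge base are illustrative stand-ins.
import re

KNOWLEDGE_BASE = [
    {"pattern": "Enumerable.Count() called inside a loop condition",
     "instruction": "Hoist the Count() call out of the loop or cache the count in a local variable."},
    {"pattern": "string concatenation with += inside a loop",
     "instruction": "Use a StringBuilder instead of += on strings inside loops."},
]

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve_instruction(code_snippet):
    snippet_tokens = tokens(code_snippet)
    return max(KNOWLEDGE_BASE, key=lambda e: len(tokens(e["pattern"]) & snippet_tokens))["instruction"]

def build_prompt(code_snippet):
    return (f"// Fix the performance issue. Hint: {retrieve_instruction(code_snippet)}\n"
            f"{code_snippet}\n// Improved version:")

snippet = "for (int i = 0; i < items.Count(); i++) { total += items.ElementAt(i).Price; }"
print(build_prompt(snippet))   # the resulting prompt is then sent to an LLM zero-shot
```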

 Tags: "AI for SE", "Testing and Quality"  
 
Guilherme Vaz Pereira, Victoria Jackson, Rafael Prikladnicki, André van der Hoek, Luciane Fortes, Carolina Araújo, André Coelho, Ligia Chelli, Diego Ramos, "Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company"

Abstract: Recent progress in Generative AI (GenAI) impacts different software engineering (SE) tasks in the software development cycle, from code generation to program repair, and presents a promising avenue for enhancing the productivity of development teams. GenAI-based tools have the potential to change the way we develop software and have received attention from industry and academia. However, although some studies have addressed the adoption of these tools in the software industry, little is known about developers' real experiences in a professional software development context, aside from the hype. In this paper, we explore the use of GenAI tools by a large Brazilian media company that has teams developing software in-house. We observed practitioners for six weeks and used online surveys at different time points to understand their expectations, perceptions, and concerns about these tools in their daily work. In addition, we automatically collected quantitative data from the company's development systems, aiming to gain insights into how GenAI impacted the development process during this period. Our results provide insights into how practitioners perceive and utilize GenAI in their daily work in software development.

 Tags: "Human/Social", "AI for SE", "Analysis"  
 
Charlotte Brandebusemeyer, Tobias Schimmer, Bert Arnrich, "Wearables to measure developer experience at work"

Abstract: Software development presents various challenges for developers: programming, problem-solving and communication skills are required in multiple deadline-oriented projects running in parallel. These challenges can adversely affect developers’ mental well-being and productivity. So far, software developers’ well-being has mainly been examined via questionnaires or extensive and costly setups in laboratory settings. Objective, continuous and minimally invasive measurements with wearables in real-life settings are scarce and remain underutilized. This exploratory study uses a mixed-method approach to investigate the cognitive load component of developer experience and how it impacts professional SAP software developers in their everyday working context. Previous studies have demonstrated that physiological activity recorded by wearables can effectively measure cognitive load. In this study, thirty professional SAP software developers evaluated their developer experience in a questionnaire. Twenty of them additionally recorded physiological data during one working day using the Empatica E4 wristband. During the recording, they documented their working tasks and rated their experienced cognitive load during those tasks. The questionnaire data served as support for the physiological data recording. Key findings are that subjective cognitive load ratings correlate positively with electrodermal activity and skin temperature, and negatively with wrist acceleration. Three task categories (development-heavy, collaboration-heavy and other) could be discriminated based on subjective load ratings. Skin temperature and wrist acceleration differed significantly between development-heavy and other tasks. Subjective evaluations of the software developers in this study indicate that work environment factors which the individual software developer cannot influence negatively impact the developer experience. This study motivates the use of wearables in a firm context to enable a holistic evaluation of developer experience by combining subjective and objective measures in a real-world setting. Efficient task scheduling and detection of early signs of burnout could be enabled, which would increase software developers’ well-being and productivity and, in the long term, reduce sick days and turnover rates in firms.

 Tags: "Human/Social", "Process"  
 
Ajay Krishna Vajjala, Arun Krishna Vajjala, Carmen Badea, Christian Bird, Rob DeLine, Jason Entenmann, Nicole Forsgren, Aliaksandr Hramadski, Sandeepan Sanyal, Oleg Surmachev, Thomas Zimmermann, Haris Mohammad, Jade D'Souza, Mikhail Demyanyuk, "Enhancing Differential Testing: LLM-Powered Automation in Release Engineering"

Abstract: In modern software engineering, efficient release engineering workflows are essential for quickly delivering new features to production. This not only improves company productivity but also provides customers with frequent updates, which can lead to increased profits. At Microsoft, we collaborated with the Identity and Network Access (IDNA) team to automate their release engineering workflows. They use differential testing to classify differences between test and production environments, which helps them assess how new changes perform with real-world traffic before pushing updates to production. This process enhances resiliency and ensures robust changes to the system. However, on-call engineers (OCEs) must manually label hundreds or thousands of behavior differences, which is time-consuming. In this work, we present a method leveraging Large Language Models (LLMs) to automate the classification of these differences, which saves OCEs a significant amount of time. Our experiments demonstrate that LLMs are effective classifiers for automating the task of behavior difference classification, which can speed up release workflows and improve OCE productivity.

 Tags: "Process", "AI for SE", "Testing and Quality"  
 
Pat Rondon, Renyao Wei, Jose Cambronero, Jürgen Cito, Aaron Sun, Siddhant Sanyam, Michele Tufano, Satish Chandra, "Evaluating Agent-based Program Repair at Google"

Abstract: Agent-based program repair offers the promise of automatically resolving complex bugs end-to-end by combining the planning, tool usage, and code-generating abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs (and patches) from popular GitHub-hosted Python projects. In addition, various agentic approaches such as SWE-Agent have been proposed to solve bugs in this benchmark. This paper explores the opportunity of using a similar agentic approach to address bugs in an enterprise-scale context. We perform a systematic comparison of bugs in SWE-Bench and those found in Google's issue tracking system and show that they have different distributions in terms of language diversity, size and spread of changes, and ease of localization. Next, we implement Passerine, an agent similar in spirit to SWE-Agent that can work within Google's environment and produce patches for bugs in Google's code repository. To evaluate Passerine, we curate an evaluation set of 182 bugs, spanning human-reported (82) and machine-reported bugs (100) from Google's issue tracking system. We show that with 20 trajectory samples Passerine can produce a plausible patch for 70% of machine-reported and 14.6% of human-reported bugs in our evaluation set. After manual examination, we found that 42% of machine-reported bugs and 13.4% of human-reported bugs have at least one patch that is semantically equivalent to the ground-truth patch. This establishes a lower bound on performance that suggests agent-based APR holds promise for large-scale enterprise bug repair.

 Tags: "Analysis/Repair", "AI for SE", "Testing and Quality"  
 
Henri Aïdasso, Francis Bordeleau, Ali Tizghadam, "On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories"

Abstract: The continuous delivery of modern software requires the execution of many automated pipeline jobs. These jobs ensure the frequent release of new software versions while detecting code problems at an early stage. For TELUS, our industrial partner in the telecommunications field, reliable job execution is crucial to minimize wasted time and streamline Continuous Deployment (CD). In this context, flaky job failures are one of the main issues hindering CD. Prior studies proposed techniques based on machine learning to automate the detection of flaky jobs. While valuable, these solutions are insufficient to address the waste associated with the diagnosis of flaky failures, which remain largely unexplored due to the wide range of underlying causes. This study examines 4,511 flaky job failures at TELUS to identify the different categories of flaky failures that we prioritize based on Recency, Frequency, and Monetary (RFM) measures. We identified 46 flaky failure categories that we analyzed using clustering and RFM measures to determine 14 priority categories for future automated diagnosis and repair research. Our findings also provide valuable insights into the evolution and impact of these categories. The identification and prioritization of flaky failure categories using RFM analysis introduce a novel approach that can be used in other contexts.
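
RFM-based prioritization of failure categories can be sketched in a few lines: score each category by how recently, how frequently, and how expensively its failures occurred, then rank. The cost proxy, ranking rule, and data below are illustrative assumptions.

```python
# Minimal sketch of RFM-based prioritization of flaky failure categories:
# score each category by recency, frequency, and a monetary proxy (wasted
# minutes), then rank. The cost proxy, ranking rule, and data are illustrative.
from datetime import date

failures = [  # (category, date, wasted_minutes)
    ("docker-registry-timeout", date(2024, 6, 30), 25),
    ("docker-registry-timeout", date(2024, 6, 29), 25),
    ("stale-node-cache",        date(2024, 3, 11), 60),
    ("flaky-integration-test",  date(2024, 6, 28), 10),
]

def rfm_scores(failures, today=date(2024, 7, 1)):
    scores = {}
    for category, day, minutes in failures:
        recency, frequency, monetary = scores.setdefault(category, [10**9, 0, 0])
        scores[category] = [min(recency, (today - day).days), frequency + 1, monetary + minutes]
    # Rank: more frequent first, then more costly, then more recent.
    return sorted(scores.items(), key=lambda kv: (-kv[1][1], -kv[1][2], kv[1][0]))

for category, (recency, frequency, monetary) in rfm_scores(failures):
    print(f"{category}: R={recency}d F={frequency} M={monetary}min")
```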

 Tags: "Analysis", "Process", "Testing and Quality"  
 
Sukrit Kumar, Drishti Goel, Thomas Zimmermann, Brian Houck, B. Ashok, Chetan Bansal, "Time Warp: The Gap Between Developers’ Ideal vs Actual Workweeks in an AI-Driven Era"

Abstract: Software developers balance a variety of different tasks in a workweek, yet the allocation of time often differs from what they consider ideal. Identifying and addressing these deviations is crucial for organizations aiming to enhance the productivity and well-being of the developers. In this paper, we present the findings from a survey of 484 software developers at Microsoft, which aims to identify the key differences between how developers would like to allocate their time during an ideal workweek versus their actual workweek. Our analysis reveals significant deviations between a developer's ideal workweek and their actual workweek, with a clear correlation: as the gap between these two workweeks widens, we observe a decline in both productivity and satisfaction. By examining these deviations in specific activities, we assess their direct impact on the developers' satisfaction and productivity. Additionally, given the growing adoption of AI tools in software engineering, both in industry and academia, we identify specific tasks and areas that could be strong candidates for automation. In this paper, we make three key contributions: 1) we quantify the impact of workweek deviations on developer productivity and satisfaction, 2) we identify individual tasks that disproportionately affect satisfaction and productivity, and 3) we provide data-driven insights to guide future AI automation efforts in software engineering, aligning them with developers' requirements and ideal workflows to maximize their productivity and satisfaction.

 Tags: "Human/Social", "Process"  
 
Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Siddharth Taneja, Celal Ziftci, Satish Chandra, "How is Google using AI for internal code migrations?"

Abstract: In recent years, there has been a tremendous interest in using generative AI, and particularly large language models (LLMs) in software engineering; indeed several companies now offer commercially available tools, and many large companies also have created their own ML-based tools for their software engineers. While the use of ML for common tasks such as code completion is available in commodity tools, there is a growing interest in application of LLMs for more bespoke purposes. One such purpose is code migration. This article is an experience report on using LLMs for code migrations at Google. It is not a research study, in the sense that we do not carry out comparisons against other approaches or evaluate research questions/hypotheses. Rather, we share our experiences in applying LLM-based code migration in an enterprise context across a range of migration cases, in the hope that other industry practitioners will find our insights useful. Many of these learnings apply to any bespoke application of ML in software engineering. We see evidence that the use of LLMs can reduce the time needed for migrations significantly, and can reduce barriers to get started and complete migration programs.

 Tags: "AI for SE", "Prog Comprehension/Reeng/Maint"  
 
Bobby R. Bruce, Aidan Dakhama, Karine Even-Mendoza, W.B Langdon, Héctor D. Menéndez, Justyna Petke, "Search+LLM-based Testing for ARM Simulators"

Abstract: In order to aid quality assurance of large, complex hardware architectures, system simulators have been developed. However, such system simulators do not always accurately mirror what would have happened on a real device. A significant challenge in testing these simulators comes from the complexity of having to model both the simulation itself and the effectively unbounded range of software that could be run on such a device. Our previous work introduced SearchSYS, a testing framework for software simulators. SearchSYS leverages a large language model to generate initial seed C code, which is then compiled; the resultant binary is fed to a fuzzer. We then use differential testing by running the outputs of fuzzing on real hardware and a system simulator to identify mismatches. In this paper, we present and discuss our solution to the problem of testing software simulators, using SearchSYS to test the gem5 VLSI digital circuit simulator, employed by ARM to test their systems. In particular, we focus on the simulation of the ARM silicon chip Instruction Set Architecture (ISA). SearchSYS can create test cases that activate bugs by combining LLMs, fuzzing, and differential testing. Using only the LLM, SearchSYS identified 74 test cases that activated bugs. By incorporating fuzzing, this number increased by 93 additional bug-activating cases within 24 hours. Through differential testing, we identified 624 bugs with LLM-generated test cases and 126 with fuzzed test inputs. Out of the total number of bug-activating test cases, 4 unique bugs have been reported and acknowledged by developers. Additionally, we provided developers with a test case suite and fuzzing statistics, and open-sourced SearchSYS.
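
As an illustrative aside, a minimal sketch of the differential-testing step described above: run the same compiled test binary natively and under a simulator, and flag any divergence. The command names are placeholders, not SearchSYS's actual harness:

```python
# Differential testing sketch: compare native execution with a simulator run.
import subprocess

def run(cmd, timeout=60):
    p = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return p.returncode, p.stdout

def differential_test(binary):
    native = run([binary])                    # execute on real ARM hardware
    simulated = run(["gem5-runner", binary])  # hypothetical simulator wrapper command
    if native != simulated:
        return {"binary": binary, "native": native, "simulated": simulated}
    return None

# The binaries would come from the LLM-seeded, fuzzer-mutated generation stage:
# mismatches = [m for m in map(differential_test, compiled_test_binaries) if m]
```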

 Tags: "Testing and Quality", "AI for SE"  
 
Wei Liu, Feng Lin, Linqiang Guo, Tse-Hsun (Peter) Chen, Ahmed E. Hassan, "GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts"

Abstract: The Graphical User Interface (GUI) plays a central role in mobile applications, directly affecting usability and user satisfaction. Poor GUI performance, such as lag or unresponsiveness, can lead to a negative user experience and decreased mobile application (app) ratings. In this paper, we present GUIWatcher, a framework designed to detect GUI lags by analyzing screencasts recorded during mobile app testing. GUIWatcher uses computer vision techniques to identify three types of lag-inducing frames (i.e., janky frames, long loading frames, and frozen frames) and prioritizes the most severe ones that significantly impact user experience. Our approach was evaluated using real-world mobile application tests, achieving high accuracy in detecting GUI lags in screencasts, with an average precision of 0.91 and recall of 0.96. The comprehensive bug reports generated from the lags detected by GUIWatcher help developers focus on the more critical issues and debug them efficiently. Additionally, GUIWatcher has been deployed in a real-world production environment, continuously monitoring app performance and successfully identifying critical GUI performance issues. By offering a practical solution for identifying and addressing GUI lags, GUIWatcher contributes to enhancing user satisfaction and the overall quality of mobile apps.
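
As an illustrative aside, one of the lag signals mentioned above (frozen frames) can be sketched as a frame-difference check over the screencast; the thresholds and synthetic frames below are invented, not GUIWatcher's actual pipeline:

```python
# Flag "frozen" spans where consecutive frames barely change for too long.
import numpy as np

def frozen_spans(frames, fps=30, diff_eps=1.0, min_seconds=2.0):
    """frames: list of HxW grayscale arrays; returns (start_s, end_s) spans."""
    spans, start = [], None
    for i in range(1, len(frames)):
        still = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean() < diff_eps
        if still and start is None:
            start = i - 1
        elif not still and start is not None:
            if (i - start) / fps >= min_seconds:
                spans.append((start / fps, i / fps))
            start = None
    if start is not None and (len(frames) - start) / fps >= min_seconds:
        spans.append((start / fps, len(frames) / fps))
    return spans

frames = [np.zeros((4, 4))] * 90 + [np.full((4, 4), 255.0)]  # ~3 s frozen, then a change
print(frozen_spans(frames))  # -> [(0.0, 3.0)]
```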

 Tags: "User experience", "AI for SE"  
 
Nadia Nahar, Christian Kaestner, Jenna L. Butler, Chris Parnin, Thomas Zimmermann, Christian Bird, "Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products"

Abstract: Large Language Models (LLMs) are increasingly embedded into software products across diverse industries, enhancing user experiences, but at the same time introducing numerous challenges for developers. Unique characteristics of LLMs force developers, who are accustomed to traditional software development and evaluation, out of their comfort zones as the LLM components shatter standard assumptions about software systems. This study explores the emerging solutions that software developers are adopting to navigate the challenges they encounter. Leveraging mixed-methods research, including 26 interviews and a survey with 332 responses, the study identifies 19 emerging solutions regarding quality assurance that practitioners across several product teams at Microsoft are exploring. The findings provide valuable insights that can guide the development and evaluation of LLM-based products more broadly in the face of these challenges.

 Tags: "Human/Social", "AI for SE", "Testing and Quality", "User experience"  
 
Kaiyuan Wang, Yang Li, Junyang Shen, Kaikai Sheng, Yiwei You, Jiaqi Zhang, Srikar Ayyalasomayajula, Julian Grady, Martin Wicke, "Automating ML Model Development at Scale"

Abstract: Google has a large team of machine learning (ML) developers working on a large number of ML models. ML model development suffers from long edit/validate cycles compared to traditional software development. This makes it tedious and time-consuming for modeling teams to propagate ML innovations to their models. We present HEINZELMAENNCHEN, an ML modeling automation system, which allows users to apply semantically specified modeling changes to models and evaluate them at scale. Three insights are key to creating this system: Automatic code modification allows us to mechanically apply modeling changes to a wide variety of models. Workflow automation systems are well suited to operate complex ML training machinery as if they were humans, saving significant manual effort. And finally, given a large enough model population, even imperfect automatic modeling with a lower-than-human success rate will generate significant aggregate gains. In this paper, we describe the design and implementation of our system. We also evaluate the system's performance and include an empirical study to demonstrate the utility and critical impact of the system. Our system is widely used by hundreds of ML developers and it significantly accelerates model development on hundreds of production models.
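
As an illustrative aside, the core "apply one change to many models" idea can be sketched as a mechanical config rewrite plus a validation queue; the names, configs, and the change itself are invented, not HEINZELMAENNCHEN internals:

```python
# Apply a semantically specified change across a model population, then queue validation.
import re

model_configs = {
    "ranker_a": "optimizer = 'sgd'\nlearning_rate = 0.1",
    "ranker_b": "optimizer = 'sgd'\nlearning_rate = 0.05",
}

def apply_change(cfg: str) -> str:
    # The "modeling change": switch every model from SGD to Adam.
    return re.sub(r"optimizer = 'sgd'", "optimizer = 'adam'", cfg)

validation_queue = []
for name, cfg in model_configs.items():
    new_cfg = apply_change(cfg)
    if new_cfg != cfg:
        # A workflow-automation system would train and evaluate the modified model here.
        validation_queue.append((name, new_cfg))

print(f"{len(validation_queue)} models queued for automated validation")
```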

 Tags: "SE for AI", "Design/Architecture", "Analysis"  
 
Nimmi Weeraddana, Sarra Habchi, Shane McIntosh, Ike Obi, Jenna L. Butler, Sankeerti Haniyur, Brian Hassan, Margaret-Anne Storey, Brendan Murphy, "Identifying Factors Contributing to “Bad Days” for Software Developers: A Mixed-Methods Study"

Abstract: Software development is a dynamic activity that requires engineers to work effectively with tools, processes, and collaborative teams. As a result, the presence of friction can significantly hinder productivity, increase frustration, and contribute to low morale among developers. By contrast, higher satisfaction levels are positively correlated with higher levels of perceived productivity. Hence, understanding the factors that cause bad experiences for developers is critical for fostering a positive and productive engineering environment. In this research, we employed a mixed-method approach, including interviews, surveys, diary studies, and analysis of developer telemetry data to uncover and triangulate common factors that cause “bad days” for developers. The interviews involved 22 developers across different levels and roles. The survey captured the perception of 214 developers about factors that cause them to have “bad days”, their frequency, and their impact on job satisfaction. The daily diary study engaged 79 developers for 30 days to document factors that caused “bad days” in the moment. We examined the telemetry signals of 131 consenting participants to validate the impact of bad developer experience using system data. Findings from our research revealed factors that cause “bad days” for developers and significantly impact their work and well-being. We discuss the implications of these findings and suggest future work.

 Tags: "Human/Social", "Process"  
 
Kirill Vasilevski, Dayi Lin, Ahmed E. Hassan, "Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models"

Abstract: To balance the quality and inference cost of Foundation Model (FM, such as large language models (LLMs))-powered software, people often opt to train a routing model that routes requests to FMs with different sizes and capabilities. Existing routing models rely on learning the optimal routing decision from carefully curated data, require complex computations to be updated, and do not consider the potential evolution of weaker FMs. In this paper, we propose Real-time Adaptive Routing (RAR), an approach to continuously adapt FM routing decisions while using guided in-context learning to enhance the capabilities of weaker FMs. The goal is to reduce reliance on stronger, more expensive FMs. We evaluate our approach on different subsets of the popular MMLU benchmark. Our approach routes 50.2% fewer requests to computationally expensive models while maintaining around 90.5% of the general response quality. In addition, the generated guidance from stronger models has shown intra-domain generalization and led to a better quality of responses compared to an equivalent approach with a standalone weaker FM.
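
As an illustrative aside, the routing idea can be sketched as a per-domain success estimate for the weaker model that decides when to escalate to the stronger one; the interfaces and numbers are invented, not the RAR implementation:

```python
# Adaptive weak/strong routing sketch with a running per-domain success estimate.
from collections import defaultdict

class AdaptiveRouter:
    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.stats = defaultdict(lambda: [2, 2])  # domain -> [successes, trials] (optimistic prior)

    def choose(self, domain: str) -> str:
        successes, trials = self.stats[domain]
        return "weak" if successes / trials >= self.threshold else "strong"

    def feedback(self, domain: str, model: str, ok: bool) -> None:
        if model == "weak":  # only weak-model outcomes update its estimate
            self.stats[domain][0] += int(ok)
            self.stats[domain][1] += 1

router = AdaptiveRouter()
print(router.choose("math"))              # "weak": start on the cheaper model
router.feedback("math", "weak", ok=False)
router.feedback("math", "weak", ok=False)
print(router.choose("math"))              # "strong": escalate after repeated failures
```

In the paper's setting, the guidance generated by the stronger model would additionally be injected into the weaker model's context to raise its success rate over time.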

 Tags: "SE for AI", "Analysis"  
 
Jenna L. Butler, Jina Suh, Sankeerti Haniyur, Constance Hadley, "Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace"

Abstract: Generative AI coding tools are relatively new, and their impact on developers extends beyond traditional coding metrics, influencing beliefs about work and developers’ roles in the workplace. This study aims to illuminate developers' pre-existing beliefs about generative AI tools, their self-perceptions, and how regular use of these tools may alter these beliefs. Using a mixed-methods approach, including surveys, a randomized controlled trial, and a three-week diary study, we explored the real-world application of generative AI tools within a large multinational software company. Our findings reveal that the introduction and sustained use of generative AI coding tools significantly increases developers' perceptions of these tools as both useful and enjoyable. However, developers' views on the trustworthiness of AI-generated code remained unchanged. We also discovered unexpected uses of these tools, such as replacing web searches and fostering creative ideation. Additionally, 84% of participants reported positive changes in their daily work practices, and 66% noted shifts in their feelings about their work, ranging from increased enthusiasm to heightened awareness of the need to stay current with technological advances. This research provides both qualitative and quantitative insights into the evolving role of generative AI in software development and offers practical recommendations for maximizing the benefits of this emerging technology, particularly in balancing the productivity gains from AI-generated code with the need for increased scrutiny and critical evaluation of its outputs.

 Tags: "Human/Social", "AI for SE"  
 
Wannita Takerngsaksiri, Jirat Pasuksmit, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Ruixiong Zhang, Fan Jiang, Jing Li, Evan Cook, Kun Chen, Ming Wu, "Human-In-the-Loop Software Development Agents"

Abstract: Recently, Large Language Model (LLM)-based multi-agent paradigms for software engineering have been introduced to automatically resolve software development tasks (e.g., from a given issue to source code). However, existing work is evaluated based on historical benchmark datasets, does not consider human feedback at each stage of the automated software development process, and has not been deployed in practice. In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development that allows software engineers to refine and guide LLMs when generating coding plans and source code for a given task. We design, implement, and deploy the HULA framework into Atlassian JIRA for internal use. Through a multi-stage evaluation of the HULA framework, Atlassian software engineers perceive that HULA can minimize the overall development time and effort, especially in initiating a coding plan and writing code for straightforward tasks. On the other hand, challenges around code quality were raised in some cases and remain to be addressed. We draw lessons learned and discuss opportunities for future work, which will pave the way for the advancement of LLM-based agents in software development.
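
As an illustrative aside, the human-in-the-loop structure described above can be sketched in a few lines: the model drafts a plan, an engineer approves or edits it, and only the approved plan drives code generation. `call_llm` is a stub, not HULA's internal API:

```python
# Human-in-the-loop flow sketch: plan -> human review -> code.
def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"  # stub for illustration

def human_in_the_loop(issue: str) -> str:
    plan = call_llm(f"Draft a coding plan for this issue:\n{issue}")
    print("Proposed plan:\n", plan)
    edited = input("Press Enter to approve, or type a revised plan: ").strip()
    approved_plan = edited or plan  # the engineer stays in control of the plan
    return call_llm(f"Write the code that implements this plan:\n{approved_plan}")

# code = human_in_the_loop("Add retry logic to the payment webhook handler")
```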

 Tags: "AI for SE", "Human/Social", "Process"  
 
Hao Li, Cor-Paul Bezemer, Ahmed E. Hassan, "Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models"

Abstract: Foundation models (FMs) such as large language models (LLMs) have significantly impacted many fields, including software engineering (SE). The interaction between SE and FMs has led to the integration of FMs into SE practices (FM4SE) and the application of SE methodologies to FMs (SE4FM). While several literature surveys exist on academic contributions to these trends, we are the first to provide a practitioner's view. We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, leveraging an FM-powered surveying approach to systematically label and summarize the discussed activities and tasks. We observed that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities such as code understanding, summarization, and API recommendation. The majority of blog posts on SE4FM are about model deployment & operation, and system architecture & orchestration. Although the emphasis is on cloud deployments, there is a growing interest in compressing FMs and deploying them on smaller devices such as edge or mobile devices. We outline eight future research directions inspired by our gained insights, aiming to bridge the gap between academic findings and real-world applications. Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient approach in conducting literature surveys within technical and grey literature domains. Our dataset, results, code and used prompts can be found in our online replication package at https://github.com/SAILResearch/fmse-blogs.
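
As an illustrative aside, the "jury of foundation models" idea reduces to majority voting over independent model labels; the model calls below are stubs, not the authors' actual prompting setup:

```python
# Jury sketch: several models label the same blog post and the majority label wins.
from collections import Counter

def ask_model(model_name: str, post: str) -> str:
    stub_votes = {"fm-a": "code generation", "fm-b": "code generation", "fm-c": "API recommendation"}
    return stub_votes[model_name]  # replace with real model API calls

def jury_label(post: str, jury=("fm-a", "fm-b", "fm-c")) -> str:
    votes = Counter(ask_model(m, post) for m in jury)
    return votes.most_common(1)[0][0]

print(jury_label("How we use an LLM to suggest unit tests in CI ..."))  # -> "code generation"
```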

 Tags: "AI for SE", "Human/Social", "SE for AI"  
 
Sanna Malinen, Matthias Galster, Antonija Mitrovic, Sreedevi Sankara Iyer, Pasan Peiris, April Clarke, "Soft Skills in Software Engineering: Insights from the Trenches"

Abstract: Context: Soft skills (e.g., communication skills, empathy, team skills) impact software project success. Objective: We aim to understand (1) how soft skills are defined and perceived in the software industry, (2) what soft skills are required in software development roles, (3) how soft skills are trained, and (4) how soft skills are assessed in the software industry. Method: We conducted 18 semi-structured interviews with software professionals in different roles. We manually analyzed transcripts following a general inductive approach. Results: There is ambiguity in soft skills definition, but agreement on their importance. Most critical soft skills were communication, leadership, and teamwork skills, but we also identified less frequently discussed skills: resilience and self-awareness. Further, we find that soft skills are not systematically assessed, likely due to difficulties in their assessment. A theme emerged on the importance of ongoing soft skills training as well as tailoring training to software professionals' needs. Conclusions: Our research supports past research on the importance of soft skills in software engineering and suggests that further emphasis is needed on soft skills assessment and training. We discuss implications for software professionals, those in leadership roles, and companies.

 Tags: "Human/Social"  
 
Sidong Feng, Changhao Du, Huaxiao Liu, Qingnan Wang, Zhengwei Lv, Gang Huo, Xu Yang, Chunyang Chen, "Agent for User: Testing Multi-User Interactive Features in TikTok"

Abstract: TikTok, a widely-used social media app boasting over a billion monthly active users, requires effective app quality assurance for its intricate features. Feature testing is crucial in achieving this goal. However, the multi-user interactive features within the app, such as voice calls, live streaming, and video conferencing, pose significant challenges for developers, who must handle simultaneous device management and user interaction coordination. Inspired by the concept of agents designed to autonomously and collaboratively tackle problems, we introduce a novel multi-agent approach, powered by Large Language Models (LLMs), to automate the testing of multi-user interactive app features. In detail, we build a virtual device farm that allocates the necessary number of devices for a given multi-user interactive task. For each device, we deploy an LLM-based agent that simulates a user, thereby mimicking user interactions to collaboratively automate the testing process. Evaluations on 24 multi-user interactive tasks within the TikTok app showcase the approach's capability to cover 75% of tasks with 85.9% action similarity and to offer 87% time savings for developers. Additionally, we have integrated our approach into the real-world TikTok testing platform, aiding in the detection of 26 multi-user interactive bugs.

 Tags: "Testing and Quality", "AI for SE", "User experience"  
 
Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao, Jiongzhou Liu, Shujie Han, Yi Liu, Fan Xu, "BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems"

Abstract: Cloud infrastructure is the collective term for all physical devices within cloud systems. Failures within the cloud infrastructure system can severely compromise the stability and availability of cloud services. In particular, a batch servers outage, the most severe type of failure, can result in the complete unavailability of all upstream services. In this work, we focus on the batch servers outage diagnosis problem, aiming to accurately and promptly analyze the root cause of outages to facilitate troubleshooting. However, our empirical study conducted in a real industrial system indicates that it is a challenging task. Firstly, the collected single-modal coarse-grained failure monitoring data (i.e., alert, incident, or change) in the cloud infrastructure system is insufficient for comprehensive failure profiling. Secondly, due to the intricate dependencies among devices, outages are often the cumulative result of multiple failures, but correlations between failures are difficult to ascertain. To address these problems, we propose BSODiag, an unsupervised and lightweight framework designed for diagnosing batch servers outages. BSODiag provides a global analytical perspective, thoroughly explores failure information from multi-source monitoring data, models the spatio-temporal correlations among failures, and delivers accurate and interpretable diagnostic results. Experiments conducted on the Alibaba Cloud infrastructure system show that BSODiag achieves 87.5% PR@3 and 46.3% PCR, outperforming baseline methods by 10.2% and 3.7%, respectively.
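
As an illustrative aside, the spatio-temporal grouping described above can be sketched as linking failure events that are close in time and occur on connected devices, then surfacing the earliest event of the largest cluster; the data and rules are synthetic, not the BSODiag algorithm:

```python
# Cluster multi-source failure events by time window and device topology.
from itertools import combinations

events = [("alert", "rack1-sw", 0), ("incident", "srv-101", 2),
          ("incident", "srv-102", 3), ("change", "srv-309", 40)]   # (source, device, minute)
topology = {("rack1-sw", "srv-101"), ("rack1-sw", "srv-102")}      # who is wired to whom

def connected(a, b):
    return a == b or (a, b) in topology or (b, a) in topology

parent = list(range(len(events)))                                  # simple union-find
def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

for i, j in combinations(range(len(events)), 2):
    (_, di, ti), (_, dj, tj) = events[i], events[j]
    if connected(di, dj) and abs(ti - tj) <= 10:                   # 10-minute window
        parent[find(j)] = find(i)

clusters = {}
for i in range(len(events)):
    clusters.setdefault(find(i), []).append(events[i])
root_cluster = max(clusters.values(), key=len)
print("root-cause candidate:", min(root_cluster, key=lambda e: e[2]))
```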

 Tags: "Analysis", "Real-Time", "Testing and Quality"  
 
Martin Obaidi, Nicolas Voß, Hannah Deters, Jakob Droste, Marc Herrmann, Jannik Fischbach, Kurt Schneider, "Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry"

Abstract: Providing explanations in response to user reviews is a time-consuming and repetitive task for companies, as many reviews present similar issues requiring nearly identical responses. To improve efficiency, this paper proposes a semi-automated approach to managing explanation needs in user reviews. The approach leverages taxonomy categories to classify reviews and assign them to relevant internal teams or sources for responses. 2,366 app reviews from the Google Play Store and Apple App Store were scraped and analyzed using a word and phrase filtering system to detect explanation needs. The detected needs were categorized and assigned to specific internal teams at Graphmasters GmbH, using a hierarchical assignment strategy that prioritizes the most relevant teams. Additionally, external sources, such as existing support articles and past review responses, were integrated to provide comprehensive explanations. The system was evaluated through interviews and surveys with the Graphmasters support team. The results showed that the hierarchical assignment method improved the accuracy of team assignments, with correct teams being identified in 79.2% of cases. However, challenges in interrater agreement and the need for new responses in certain cases, particularly for Apple App Store reviews, were noted. Future work will focus on refining the taxonomy and enhancing the automation process to reduce manual intervention further.
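
As an illustrative aside, the two steps described above (a word/phrase filter for explanation needs, followed by a hierarchical team assignment) can be sketched as follows; the keywords and teams are invented, not Graphmasters' actual taxonomy:

```python
# Detect explanation needs, then assign the review to the highest-priority matching team.
NEED_PHRASES = ["why does", "how do i", "what does", "doesn't explain"]
TEAM_RULES = [                      # highest-priority team first
    ("routing", ["route", "detour", "navigation"]),
    ("billing", ["subscription", "charge", "refund"]),
    ("support", []),                # catch-all fallback
]

def needs_explanation(review: str) -> bool:
    text = review.lower()
    return any(p in text for p in NEED_PHRASES)

def assign_team(review: str) -> str:
    text = review.lower()
    for team, keywords in TEAM_RULES:
        if not keywords or any(k in text for k in keywords):
            return team
    return "support"

review = "Why does the app suggest a detour even when the road is open?"
if needs_explanation(review):
    print(assign_team(review))      # -> "routing"
```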

 Tags: "Testing and Quality", "Human/Social"  
 
Arihant Bedagkar, Sayandeep Mitra, Raveendra Kumar M, Ravindra Naik, Samiran Pal, "LLM Driven Smart Assistant for Data Mapping"

Abstract: Data mapping is a crucial step during application migration and for application integration. Data (model) mapping comprises "schema matching" to identify semantically equivalent fields between two schemas, and "transformation logic generation" to write rules for converting data from one schema to the other. In industry practice today, data mapping is largely manual in nature, done by domain experts. We present a data mapping assistant powered by Large Language Models (LLMs), providing a disruptive precision improvement over SOTA methods and multiple automation workflows that let users provide different available input triggers (context) for inferring the mappings. We illustrate the contribution using various representative industrial datasets.
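
As an illustrative aside, the "schema matching" step described above can be sketched as a single prompt that asks a model to pair semantically equivalent fields and return JSON; the field names are invented and the model call is a stub, not the assistant's actual workflow:

```python
# Schema-matching sketch: prompt an LLM for field pairs and parse the JSON answer.
import json

source = ["cust_nm", "dob", "addr_line1"]
target = ["customerName", "dateOfBirth", "addressLine1"]

prompt = (
    "Match each source field to the semantically equivalent target field.\n"
    f"Source: {source}\nTarget: {target}\n"
    'Answer as JSON: [{"source": ..., "target": ...}, ...]'
)

def call_llm(prompt: str) -> str:  # stub for illustration; a real system would call a model API
    return json.dumps([{"source": "cust_nm", "target": "customerName"},
                       {"source": "dob", "target": "dateOfBirth"},
                       {"source": "addr_line1", "target": "addressLine1"}])

mappings = json.loads(call_llm(prompt))
for m in mappings:
    print(f'{m["source"]} -> {m["target"]}')
```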

 Tags: "Analysis", "Databases", "AI for SE"  
 
:
: