ICSE 2025
Sat 26 April - Sun 4 May 2025, Ottawa, Ontario, Canada

Accepted Papers

A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering
Journal-first Papers
AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes
Journal-first Papers
Adopting Automated Bug Assignment in Practice - A Longitudinal Case Study at Ericsson
Journal-first Papers
A Large-Scale Exploratory Study on the Proxy Pattern in Ethereum [Blockchain]
Journal-first Papers
An Empirical Study of Challenges in Machine Learning Asset Management [SE for AI]
Journal-first Papers
An Empirical Study on Developers' Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Journal-first Papers
Asking and Answering Questions During Memory Profiling
Journal-first Papers
Assessing Evaluation Metrics for Neural Test Oracle Generation
Journal-first Papers
A Tale of Two Comprehensions? Analyzing Student Programmer Attention During Code Summarization
Journal-first Papers
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Automatic Commit Message Generation: A Critical Review and Directions for Future Work
Journal-first Papers
Automatic Identification of Game Stuttering via Gameplay Videos Analysis
Journal-first Papers
BatFix: Repairing language model-based transpilation
Journal-first Papers
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects [SE for AI]
Journal-first Papers
Boundary State Generation for Testing and Improvement of Autonomous Driving Systems [SE for AI]
Journal-first Papers
Bridging the Language Gap: An Empirical Study of Bindings for Open Source Machine Learning Libraries Across Software Package Ecosystems
Journal-first Papers
Bug Analysis in Jupyter Notebook Projects: An Empirical Study
Journal-first Papers
Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice [SE for AI]
Journal-first Papers
Characterizing the Prevalence, Distribution, and Duration of Stale Reviewer Recommendations
Journal-first Papers
Characterizing Timeout Builds in Continuous Integration
Journal-first Papers
D3: Differential Testing of Distributed Deep Learning with Model Generation [SE for AI]
Journal-first Papers
Diversity's Double-Edged Sword: Analyzing Race's Effect on Remote Pair Programming Interactions
Journal-first Papers
Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be? [Security]
Journal-first Papers
Efficient Management of Containers for Software Defined Vehicles
Journal-first Papers
Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
Journal-first Papers
EpiTESTER: Testing Autonomous Vehicles with Epigenetic Algorithm and Attention Mechanism
Journal-first Papers
Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems [SE for AI]
Journal-first Papers
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
FairBalance: How to Achieve Equalized Odds With Data Pre-processing
Journal-first Papers
Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration
Journal-first Papers
GenMorph: Automatically Generating Metamorphic Relations via Genetic Programming
Journal-first Papers
Guess the State: Exploiting Determinism to Improve GUI Exploration Efficiency
Journal-first Papers
History-Driven Fuzzing for Deep Learning Libraries
Journal-first Papers
Hunting bugs: Towards an automated approach to identifying which change caused a bug through regression testing
Journal-first Papers
Identifying Performance Issues in Cloud Service Systems Based on Relational-Temporal Features
Journal-first Papers
Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations
Journal-first Papers
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
Journal-first Papers
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Journal-first Papers
MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems
Journal-first Papers
Mimicking Production Behavior With Generated Mocks
Journal-first Papers
Mitigating Noise in Quantum Software Testing Using Machine Learning [Quantum]
Journal-first Papers
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Non-Autoregressive Line-Level Code Completion
Journal-first Papers
On Effectiveness and Efficiency of Gamified Exploratory GUI Testing
Journal-first Papers
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools [Security]
Journal-first Papers
On the Understandability of MLOps System Architectures
Journal-first Papers
Optimization of Automated and Manual Software Tests in Industrial Practice: A Survey and Historical Analysis
Journal-first Papers
PACE: A Program Analysis Framework for Continuous Performance Prediction
Journal-first Papers
Precisely Extracting Complex Variable Values from Android Apps [Formal Methods]
Journal-first Papers
Predicting Attrition among Software Professionals: Antecedents and Consequences of Burnout and Engagement
Journal-first Papers
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
Journal-first Papers
Qualitative Surveys in Software Engineering Research: Definition, Critical Review, and Guidelines [Research Methods]
Journal-first Papers
QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems [Quantum]
Journal-first Papers
Quantum Approximate Optimization Algorithm for Test Case Optimization [Quantum]
Journal-first Papers
Reducing the Length of Field-replay Based Load Testing
Journal-first Papers
Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study [SE for AI]
Journal-first Papers
Relevant information in TDD experiment reporting
Journal-first Papers
Replication in Requirements Engineering: the NLP for RE Case
Journal-first Papers
Reputation Gaming in Crowd Technical Knowledge Sharing
Journal-first Papers
Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets [Security]
Journal-first Papers
RLocator: Reinforcement Learning for Bug Localization
Journal-first Papers
Runtime Verification and Field-based Testing for ROS-based Robotic Systems
Journal-first Papers
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
SimClone: Detecting Tabular Data Clones using Value Similarity
Journal-first Papers
SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning
Journal-first Papers
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP
Journal-first Papers
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Journal-first Papers
SUPERSONIC: Learning to Generate Source Code Optimizations in C/C++
Journal-first Papers
Test Case Minimization with Quantum Annealing [Quantum]
Journal-first Papers
Test Generation Strategies for Building Failure Models and Explaining Spurious Failures
Journal-first Papers
Testing Multi-Subroutine Quantum Programs: From Unit Testing to Integration Testing [Quantum]
Journal-first Papers
The impact of Concept drift and Data leakage on Log Level Prediction Models
Journal-first Papers
Toward a Theory of Causation for Interpreting Neural Code Models
Journal-first Papers
Toward a Theory on Programmer's Block Inspired by Writer's Block
Journal-first Papers
Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses [Security]
Journal-first Papers
Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?
Journal-first Papers
Towards Effectively Testing Machine Translation Systems from White-Box Perspectives
Journal-first Papers
Tracking the Evolution of Static Code Warnings: The State-of-the-Art and a Better Approach
Journal-first Papers
T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax
Journal-first Papers
Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing [SE for AI]
Journal-first Papers
Understanding Code Understandability Improvements in Code Reviews
Journal-first Papers
Understanding Real-time Collaborative Programming: a Study of Visual Studio Live Share
Journal-first Papers
Using Knowledge Units of Programming Languages to Recommend Reviewers for Pull Requests: An Empirical Study
Journal-first Papers
Vision Transformer Inspired Automated Vulnerability Repair [Security]
Journal-first Papers
VulNet: Towards improving vulnerability management in the Maven ecosystem [Security]
Journal-first Papers
ZigZagFuzz: Interleaved Fuzzing of Program Options and Files
Journal-first Papers

Call for Contributions

ICSE has formed partnerships with prestigious software engineering journals to incorporate journal-first papers into the ICSE program. Through this initiative, authors of journal-first papers accepted in the partnering journals will be invited to present their work at ICSE, thus providing an opportunity for the authors to engage directly with the community and offering the ICSE attendees an additional dimension to the research track program.

The journals that support the journal-first model as partners with ICSE are:

  • IEEE Transactions on Software Engineering (IEEE TSE),
  • ACM Transactions on Software Engineering and Methodology (ACM TOSEM),
  • Empirical Software Engineering (EMSE).

Scope

A submission to the ICSE 2025 call for journal-first paper presentations must adhere to the following criteria:

  • The associated journal paper needs to have been accepted to a journal from the above list no earlier than November 1st, 2023 and no later than October 10th, 2024.
  • The paper is in the scope of the conference.
  • The paper does not exclusively report a secondary study, e.g., systematic reviews, mapping studies, surveys.
  • The paper reports completely new research results and/or presents novel contributions that significantly extend prior work and were not previously reported.
    • The paper does not extend prior work solely with additional proofs or algorithms (or other such details presented for completeness), additional empirical results, or minor enhancements or variants of the results presented in the prior work.
    • As a rough guide, a journal-first paper should have at least 70% new content over and above the content of previous publications. As such, the expectation is that an extension of a full 8-10 page conference or workshop paper would not be deemed a journal-first paper.
  • The paper has not been presented at, and is not under consideration for, journal-first programs of other conferences.

How to Submit

The authors of any paper that meets the above criteria are invited to submit a (maximum) one-page presentation proposal consisting of the paper’s title, the paper’s authors, an extended abstract, and a pointer to the original journal paper at the journal’s Web site. If the journal paper is related to or builds on previously published work (such as a tool demo or a poster), then the proposal must clearly and explicitly justify why the paper should be considered a journal-first paper.

The template to use is the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without the compsoc or compsocconf options). Note that these submissions will not be published; the format is specified only so that all submissions have a consistent look, which facilitates the selection process.
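For orientation, a minimal LaTeX sketch consistent with these instructions is shown below; the placeholder title, author names, and section heading are illustrative assumptions, not a prescribed structure.

    \documentclass[10pt,conference]{IEEEtran} % conference mode, 10pt; no compsoc/compsocconf options
    \begin{document}

    \title{Title of the Accepted Journal Paper}
    \author{\IEEEauthorblockN{First Author, Second Author}}
    \maketitle

    \section{Extended Abstract}
    % At most one page in total: the extended abstract, a pointer to the
    % original journal paper at the journal's Web site, and, where the paper
    % builds on previously published work, the journal-first justification.

    \end{document}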

By submitting your article to an IEEE Publication, you are hereby acknowledging that you and your co-authors are subject to all IEEE Publications Policies.

Submission site: https://icse2025-jf.hotcrp.com/

Submissions must not exceed 1 page.

Evaluation and Selection

Authors will be invited to present their paper at ICSE 2025 after a check that the paper satisfies the criteria listed above. As the papers have already been reviewed and accepted by the journals, they will not be reviewed again for technical content. If an exceptionally high number of submissions is received, not all papers will be selected. Priority will be given to the papers that:

  • Increase opportunities to attend ICSE for authors who might not otherwise attend. In particular, priority will be given to papers whose specified presenter is not presenting other journal-first or main research track papers.
  • Best fit the technical program, offering a balance across the conference topics: preference will be given to topics that are under-represented in the other tracks.
  • Would be ineligible as a journal-first presentation at the next SE3 conference (ICSE/FSE/ASE), because its acceptance date precedes the next conference’s window of journal acceptance dates for journal-first presentations.

If there is further need to select from papers with the same priority, then they will be randomly selected. However, we will do our best to avoid this situation.

Important Dates

  • Journal First Submissions Deadline: 21 October 2024
  • Journal First Acceptance Notification: 10 December 2024
  • Submissions close at 23:59 AoE (Anywhere on Earth, UTC-12)

Conference Attendance Expectation

If a submission is accepted for the journal-first program, the specified presenter must register for and attend the full 3-day technical conference and present the paper. The presentation is expected to be delivered in person, unless this is impossible due to travel limitations (related to, e.g., health, visa, or COVID-19 prevention). Each journal-first presentation will be scheduled in a session with topically-related Technical Track, NIER, SEIP, and/or SEIS papers. The journal-first manuscripts are published through the journals and will not be part of the ICSE proceedings. The journal-first papers will be listed in the conference program.


Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Formal Methods 1 [Research Track / New Ideas and Emerging Results (NIER)] at 103
Chair(s): Cristian Cadar Imperial College London
11:00
15m
Talk
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models [Formal Methods]
Research Track
Lezhi Ma Nanjing University, Shangqing Liu Nanyang Technological University, Yi Li Nanyang Technological University, Xiaofei Xie Singapore Management University, Lei Bu Nanjing University
11:15
15m
Talk
Gpass: a Goal-adaptive Neural Theorem Prover based on Coq for Automated Formal Verification [Formal Methods]
Research Track
Yizhou Chen Peking University, Zeyu Sun Institute of Software, Chinese Academy of Sciences, Guoqing Wang Peking University, Dan Hao Peking University
11:30
15m
Talk
AI-Assisted Autoformalization of Combinatorics Problems in Proof Assistants [Formal Methods]
New Ideas and Emerging Results (NIER)
Long Doan George Mason University, ThanhVu Nguyen George Mason University
11:45
15m
Talk
Formally Verified Binary-level Pointer Analysis [Formal Methods, Artifact-Available]
Research Track
Freek Verbeek Open Universiteit & Virginia Tech, Ali Shokri Virginia Tech, Daniel Engel Open University Of The Netherlands, Binoy Ravindran Virginia Tech
12:00
15m
Talk
EffBT: An Efficient Behavior Tree Reactive Synthesis and Execution Framework [Formal Methods, Artifact-Functional, Artifact-Available]
Research Track
ziji wu National University of Defense Technology, yu huang National University of Defense Technology, peishan huang National University of Defense Technology, shanghua wen National University of Defense Technology, minglong li National University of Defense Technology, Ji Wang National University of Defense Technology
12:15
7m
Talk
SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation [Formal Methods]
New Ideas and Emerging Results (NIER)
Junjie Sheng East China Normal University, Yanqiu Lin East China Normal University, Jiehao Wu East China Normal University, Yanhong Huang East China Normal University, Jianqi Shi East China Normal University, Min Zhang East China Normal University, Xiangfeng Wang East China Normal University
12:22
7m
Talk
Listening to the Firehose: Sonifying Z3’s Behavior [Artifact-Functional, Artifact-Reusable, Artifact-Available, Formal Methods]
New Ideas and Emerging Results (NIER)
Finn Hackett University of British Columbia, Ivan Beschastnikh University of British Columbia
11:00 - 12:30
Program Comprehension 1 [Research Track] at 204
Chair(s): Wing Lam George Mason University
11:00
15m
Talk
An Empirical Study on Package-Level Deprecation in Python Ecosystem
Research Track
Zhiqing Zhong The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shilin He Microsoft Research, Haoxuan Wang The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), BoXi Yu The Chinese University of Hong Kong, Shenzhen, Haowen Yang The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Pinjia He Chinese University of Hong Kong, Shenzhen
11:15
15m
Talk
Datalog-Based Language-Agnostic Change Impact Analysis for Microservices
Research Track
Qingkai Shi Nanjing University, Xiaoheng Xie Ant Group, Xianjin Fu Ant Group, Peng Di Ant Group & UNSW Sydney, Huawei Li Alibaba Inc., Ang Zhou Ant Group, Gang Fan Ant Group
11:30
15m
Talk
GenC2Rust: Towards Generating Generic Rust Code from CArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Xiafa Wu University of California, Irvine, Brian Demsky University of California at Irvine
11:45
15m
Talk
Instrumentation-Driven Evolution-Aware Runtime Verification
Research Track
Kevin Guan Cornell University, Owolabi Legunsen Cornell University
12:00
15m
Talk
Moye: A Wallbreaker for Monolithic Firmware
Research Track
Jintao Huang Institute of Information Engineering, Chinese Academy of Science & University of Chinese Academy of Sciences, Beijing, China, Kai Yang School of Computer, Electronics and Information, Guangxi University, Gaosheng Wang Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China, Zhiqiang Shi Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China, Zhiwen Pan Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China, Shichao Lv Institute of Information Engineering, Chinese Academy of Science, Limin Sun Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China
12:15
15m
Talk
Understanding and Detecting Peer Dependency Resolving Loop in npm Ecosystem
Research Track
Xingyu Wang Zhejiang University, MingSen Wang Zhejiang University, Wenbo Shen Zhejiang University, Rui Chang Zhejiang University
11:00 - 12:30
Testing and QA 1 [Research Track / Journal-first Papers] at 205
Chair(s): Jonathan Bell Northeastern University
11:00
15m
Talk
Critical Variable State-Aware Directed Greybox Fuzzing
Research Track
Xu Chen Institute of Information Engineering at Chinese Academy of Sciences, China / University of Chinese Academy of Sciences, China, Ningning Cui Institute of Information Engineering at Chinese Academy of Sciences, China / University of Chinese Academy of Sciences, China, Zhe Pan Institute of Information Engineering at Chinese Academy of Sciences, China / University of Chinese Academy of Sciences, China, Liwei Chen Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Gang Shi Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Dan Meng Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences
11:15
15m
Talk
LWDIFF: An LLM-Assisted Differential Testing Framework for WebAssembly Runtimes
Research Track
Shiyao Zhou The Hong Kong Polytechnic University, Jincheng Wang Hong Kong Polytechnic University, He Ye University College London (UCL), Hao Zhou The Hong Kong Polytechnic University, Claire Le Goues Carnegie Mellon University, Xiapu Luo Hong Kong Polytechnic University
11:30
15m
Talk
No Harness, No Problem: Oracle-guided Harnessing for Auto-generating C API Fuzzing Harnesses [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Gabriel Sherman University of Utah, Stefan Nagy University of Utah
11:45
15m
Talk
Parametric Falsification of Many Probabilistic Requirements under Flakiness
Research Track
Matteo Camilli Politecnico di Milano, Raffaela Mirandola Karlsruhe Institute of Technology (KIT)
12:00
15m
Talk
REDII: Test Infrastructure to Enable Deterministic Reproduction of Failures for Distributed Systems
Research Track
Yang Feng Nanjing University, Zheyuan Lin Nanjing University, Dongchen Zhao Nanjing University, Mengbo Zhou Nanjing University, Jia Liu Nanjing University, James Jones University of California at Irvine
12:15
15m
Talk
Adopting Automated Bug Assignment in Practice - A Longitudinal Case Study at Ericsson
Journal-first Papers
Markus Borg CodeScene, Leif Jonsson Ericsson AB, Emelie Engstrom Lund University, Béla Bartalos Verint, Attila Szabo Ericsson
11:00 - 12:30
Gender, Equity and Diversity [SE in Society (SEIS)] at 206 plus 208
Chair(s): Ronnie de Souza Santos University of Calgary
11:00
15m
Talk
A Socio-Technical Grounded Theory on the Effect of Cognitive Dysfunctions in the Performance of Software Developers with ADHD and Autism
SE in Society (SEIS)
Kiev Gama Universidade Federal de Pernambuco, Grischa Liebel Reykjavik University, Miguel Goulao NOVA-LINCS, FCT/UNL, Aline Lacerda Federal University of Pernambuco (UFPE), Cristiana Lacerda Federal University of Pernambuco (UFPE)
11:15
15m
Talk
Belonging Beyond Code: Queer Software Engineering and Humanities Student Experiences
SE in Society (SEIS)
Emily Vorderwülbeke University of Passau, Isabella Graßl Technical University of Darmstadt
11:30
15m
Talk
Breaking the Silos: An Actionable Framework for Recruiting Diverse Participants in SE
SE in Society (SEIS)
Shandler Mason North Carolina State University, Hank Lenham North Carolina State University, Sandeep Kuttal North Carolina State University
11:45
15m
Talk
Enhancing Women's Experiences in Software Engineering
SE in Society (SEIS)
Júlia Rocha Fortunato University of Brasília, Luana Ribeiro Soares University of Brasília, Gabriela Silva Alves University of Brasília, Edna Dias Canedo University of Brasilia (UnB), Fabiana Freitas Mendes Aalto University
12:00
15m
Talk
Investigating the Developer eXperience of LGBTQIAPN+ People in Agile Teams
SE in Society (SEIS)
Edvaldo R. Wassouf-Jr UFMS, Pedro Fukuda Federal University of Mato Grosso do Sul, Awdren Fontão Federal University of Mato Grosso do Sul (UFMS)
12:15
15m
Talk
There's Nothing to See Here: A Study of Deaf and Hearing Developer Use of Stack Overflow
SE in Society (SEIS)
Steve Counsell Brunel University London, Giuseppe Destefanis Brunel University of London, Rumyana Neykova Brunel University London, Alina Miron Brunel University, Nadine Aburumman Brunel University, Thomas Shippey LogicMonitor
11:00 - 12:30
Human and Social Process 1 [SE In Practice (SEIP) / New Ideas and Emerging Results (NIER) / Journal-first Papers] at 207
Chair(s): Hausi Müller University of Victoria
11:00
15m
Talk
Toward a Theory on Programmer's Block Inspired by Writer's Block
Journal-first Papers
Belinda Schantong Chemnitz University of Technology, Norbert Siegmund Leipzig University, Janet Siegmund Chemnitz University of Technology
11:15
15m
Talk
Digital Twins for Software Engineering Processes
New Ideas and Emerging Results (NIER)
Robin Kimmel University of Stuttgart, Judith Michael University of Regensburg, Andreas Wortmann University of Stuttgart, Jingxi Zhang University of Stuttgart
11:30
15m
Talk
Discovering Ideologies of the Open Source Software Movement
New Ideas and Emerging Results (NIER)
Yang Yue California State University San Marcos, Yi Wang Beijing University of Posts and Telecommunications, David Redmiles University of California, Irvine
11:45
15m
Talk
Identifying Factors Contributing to “Bad Days” for Software Developers: A Mixed-Methods Study
SE In Practice (SEIP)
Ike Obi Purdue University, West Lafayette, Jenna L. Butler Microsoft Research, Sankeerti Haniyur Microsoft Corporation, Brian Hassan Microsoft Corporation, Margaret-Anne Storey University of Victoria, Brendan Murphy Microsoft Corporation
12:00
15m
Talk
Time Warp: The Gap Between Developers’ Ideal vs Actual Workweeks in an AI-Driven Era [Award Winner]
SE In Practice (SEIP)
Sukrit Kumar Georgia Institute of Technology, Drishti Goel Microsoft, Thomas Zimmermann University of California, Irvine, Brian Houck Microsoft Research, B. Ashok Microsoft Research India, Chetan Bansal Microsoft Research
12:15
15m
Talk
Wearables to measure developer experience at work
SE In Practice (SEIP)
Charlotte Brandebusemeyer Hasso Plattner Institute, University of Potsdam, Tobias Schimmer SAP Labs, Bert Arnrich Hasso Plattner Institute, University of Potsdam
11:00 - 12:30
11:00
15m
Talk
Automated Generation of Accessibility Test Reports from Recorded User Transcripts [Award Winner]
Research Track
Syed Fatiul Huq University of California, Irvine, Mahan Tafreshipour University of California at Irvine, Kate Kalcevich Fable Tech Labs Inc., Sam Malek University of California at Irvine
11:15
15m
Talk
KuiTest: Leveraging Knowledge in the Wild as GUI Testing Oracle for Mobile Apps
SE In Practice (SEIP)
Yongxiang Hu Fudan University, Yu Zhang Meituan, Xuan Wang Fudan University, Yingjie Liu School of Computer Science, Fudan University, Shiyu Guo Meituan, Chaoyi Chen Meituan, Xin Wang Fudan University, Yangfan Zhou Fudan University
11:30
15m
Talk
GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts
SE In Practice (SEIP)
Wei Liu Concordia University, Montreal, Canada, Feng Lin Concordia University, Linqiang Guo Concordia University, Tse-Hsun (Peter) Chen Concordia University, Ahmed E. Hassan Queen’s University
11:45
15m
Talk
GUIDE: LLM-Driven GUI Generation Decomposition for Automated Prototyping
Demonstrations
Kristian Kolthoff Institute for Software and Systems Engineering, Clausthal University of Technology, Felix Kretzer human-centered systems Lab (h-lab), Karlsruhe Institute of Technology (KIT), Christian Bartelt, Alexander Maedche Human-Centered Systems Lab, Karlsruhe Institute of Technology, Simone Paolo Ponzetto Data and Web Science Group, University of Mannheim
12:00
15m
Talk
Agent for User: Testing Multi-User Interactive Features in TikTok
SE In Practice (SEIP)
Sidong Feng Monash University, Changhao Du Jilin University, huaxiao liu Jilin University, Qingnan Wang Jilin University, Zhengwei Lv ByteDance, Gang Huo ByteDance, Xu Yang ByteDance, Chunyang Chen TU Munich
12:15
7m
Talk
Bug Analysis in Jupyter Notebook Projects: An Empirical Study
Journal-first Papers
Taijara Santana Federal University of Bahia, Paulo Silveira Neto Federal University Rural of Pernambuco, Eduardo Santana de Almeida Federal University of Bahia, Iftekhar Ahmed University of California at Irvine
11:00 - 12:30
Testing and Security [Research Track / Journal-first Papers] at 211
Chair(s): Shiyi Wei University of Texas at Dallas
11:00
15m
Talk
Fuzzing MLIR Compilers with Custom Mutation Synthesis [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Ben Limpanukorn UCLA, Jiyuan Wang University of California at Los Angeles, Hong Jin Kang University of Sydney, Eric Zitong Zhou UCLA, Miryung Kim UCLA and Amazon Web Services
11:15
15m
Talk
InSVDF: Interface-State-Aware Virtual Device Fuzzing
Research Track
Zexiang Zhang National University of Defense Technology, Gaoning Pan Hangzhou Dianzi University, Ruipeng Wang National University of Defense Technology, Yiming Tao Zhejiang University, Zulie Pan National University of Defense Technology, Cheng Tu National University of Defense Technology, Min Zhang National University of Defense Technology, Yang Li National University of Defense Technology, Yi Shen National University of Defense Technology, Chunming Wu Zhejiang University
11:30
15m
Talk
Reduce Dependence for Sound Concurrency Bug Prediction
Research Track
Shihao Zhu State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China, Yuqi Guo Institute of Software, Chinese Academy of Sciences, Yan Cai Institute of Software at Chinese Academy of Sciences, Bin Liang Renmin University of China, Long Zhang Institute of Software, Chinese Academy of Sciences, Rui Chen Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Tingting Yu Beijing Institute of Control Engineering; Beijing Sunwise Information Technology
11:45
15m
Talk
SAND: Decoupling Sanitization from Fuzzing for Low Overhead [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Ziqiao Kong Nanyang Technological University, Shaohua Li The Chinese University of Hong Kong, Heqing Huang City University of Hong Kong, Zhendong Su ETH Zurich
12:00
15m
Talk
TransferFuzz: Fuzzing with Historical Trace for Verifying Propagated Vulnerability Code [Security]
Research Track
Siyuan Li University of Chinese Academy of Sciences & Institute of Information Engineering Chinese Academy of Sciences, China, Yuekang Li UNSW, Zuxin Chen Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China, Chaopeng Dong Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China, Yongpan Wang University of Chinese Academy of Sciences & Institute of Information Engineering Chinese Academy of Sciences, China, Hong Li Institute of Information Engineering at Chinese Academy of Sciences, Yongle Chen Taiyuan University of Technology, China, Hongsong Zhu Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences
12:15
15m
Talk
Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be? [Security]
Journal-first Papers
Emanuele Iannone Hamburg University of Technology, Giulia Sellitto University of Salerno, Emanuele Iaccarino University of Salerno, Filomena Ferrucci Università di Salerno, Andrea De Lucia University of Salerno, Fabio Palomba University of Salerno
11:00 - 12:30
AI for Analysis 1 [Research Track] at 212
Chair(s): Denys Poshyvanyk William & Mary
11:00
15m
Talk
A Multiple Representation Transformer with Optimized Abstract Syntax Tree for Efficient Code Clone Detection
Research Track
TianChen Yu School of Software Engineering, South China University of Technology, Li Yuan School of Software Engineering, South China University of Technology, Guangzhou, China, Liannan Lin School of Software Engineering, South China University of Technology, Hongkui He School of Software Engineering, South China University of Technology
11:15
15m
Talk
Can an LLM find its way around a Spreadsheet?
Research Track
Cho-Ting Lee Virginia Tech, Andrew Neeser Virginia Tech, Shengzhe Xu Virginia Tech, Jay Katyan Virginia Tech, Patrick Cross Virginia Tech, Sharanya Pathakota Virginia Tech, Marigold Norman World Forest ID, John C. Simeone Simeone Consulting, LLC, Jaganmohan Chandrasekaran Virginia Tech, Naren Ramakrishnan Virginia Tech
11:30
15m
Talk
QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning [Artifact-Available]
Research Track
Alex Sanchez-Stern University of Massachusetts at Amherst, Abhishek Varghese University of Massachusetts, Zhanna Kaufman University of Massachusetts, Shizhuo Zhang University of Illinois Urbana-Champaign, Talia Lily Ringer University of Illinois Urbana-Champaign, Yuriy Brun University of Massachusetts
11:45
15m
Talk
TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference
Research Track
Chong Wang Nanyang Technological University, Jian Zhang Nanyang Technological University, Yiling Lou Fudan University, Mingwei Liu Fudan University, Weisong Sun Nanyang Technological University, Yang Liu Nanyang Technological University, Xin Peng Fudan University
12:00
15m
Talk
ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation
Research Track
Xue Jiang, Yihong Dong Peking University, Yongding Tao University of Electronic Science and Technology of China, Huanyu Liu Xidian University, Zhi Jin Peking University, Ge Li Peking University
12:15
15m
Talk
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification [Artifact-Functional, Artifact-Available, Artifact-Reusable, Award Winner]
Research Track
Kyle Thompson University of California, San Diego, Nuno Saavedra INESC-ID and IST, University of Lisbon, Pedro Carrott Imperial College London, Kevin Fisher University of California San Diego, Alex Sanchez-Stern University of Massachusetts, Yuriy Brun University of Massachusetts, João F. Ferreira INESC-ID and IST, University of Lisbon, Sorin Lerner University of California at San Diego, Emily First University of California, San Diego
11:00 - 12:30
Autonomy [Research Track] at 213
Chair(s): Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland
11:00
15m
Talk
A Differential Testing Framework to Identify Critical AV Failures Leveraging Arbitrary Inputs [Artifact-Functional, Artifact-Available]
Research Track
Trey Woodlief University of Virginia, Carl Hildebrandt University of Virginia, Sebastian Elbaum University of Virginia
11:15
15m
Talk
Automating a Complete Software Test Process Using LLMs: An Automotive Case Study
Research Track
Shuai Wang, Yinan Yu Chalmers University of Technology, Robert Feldt Chalmers | University of Gothenburg, Dhasarathy Parthasarathy Volvo Group
11:30
15m
Talk
LLM-Agents Driven Automated Simulation Testing and Analysis of small Uncrewed Aerial Systems
Research Track
Venkata Sai Aswath Duvvuru Saint Louis University, Bohan Zhang Saint Louis University, Missouri, Michael Vierhauser University of Innsbruck, Ankit Agrawal Saint Louis University, Missouri
11:45
15m
Talk
Efficient Domain Augmentation for Autonomous Driving Testing Using Diffusion Models [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Luciano Baresi Politecnico di Milano, Davide Yi Xian Hu Politecnico di Milano, Andrea Stocco Technical University of Munich, fortiss, Paolo Tonella USI Lugano
12:00
15m
Talk
GARL: Genetic Algorithm-Augmented Reinforcement Learning to Detect Violations in Marker-Based Autonomous Landing Systems
Research Track
Linfeng Liang Macquarie University, Yao Deng Macquarie University, Kye Morton Skyy Network, Valtteri Kallinen Skyy Network, Alice James Macquarie University, Avishkar Seth Macquarie University, Endrowednes Kuantama Macquarie University, Subhas Mukhopadhyay Macquarie University, Richard Han Macquarie University, Xi Zheng Macquarie University
12:15
15m
Talk
Decictor: Towards Evaluating the Robustness of Decision-Making in Autonomous Driving Systems [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Mingfei Cheng Singapore Management University, Xiaofei Xie Singapore Management University, Yuan Zhou Zhejiang Sci-Tech University, Junjie Wang Tianjin University, Guozhu Meng Institute of Information Engineering, Chinese Academy of Sciences, Kairui Yang DAMO Academy, Alibaba Group, China
11:00 - 12:30
AI for Testing and QA 1 [Research Track / SE In Practice (SEIP)] at 214
Chair(s): Jieshan Chen CSIRO's Data61
11:00
15m
Talk
Does GenAI Make Usability Testing Obsolete? [Award Winner]
Research Track
Ali Ebrahimi Pourasad, Walid Maalej University of Hamburg
11:15
15m
Talk
Feature-Driven End-To-End Test Generation
Research Track
Parsa Alian University of British Columbia, Noor Nashid University of British Columbia, Mobina Shahbandeh University of British Columbia, Taha Shabani University of British Columbia, Ali Mesbah University of British Columbia
11:30
15m
Talk
SeeAction: Towards Reverse Engineering How-What-Where of HCI Actions from Screencasts for UI Automation [Award Winner]
Research Track
Dehai Zhao CSIRO's Data61, Zhenchang Xing CSIRO's Data61, Qinghua Lu Data61, CSIRO, Xiwei (Sherry) Xu Data61, CSIRO, Liming Zhu CSIRO’s Data61
11:45
15m
Talk
Synthesizing Document Database Queries using Collection Abstractions [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Qikang Liu Simon Fraser University, Yang He Simon Fraser University, Yanwen Cai Simon Fraser University, Byeongguk Kwak Simon Fraser University, Yuepeng Wang Simon Fraser University
12:00
15m
Talk
The Power of Types: Exploring the Impact of Type Checking on Neural Bug Detection in Dynamically Typed Languages [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Boqi Chen McGill University, José Antonio Hernández López Linköping University, Gunter Mussbacher McGill University, Daniel Varro Linköping University / McGill University
12:15
15m
Talk
DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production
SE In Practice (SEIP)
Xiaoyun Liang ByteDance, Jingyi Ren ByteDance, Jiayi Qi ByteDance, Chao Peng ByteDance, Bo Jiang Bytedance Network Technology
11:00 - 12:30
SE for AI 1 [New Ideas and Emerging Results (NIER) / SE In Practice (SEIP) / Research Track] at 215
Chair(s): Houari Sahraoui DIRO, Université de Montréal
11:00
15m
Talk
A Test Oracle for Reinforcement Learning Software based on Lyapunov Stability Control Theory [SE for AI, Artifact-Available, Award Winner]
Research Track
Shiyu Zhang The Hong Kong Polytechnic University, Haoyang Song The Hong Kong Polytechnic University, Qixin Wang The Hong Kong Polytechnic University, Henghua Shen The Hong Kong Polytechnic University, Yu Pei The Hong Kong Polytechnic University
11:15
15m
Talk
CodeImprove: Program Adaptation for Deep Code Models [SE for AI]
Research Track
Ravishka Shemal Rathnasuriya University of Texas at Dallas, zijie zhao, Wei Yang UT Dallas
11:30
15m
Talk
FairQuant: Certifying and Quantifying Fairness of Deep Neural Networks [SE for AI, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Brian Hyeongseok Kim University of Southern California, Jingbo Wang University of Southern California, Chao Wang University of Southern California
11:45
15m
Talk
When in Doubt Throw It out: Building on Confident Learning for Vulnerability Detection [Security, Artifact-Functional, Artifact-Reusable, Artifact-Available, SE for AI]
New Ideas and Emerging Results (NIER)
Yuanjun Gong Renmin University of China, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
12:00
15m
Talk
Evaluation of Tools and Frameworks for Machine Learning Model Serving [SE for AI]
SE In Practice (SEIP)
Niklas Beck Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Benny Stein Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Dennis Wegener T-Systems International GmbH, Lennard Helmer Fraunhofer Institute for Intelligent Analysis and Information Systems
12:15
15m
Talk
Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models [SE for AI]
SE In Practice (SEIP)
Kirill Vasilevski Huawei Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Ahmed E. Hassan Queen’s University
11:00 - 12:30
AI for SE 1 [Research Track] at Canada Hall 1 and 2
Chair(s): Tao Chen University of Birmingham
11:00
15m
Talk
Calibration and Correctness of Language Models for Code [Artifact-Functional, Artifact-Available]
Research Track
Claudio Spiess University of California, Davis, David Gros University of California, Davis, Kunal Suresh Pai UC Davis, Michael Pradel University of Stuttgart, Rafiqul Rabin UL Research Institutes, Amin Alipour University of Houston, Susmit Jha SRI, Prem Devanbu University of California at Davis, Toufique Ahmed IBM Research
11:15
15m
Talk
An Empirical Study on Commit Message Generation using LLMs via In-Context Learning
Research Track
Yifan Wu Peking University, Yunpeng Wang Ant Group, Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Wei Tao Independent Researcher, Siyu Yu The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Haowen Yang The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Wei Jiang, Jianguo Li Ant Group
11:30
15m
Talk
Instruct or Interact? Exploring and Eliciting LLMs’ Capability in Code Snippet Adaptation Through Prompt Engineering
Research Track
Tanghaoran Zhang National University of Defense Technology, Yue Yu PengCheng Lab, Xinjun Mao National University of Defense Technology, Shangwen Wang National University of Defense Technology, Kang Yang National University of Defense Technology, Yao Lu National University of Defense Technology, Zhang Zhang Key Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Yuxin Zhao Key Laboratory of Software Engineering for Complex Systems, National University of Defense Technology
11:45
15m
Talk
Search-Based LLMs for Code Optimization [Award Winner]
Research Track
Shuzheng Gao The Chinese University of Hong Kong, Cuiyun Gao Harbin Institute of Technology, Wenchao Gu The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong
12:00
15m
Talk
Towards Better Answers: Automated Stack Overflow Post Updating
Research Track
Yubo Mai Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Haoye Wang Hangzhou City University, Tingting Bi The University of Melbourne, Xing Hu Zhejiang University, Xin Xia Huawei, JianLing Sun Zhejiang University
12:15
15m
Talk
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar [Award Winner]
Research Track
Yuanliang Zhang National University of Defense Technology, Yifan Xie, Shanshan Li National University of Defense Technology, Ke Liu, Chong Wang National University of Defense Technology, Zhouyang Jia National University of Defense Technology, Xiangbing Huang National University of Defense Technology, Jie Song National University of Defense Technology, Chaopeng Luo National University of Defense Technology, Zhizheng Zheng National University of Defense Technology, Rulin Xu National University of Defense Technology, Yitong Liu National University of Defense Technology, Si Zheng National University of Defense Technology, Liao Xiangke National University of Defense Technology
13:30 - 14:00
13:30
30m
Poster
Pattern-based Generation and Adaptation of Quantum Workflows [Quantum]
Research Track
Martin Beisel Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Johanna Barzen University of Stuttgart, Frank Leymann University of Stuttgart, Lavinia Stiliadou Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Daniel Vietz University of Stuttgart, Benjamin Weder Institute of Architecture of Application Systems (IAAS), University of Stuttgart
13:30
30m
Talk
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Maryam Masoudian Sharif University of Technology, Hong Kong University of Science and Technology (HKUST), Heqing Huang City University of Hong Kong, Morteza Amini Sharif University of Technology, Charles Zhang Hong Kong University of Science and Technology
13:30
30m
Talk
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Jaeseong Lee The University of Texas at Dallas, Simin Chen University of Texas at Dallas, Austin Mordahl University of Illinois Chicago, Cong Liu University of California, Riverside, Wei Yang UT Dallas, Shiyi Wei University of Texas at Dallas
13:30
30m
Poster
BSan: A Powerful Identifier-Based Hardware-Independent Memory Error Detector for COTS Binaries [Artifact-Functional, Artifact-Available]
Research Track
Wen Zhang University of Georgia, Botang Xiao University of Georgia, Qingchen Kong University of Georgia, Le Guan University of Georgia, Wenwen Wang University of Georgia
13:30
30m
Talk
A Unit Proofing Framework for Code-level Verification: A Research Agenda [Formal Methods]
New Ideas and Emerging Results (NIER)
Paschal Amusuo Purdue University, Parth Vinod Patil Purdue University, Owen Cochell Michigan State University, Taylor Le Lievre Purdue University, James C. Davis Purdue University
13:30
30m
Talk
Listening to the Firehose: Sonifying Z3’s Behavior [Artifact-Functional, Artifact-Reusable, Artifact-Available, Formal Methods]
New Ideas and Emerging Results (NIER)
Finn Hackett University of British Columbia, Ivan Beschastnikh University of British Columbia
13:30
30m
Talk
Towards Early Warning and Migration of High-Risk Dormant Open-Source Software Dependencies [Security]
New Ideas and Emerging Results (NIER)
Zijie Huang Shanghai Key Laboratory of Computer Software Testing and Evaluation, Lizhi Cai Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center, Xuan Mao Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China, Kang Yang Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology
13:30
30m
Poster
SimClone: Detecting Tabular Data Clones using Value Similarity
Journal-first Papers
Xu Yang University of Manitoba, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Shaowei Wang University of Manitoba, Zhen Ming (Jack) Jiang York University
13:30
30m
Talk
SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation [Formal Methods]
New Ideas and Emerging Results (NIER)
Junjie Sheng East China Normal University, Yanqiu Lin East China Normal University, Jiehao Wu East China Normal University, Yanhong Huang East China Normal University, Jianqi Shi East China Normal University, Min Zhang East China Normal University, Xiangfeng Wang East China Normal University
15:30 - 16:00
15:30
30m
Poster
Non-Autoregressive Line-Level Code Completion
Journal-first Papers
Fang Liu Beihang University, Zhiyi Fu Peking University, Ge Li Peking University, Zhi Jin Peking University, Hui Liu Beijing Institute of Technology, Yiyang Hao Silicon Heart Tech Co., Li Zhang Beihang University
15:30
30m
Poster
FlatD: Protecting Deep Neural Network Program from Reversing Attacks
SE In Practice (SEIP)
Jinquan Zhang The Pennsylvania State University, Zihao Wang Penn State University, Pei Wang Independent Researcher, Rui Zhong Palo Alto Networks, Dinghao Wu Pennsylvania State University
15:30
30m
Talk
Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice [SE for AI]
Journal-first Papers
Bentley Oakes Polytechnique Montréal, Michalis Famelis Université de Montréal, Houari Sahraoui DIRO, Université de Montréal
15:30
30m
Poster
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
Journal-first Papers
SayedHassan Khatoonabadi Concordia University, Montreal, Ahmad Abdellatif University of Calgary, Diego Elias Costa Concordia University, Canada, Emad Shihab Concordia University, Montreal
15:30
30m
Talk
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
Journal-first Papers
Sarah Fakhoury Microsoft Research, Aaditya Naik University of Pennsylvania, Georgios Sakkas University of California at San Diego, Saikat Chakraborty Microsoft Research, Shuvendu K. Lahiri Microsoft Research
15:30
30m
Poster
RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code
Research Track
Pantazis Deligiannis Microsoft Research, Akash Lal Microsoft Research, Nikita Mehrotra Microsoft Research, Rishi Poddar Microsoft Research, Aseem Rastogi Microsoft Research
15:30
30m
Talk
QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems [Quantum]
Journal-first Papers
Jinjing Shi Central South University, Zimeng Xiao Central South University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Xuelong LI China Telecom
16:00 - 17:30
Formal Methods 2 [Research Track / New Ideas and Emerging Results (NIER) / Journal-first Papers] at 103
Chair(s): Yi Li Nanyang Technological University
16:00
15m
Talk
ConsCS: Effective and Efficient Verification of Circom Circuits [Formal Methods]
Research Track
Jinan Jiang The Hong Kong Polytechnic University, Xinghao Peng, Jinzhao Chu The Hong Kong Polytechnic University, Xiapu Luo Hong Kong Polytechnic University
16:15
15m
Talk
Constrained LTL Specification Learning from Examples [Formal Methods, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Changjian Zhang Carnegie Mellon University, Parv Kapoor Carnegie Mellon University, Ian Dardik Carnegie Mellon University, Leyi Cui Columbia University, Romulo Meira-Goes The Pennsylvania State University, David Garlan Carnegie Mellon University, Eunsuk Kang Carnegie Mellon University
16:30
15m
Talk
LLM-aided Automatic Modeling for Security Protocol Verification [Security, Formal Methods]
Research Track
Ziyu Mao Zhejiang University, Jingyi Wang Zhejiang University, Jun Sun Singapore Management University, Shengchao Qin Xidian University, Jiawen Xiong East China Normal University
16:45
15m
Talk
Model Assisted Refinement of Metamorphic Relations for Scientific Software [Formal Methods]
New Ideas and Emerging Results (NIER)
Clay Stevens Iowa State University, Katherine Kjeer Iowa State University, Ryan Richard Iowa State University, Edward Valeev Virginia Tech, Myra Cohen Iowa State University
17:00
15m
Talk
Precisely Extracting Complex Variable Values from Android Apps [Formal Methods]
Journal-first Papers
Marc Miltenberger Fraunhofer SIT; ATHENE, Steven Arzt Fraunhofer SIT; ATHENE
17:15
7m
Talk
A Unit Proofing Framework for Code-level Verification: A Research Agenda [Formal Methods]
New Ideas and Emerging Results (NIER)
Paschal Amusuo Purdue University, Parth Vinod Patil Purdue University, Owen Cochell Michigan State University, Taylor Le Lievre Purdue University, James C. Davis Purdue University
17:22
7m
Talk
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Jaeseong Lee The University of Texas at Dallas, Simin Chen University of Texas at Dallas, Austin Mordahl University of Illinois Chicago, Cong Liu University of California, Riverside, Wei Yang UT Dallas, Shiyi Wei University of Texas at Dallas
16:00 - 17:30
Databases and Business [Research Track / SE In Practice (SEIP) / Demonstrations / Journal-first Papers] at 104
Chair(s): Lu Xiao Stevens Institute of Technology
16:00
15m
Talk
Optimization of Automated and Manual Software Tests in Industrial Practice: A Survey and Historical Analysis
Journal-first Papers
Roman Haas Saarland University; CQSE, Raphael Nömmer Saarbrücken Graduate School of Computer Science, CQSE, Elmar Juergens CQSE GmbH, Sven Apel Saarland University
16:15
15m
Talk
A-COBREX: A Tool for Identifying Business Rules in COBOL Programs
Demonstrations
Samveg Shah Indian Institute of Technology, Tirupati, Shivali Agarwal IBM, Saravanan Krishnan IBM India Research Lab, Vini Kanvar IBM Research, Sridhar Chimalakonda Indian Institute of Technology Tirupati
16:30
15m
Talk
Thanos: DBMS Bug Detection via Storage Engine Rotation Based Differential Testing [Award Winner]
Research Track
Ying Fu National University of Defense Technology, Zhiyong Wu Tsinghua University, China, Yuanliang Zhang National University of Defense Technology, Jie Liang, Jingzhou Fu School of Software, Tsinghua University, Yu Jiang Tsinghua University, Shanshan Li National University of Defense Technology, Liao Xiangke National University of Defense Technology
16:45
15m
Talk
Coni: Detecting Database Connector Bugs via State-Aware Test Case Generation
Research Track
Wenqian Deng Tsinghua University, Zhiyong Wu Tsinghua University, China, Jie Liang, Jingzhou Fu School of Software, Tsinghua University, Mingzhe Wang Tsinghua University, Yu Jiang Tsinghua University
17:00
15m
Talk
Puppy: Finding Performance Degradation Bugs in DBMSs via Limited-Optimization Plan Construction
Research Track
Zhiyong Wu Tsinghua University, China, Jie Liang, Jingzhou Fu School of Software, Tsinghua University, Mingzhe Wang Tsinghua University, Yu Jiang Tsinghua University
17:15
15m
Talk
Safe Validation of Pricing Agreements
SE In Practice (SEIP)
John C. Kolesar Yale University, Tancrède Lepoint Amazon, Martin Schäf Amazon Web Services, Willem Visser Amazon Web Services
16:00 - 17:30
Program Comprehension 2 [Journal-first Papers / Research Track] at 204
Chair(s): Xiaoxue Ren Zhejiang University
16:00
15m
Talk
Enhancing Fault Localization in Industrial Software Systems via Contrastive Learning
Research Track
Chun Li Nanjing University, Hui Li Samsung Electronics (China) R&D Centre, Zhong Li, Minxue Pan Nanjing University, Xuandong Li Nanjing University
16:15
15m
Talk
On the Understandability of MLOps System Architectures
Journal-first Papers
Stephen John Warnett University of Vienna, Uwe Zdun University of Vienna
16:30
15m
Talk
Bridging the Language Gap: An Empirical Study of Bindings for Open Source Machine Learning Libraries Across Software Package Ecosystems
Journal-first Papers
Hao Li Queen's University, Cor-Paul Bezemer University of Alberta
16:45
15m
Talk
Understanding Code Understandability Improvements in Code Reviews
Journal-first Papers
Delano Hélio Oliveira, Reydne Bruno dos Santos UFPE, Benedito Fernando Albuquerque de Oliveira Federal University of Pernambuco, Martin Monperrus KTH Royal Institute of Technology, Fernando Castor University of Twente, Fernanda Madeiral Universidade Federal de Pernambuco
17:00
15m
Talk
Automatic Commit Message Generation: A Critical Review and Directions for Future Work
Journal-first Papers
Yuxia Zhang Beijing Institute of Technology, Zhiqing Qiu Beijing Institute of Technology, Klaas-Jan Stol Lero; University College Cork; SINTEF Digital, Wenhui Zhu Beijing Institute of Technology, Jiaxin Zhu Institute of Software at Chinese Academy of Sciences, Yingchen Tian Tmall Technology Co., Hui Liu Beijing Institute of Technology
17:15
7m
Talk
Efficient Management of Containers for Software Defined Vehicles
Journal-first Papers
Anwar Ghammam Oakland University, Rania Khalsi University of Michigan - Flint, Marouane Kessentini University of Michigan - Flint, Foyzul Hassan University of Michigan at Dearborn
16:00 - 17:30
Testing and QA 2 [Journal-first Papers] at 205
Chair(s): Andreas Zeller CISPA Helmholtz Center for Information Security
16:00
15m
Talk
EpiTESTER: Testing Autonomous Vehicles with Epigenetic Algorithm and Attention Mechanism
Journal-first Papers
Chengjie Lu Simula Research Laboratory and University of Oslo, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Tao Yue Beihang University
16:15
15m
Talk
GenMorph: Automatically Generating Metamorphic Relations via Genetic Programming
Journal-first Papers
Jon Ayerdi Mondragon University, Valerio Terragni University of Auckland, Gunel Jahangirova King's College London, Aitor Arrieta Mondragon University, Paolo Tonella USI Lugano
16:30
15m
Talk
Guess the State: Exploiting Determinism to Improve GUI Exploration Efficiency
Journal-first Papers
Diego Clerissi University of Milano-Bicocca, Giovanni Denaro University of Milano - Bicocca, Marco Mobilio University of Milano Bicocca, Leonardo Mariani University of Milano-Bicocca
16:45
15m
Talk
Runtime Verification and Field-based Testing for ROS-based Robotic Systems
Journal-first Papers
Ricardo Caldas Gran Sasso Science Institute (GSSI), Juan Antonio Piñera García Gran Sasso Science Institute, Matei Schiopu Chalmers | Gothenburg University, Patrizio Pelliccione Gran Sasso Science Institute, L'Aquila, Italy, Genaína Nunes Rodrigues University of Brasília, Thorsten Berger Ruhr University Bochum
Link to publication DOI
17:00
15m
Talk
Towards Effectively Testing Machine Translation Systems from White-Box Perspectives
Journal-first Papers
Hanying Shao University of Waterloo, Zishuo Ding The Hong Kong University of Science and Technology (Guangzhou), Weiyi Shang University of Waterloo, Jinqiu Yang Concordia University, Nikolaos Tsantalis Concordia University
17:15
15m
Talk
Using Knowledge Units of Programming Languages to Recommend Reviewers for Pull Requests: An Empirical Study
Journal-first Papers
Md Ahasanuzzaman Queen's University, Gustavo A. Oliva Queen's University, Ahmed E. Hassan Queen’s University
16:00 - 17:45
Human and Social 1 [SE in Society (SEIS) / SE In Practice (SEIP)] at 206 plus 208
Chair(s): Yvonne Dittrich IT University of Copenhagen, Denmark
16:00
15m
Talk
Systematizing Inclusive Design in MOSIP: An Experience Report
SE In Practice (SEIP)
Soumiki Chattopadhyay Oregon State University, Amreeta Chatterjee Oregon State University, Puja Agarwal Oregon State University, Bianca Trinkenreich Colorado State University, Swarathmika Kumar MOSIP-IIIT Bangalore, Rohit Ranjan Rai MOSIP-IIIT Bangalore, Resham Chugani MOSIP-IIIT Bangalore, Pragya Kumari MOSIP-IIIT Bangalore, Margaret Burnett Oregon State University, Anita Sarma Oregon State University
Pre-print
16:15
15m
Talk
A Collaborative Framework for Cross-Domain Scientific Experiments for Society 5.0 [Artifact-Reusable, Artifact-Available, Research Methods, Artifact-Functional]
SE in Society (SEIS)
Muhammad Mainul Hossain University of Saskatchewan, Banani Roy University of Saskatchewan, Chanchal K. Roy University of Saskatchewan, Kevin Schneider University of Saskatchewan
16:30
15m
Talk
A First Look at AI Trends in Value-Aligned Software Engineering Publications: Human-LLM Insights
SE in Society (SEIS)
Ahmad Azarnik Universiti Teknologi Malaysia, Davoud Mougouei, Mahdi Fahmideh University of Southern Queensland, Elahe Mougouei Islamic Azad University Najafabad, Hoa Khanh Dam University of Wollongong, Arif Ali Khan University of Oulu, Saima Rafi Edinburgh Napier University, Javed Ali Khan University of Hertfordshire, Hertfordshire, UK, Aakash Ahmad School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany
Link to publication
16:45
15m
Talk
From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits? [Artifact-Reusable, Artifact-Available, Artifact-Functional]
SE in Society (SEIS)
Gianmario Voria University of Salerno, Stefano Lambiase University of Salerno, Maria Concetta Schiavone University of Salerno, Gemma Catolino University of Salerno, Fabio Palomba University of Salerno
Pre-print
17:00
15m
Talk
Not real or too soft? On the challenges of publishing interdisciplinary software engineering research [Artifact-Available]
SE in Society (SEIS)
Sonja Hyrynsalmi LUT University, Grischa Liebel Reykjavik University, Ronnie de Souza Santos University of Calgary, Sebastian Baltes University of Bayreuth
Pre-print
17:15
15m
Talk
What is unethical about software? User perceptions in the Netherlands
SE in Society (SEIS)
Yagil Elias Vrije Universiteit Amsterdam, Tom P Humbert Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
Pre-print
16:00 - 17:30
Human and Social Process 2 [Journal-first Papers / Research Track] at 207
Chair(s): Armstrong Foundjem École Polytechnique de Montréal
16:00
15m
Talk
An Empirical Study on Developers' Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Journal-first Papers
Huizi Hao Queen's University, Canada, Kazi Amit Hasan Queen's University, Canada, Hong Qin Queen's University, Marcos Macedo Queen's University, Yuan Tian Queen's University, Kingston, Ontario, Steven H. H. Ding Queen’s University at Kingston, Ahmed E. Hassan Queen’s University
16:15
15m
Talk
Who’s Pushing the Code: An Exploration of GitHub Impersonation
Research Track
Yueke Zhang Vanderbilt University, Anda Liang Vanderbilt University, Xiaohan Wang Vanderbilt University, Pamela J. Wisniewski Vanderbilt University, Fengwei Zhang Southern University of Science and Technology, Kevin Leach Vanderbilt University, Yu Huang Vanderbilt University
16:30
15m
Talk
Understanding Real-time Collaborative Programming: a Study of Visual Studio Live Share
Journal-first Papers
Xin Tan Beihang University, Xinyue Lv Beihang University, Jing Jiang Beihang University, Li Zhang Beihang University
16:45
15m
Talk
Characterizing the Prevalence, Distribution, and Duration of Stale Reviewer Recommendations
Journal-first Papers
Farshad Kazemi University of Waterloo, Maxime Lamothe Polytechnique Montreal, Shane McIntosh University of Waterloo
17:00
15m
Talk
Diversity's Double-Edged Sword: Analyzing Race's Effect on Remote Pair Programming Interactions
Journal-first Papers
Shandler Mason North Carolina State University, Sandeep Kuttal North Carolina State University
17:15
7m
Talk
Investigating the Impact of Interpersonal Challenges on Feeling Welcome in OSS
Research Track
Bianca Trinkenreich Colorado State University, Zixuan Feng Oregon State University, USA, Rudrajit Choudhuri Oregon State University, Marco Gerosa Northern Arizona University, Anita Sarma Oregon State University, Igor Steinmacher NAU RESHAPE LAB
Pre-print
16:00 - 17:30
SE for AI with Security [Research Track] at 210
Chair(s): Lina Marsso École Polytechnique de Montréal
16:00
15m
Talk
Understanding the Effectiveness of Coverage Criteria for Large Language Models: A Special Angle from Jailbreak Attacks [Security, SE for AI, Artifact-Available]
Research Track
Shide Zhou Huazhong University of Science and Technology, Li Tianlin NTU, Kailong Wang Huazhong University of Science and Technology, Yihao Huang NTU, Ling Shi Nanyang Technological University, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology
16:15
15m
Talk
Diversity Drives Fairness: Ensemble of Higher Order Mutants for Intersectional Fairness of Machine Learning Software [Security, SE for AI]
Research Track
Zhenpeng Chen Nanyang Technological University, Xinyue Li Peking University, Jie M. Zhang King's College London, Federica Sarro University College London, Yang Liu Nanyang Technological University
Pre-print
16:30
15m
Talk
HIFI: Explaining and Mitigating Algorithmic Bias through the Lens of Game-Theoretic Interactions [Security, SE for AI, Artifact-Available]
Research Track
Lingfeng Zhang East China Normal University, Zhaohui Wang Software Engineering Institute, East China Normal University, Yueling Zhang East China Normal University, Min Zhang East China Normal University, Jiangtao Wang Software Engineering Institute, East China Normal University
16:45
15m
Talk
Towards More Trustworthy Deep Code Models by Enabling Out-of-Distribution Detection [Security, SE for AI]
Research Track
Yanfu Yan William & Mary, Viet Duong William & Mary, Huajie Shao College of William & Mary, Denys Poshyvanyk William & Mary
17:00
15m
Talk
FairSense: Long-Term Fairness Analysis of ML-Enabled Systems [Security, SE for AI, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Yining She Carnegie Mellon University, Sumon Biswas Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Eunsuk Kang Carnegie Mellon University
16:00 - 17:30
Requirements [Research Track / Demonstrations / New Ideas and Emerging Results (NIER)] at 211
Chair(s): Jane Cleland-Huang University of Notre Dame
16:00
15m
Talk
A Little Goes a Long Way: Tuning Configuration Selection for Continuous Kernel Fuzzing [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Sanan Hasanov University of Central Florida, Stefan Nagy University of Utah, Paul Gazzillo University of Central Florida
16:15
15m
Talk
Exploring the Robustness of the Effect of EVO on Intention Valuation through Replication [Artifact-Functional, Artifact-Available, Artifact-Reusable, Award Winner]
Research Track
Yesugen Baatartogtokh University of Massachusetts Amherst, Kaitlyn Cook Smith College, Alicia M. Grubb Smith College
16:30
15m
Talk
Unavoidable Boundary Conditions: A Control Perspective on Goal Conflicts
Research Track
Sebastian Uchitel Universidad de Buenos Aires / Imperial College, Francisco Cirelli Universidad de Buenos Aires, Dalal Alrajeh Imperial College London
16:45
15m
Talk
User Personas Improve Social Sustainability by Encouraging Software Developers to Deprioritize Antisocial Features
Research Track
Bimpe Ayoola Dalhousie University, Miikka Kuutila Dalhousie University, Rina R. Wehbe Dalhousie University, Paul Ralph Dalhousie University
Pre-print
17:00
15m
Talk
VReqST: A Requirement Specification Tool for Virtual Reality Software Products [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Demonstrations
Amogha A Halhalli Software Engineering Research Center, IIIT Hyderabad, Raghu Reddy IIIT Hyderabad, Karre Sai Anirudh Phenom Inc.
17:15
15m
Talk
What is a Feature, Really? Toward a Unified Understanding Across SE Disciplines
New Ideas and Emerging Results (NIER)
Nitish Patkar FHNW, Aimen Fahmi Tata Consultancy Services, Timo Kehrer University of Bern, Norbert Seyff University of Applied Sciences and Arts Northwestern Switzerland FHNW
16:00 - 17:30
AI for Analysis 2 [Research Track / Journal-first Papers] at 212
Chair(s): Julia Rubin The University of British Columbia
16:00
15m
Talk
Neurosymbolic Modular Refinement Type Inference
Research Track
Georgios Sakkas UC San Diego, Pratyush Sahu UC San Diego, Kyeling Ong University of California, San Diego, Ranjit Jhala University of California at San Diego
16:15
15m
Talk
An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?
Research Track
Hyunjae Suh University of California, Irvine, Mahan Tafreshipour University of California at Irvine, Jiawei Li University of California Irvine, Adithya Bhattiprolu University of California, Irvine, Iftekhar Ahmed University of California at Irvine
16:30
15m
Talk
Planning a Large Language Model for Static Detection of Runtime Errors in Code Snippets [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Smit Soneshbhai Patel University of Texas at Dallas, Aashish Yadavally University of Texas at Dallas, Hridya Dhulipala University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas
16:45
15m
Talk
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
Research Track
Chong Wang Nanyang Technological University, Kaifeng Huang Tongji University, Jian Zhang Nanyang Technological University, Yebo Feng Nanyang Technological University, Lyuye Zhang Nanyang Technological University, Yang Liu Nanyang Technological University, Xin Peng Fudan University
17:00
15m
Talk
Knowledge-Enhanced Program Repair for Data Science Code
Research Track
Shuyin Ouyang King's College London, Jie M. Zhang King's College London, Zeyu Sun Institute of Software, Chinese Academy of Sciences, Albert Merono Penuela King's College London
17:15
7m
Talk
SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning
Journal-first Papers
Xueqi Yang North Carolina State University, Mariusz Jakubowski Microsoft, Li Kang Microsoft, Haojie Yu Microsoft, Tim Menzies North Carolina State University
Link to publication DOI
16:00 - 17:30
AI for Program Comprehension 1 [Research Track] at 213
Chair(s): Yintong Huo Singapore Management University, Singapore
16:00
15m
Talk
ADAMAS: Adaptive Domain-Aware Performance Anomaly Detection in Cloud Service Systems
Research Track
Wenwei Gu The Chinese University of Hong Kong, Jiazhen Gu Chinese University of Hong Kong, Jinyang Liu Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Jianping Zhang The Chinese University of Hong Kong, Jinxi Kuang The Chinese University of Hong Kong, Cong Feng Huawei Cloud Computing Technology, Yongqiang Yang Huawei Cloud Computing Technology, Michael Lyu The Chinese University of Hong Kong
16:15
15m
Talk
LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
Research Track
Zeyang Ma Concordia University, Dong Jae Kim DePaul University, Tse-Hsun (Peter) Chen Concordia University
16:30
15m
Talk
Model Editing for LLMs4Code: How Far are We?
Research Track
Xiaopeng Li National University of Defense Technology, Shangwen Wang National University of Defense Technology, Shasha Li National University of Defense Technology, Jun Ma National University of Defense Technology, Jie Yu National University of Defense Technology, Xiaodong Liu National University of Defense Technology, Jing Wang National University of Defense Technology, Bin Ji National University of Defense Technology, Weimin Zhang National University of Defense Technology
Pre-print
16:45
15m
Talk
Software Model Evolution with Large Language Models: Experiments on Simulated, Public, and Industrial Datasets
Research Track
Christof Tinnes Saarland University, Alisa Carla Welter Saarland University, Sven Apel Saarland University
Pre-print
17:00
15m
Talk
SpecRover: Code Intent Extraction via LLMs
Research Track
Haifeng Ruan National University of Singapore, Yuntong Zhang National University of Singapore, Abhik Roychoudhury National University of Singapore
17:15
15m
Talk
Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Van-Hoang Le The University of Newcastle, Yi Xiao, Hongyu Zhang Chongqing University
16:00 - 17:30
AI for Testing and QA 2 [Research Track / SE In Practice (SEIP)] at 214
Chair(s): Michael Pradel University of Stuttgart
16:00
15m
Talk
Faster Configuration Performance Bug Testing with Neural Dual-level Prioritization [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Youpeng Ma University of Electronic Science and Technology of China, Tao Chen University of Birmingham, Ke Li University of Exeter
Pre-print
16:15
15m
Talk
Metamorphic-Based Many-Objective Distillation of LLMs for Code-related Tasks [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Annibale Panichella Delft University of Technology
16:30
15m
Talk
NIODebugger: A Novel Approach to Repair Non-Idempotent-Outcome Tests with LLM-Based Agent
Research Track
Kaiyao Ke University of Illinois at Urbana-Champaign
16:45
15m
Talk
Test Intention Guided LLM-based Unit Test Generation
Research Track
Zifan Nan Huawei, Zhaoqiang Guo Software Engineering Application Technology Lab, Huawei, China, Kui Liu Huawei, Xin Xia Huawei
17:00
15m
Talk
What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation
Research Track
Xin Yin Zhejiang University, Chao Ni Zhejiang University, Xiaodan Xu College of Computer Science and Technology, Zhejiang University, Xiaohu Yang Zhejiang University
Pre-print
17:15
15m
Talk
Improving Code Performance Using LLMs in Zero-Shot: RAPGen
SE In Practice (SEIP)
Spandan Garg Microsoft Corporation, Roshanak Zilouchian Moghaddam Microsoft, Neel Sundaresan Microsoft
16:00 - 17:30
Analysis 1 [Research Track / SE In Practice (SEIP) / Journal-first Papers] at 215
Chair(s): Antonio Filieri AWS and Imperial College London
16:00
15m
Talk
SUPERSONIC: Learning to Generate Source Code Optimizations in C/C++
Journal-first Papers
Zimin Chen KTH Royal Institute of Technology, Sen Fang North Carolina State University, Martin Monperrus KTH Royal Institute of Technology
16:15
15m
Talk
An Extensive Empirical Study of Nondeterministic Behavior in Static Analysis Tools [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Miao Miao The University of Texas at Dallas, Austin Mordahl University of Illinois Chicago, Dakota Soles The University of Texas at Dallas, Alice Beideck The University of Texas at Dallas, Shiyi Wei University of Texas at Dallas
16:30
15m
Talk
Interactive Cross-Language Pointer Analysis for Resolving Native Code in Java Programs [Artifact-Functional, Artifact-Available, Artifact-Reusable, Award Winner]
Research Track
Chenxi Zhang Nanjing University, Yufei Liang Nanjing University, Tian Tan Nanjing University, Chang Xu Nanjing University, Shuangxiang Kan UNSW, Yulei Sui University of New South Wales, Yue Li Nanjing University
16:45
15m
Talk
Execution Trace Reconstruction Using Diffusion-Based Generative Models
Research Track
Madeline Janecek Brock University, Naser Ezzati Jivan, Wahab Hamou-Lhadj Concordia University, Montreal, Canada
17:00
15m
Talk
Static Analysis of Remote Procedure Call in Java Programs
Research Track
Baoquan Cui Institute of Software at Chinese Academy of Sciences, China, Rong Qu State Key Laboratory of Computer Science, Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China, Zhen Tang Key Laboratory of System Software (Chinese Academy of Sciences), State Key Laboratory of Computer Science, Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China, Jian Zhang Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences
17:15
15m
Talk
ArkAnalyzer: The Static Analysis Framework for OpenHarmony
SE In Practice (SEIP)
Haonan Chen Beihang University, Daihang Chen Beihang University, Yizhuo Yang Beihang University, Lingyun Xu Huawei, Liang Gao Huawei, Mingyi Zhou Monash University, Chunming Hu Beihang University, Li Li Beihang University
16:00 - 17:30
AI for SE 2 [Research Track / Journal-first Papers] at Canada Hall 1 and 2
Chair(s): Tingting Yu University of Connecticut
16:00
15m
Talk
Large Language Models for Safe Minimization [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Aashish Yadavally University of Texas at Dallas, Xiaokai Rong The University of Texas at Dallas, Phat Nguyen The University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas
16:15
15m
Talk
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Journal-first Papers
Da Song University of Alberta, Xuan Xie University of Alberta, Jiayang Song University of Alberta, Derui Zhu Technical University of Munich, Yuheng Huang University of Alberta, Canada, Felix Juefei-Xu New York University, Lei Ma The University of Tokyo & University of Alberta
16:30
15m
Talk
Intention is All You Need: Refining Your Code from Your Intention
Research Track
Qi Guo Tianjin University, Xiaofei Xie Singapore Management University, Shangqing Liu Nanyang Technological University, Ming Hu Nanyang Technological University, Xiaohong Li Tianjin University, Lei Bu Nanjing University
16:45
15m
Talk
RLCoder: Reinforcement Learning for Repository-Level Code Completion
Research Track
Yanlin Wang Sun Yat-sen University, Yanli Wang Sun Yat-sen University, Daya Guo, Jiachi Chen Sun Yat-sen University, Ruikai Zhang Huawei Cloud Computing Technologies, Yuchi Ma Huawei Cloud Computing Technologies, Zibin Zheng Sun Yat-sen University
17:00
15m
Talk
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation
Research Track
Marcos Macedo Queen's University, Yuan Tian Queen's University, Kingston, Ontario, Pengyu Nie University of Waterloo, Filipe Cogo Centre for Software Excellence, Huawei Canada, Bram Adams Queen's University
17:15
15m
Talk
Toward a Theory of Causation for Interpreting Neural Code Models
Journal-first Papers
David Nader Palacio William & Mary, Alejandro Velasco William & Mary, Nathan Cooper William & Mary, Alvaro Rodriguez Universidad Nacional de Colombia, Kevin Moran University of Central Florida, Denys Poshyvanyk William & Mary
Link to publication DOI Pre-print

Thu 1 May

Displayed time zone: Eastern Time (US & Canada)

10:30 - 11:00
10:30
30m
Poster
Pattern-based Generation and Adaptation of Quantum Workflows [Quantum]
Research Track
Martin Beisel Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Johanna Barzen University of Stuttgart, Frank Leymann University of Stuttgart, Lavinia Stiliadou Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Daniel Vietz University of Stuttgart, Benjamin Weder Institute of Architecture of Application Systems (IAAS), University of Stuttgart
10:30
30m
Talk
A Unit Proofing Framework for Code-level Verification: A Research Agenda [Formal Methods]
New Ideas and Emerging Results (NIER)
Paschal Amusuo Purdue University, Parth Vinod Patil Purdue University, Owen Cochell Michigan State University, Taylor Le Lievre Purdue University, James C. Davis Purdue University
Pre-print
10:30
30m
Talk
SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation [Formal Methods]
New Ideas and Emerging Results (NIER)
Junjie Sheng East China Normal University, Yanqiu Lin East China Normal University, Jiehao Wu East China Normal University, Yanhong Huang East China Normal University, Jianqi Shi East China Normal University, Min Zhang East China Normal University, Xiangfeng Wang East China Normal University
10:30
30m
Talk
Listening to the Firehose: Sonifying Z3’s Behavior [Artifact-Functional, Artifact-Reusable, Artifact-Available, Formal Methods]
New Ideas and Emerging Results (NIER)
Finn Hackett University of British Columbia, Ivan Beschastnikh University of British Columbia
10:30
30m
Poster
HyperCRX 2.0: A Comprehensive and Automated Tool for Empowering GitHub Insights
Demonstrations
Yantong Wang East China Normal University, Shengyu Zhao Tongji University, Will Wang, Fenglin Bi East China Normal University
10:30
30m
Talk
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’t [Security]
New Ideas and Emerging Results (NIER)
Maria Camporese University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
Pre-print
10:30
30m
Talk
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Jaeseong Lee The University of Texas at Dallas, Simin Chen University of Texas at Dallas, Austin Mordahl University of Illinois Chicago, Cong Liu University of California, Riverside, Wei Yang UT Dallas, Shiyi Wei University of Texas at Dallas
10:30
30m
Poster
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models
Research Track
Kunpeng Zhang The Hong Kong University of Science and Technology, Shuai Wang Hong Kong University of Science and Technology, Jitao Han Central University of Finance and Economics, Xiaogang Zhu The University of Adelaide, Xian Li Swinburne University of Technology, Shaohua Wang Central University of Finance and Economics, Sheng Wen Swinburne University of Technology
11:00 - 12:30
11:00
15m
Talk
A Large-Scale Study of Model Integration in ML-Enabled Software Systems [SE for AI, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Yorick Sens Ruhr University Bochum, Henriette Knopp Ruhr University Bochum, Sven Peldszus Ruhr University Bochum, Thorsten Berger Ruhr University Bochum
Pre-print
11:15
15m
Talk
Are LLMs Correctly Integrated into Software Systems? [SE for AI, Artifact-Available]
Research Track
Yuchen Shao East China Normal University, Yuheng Huang the University of Tokyo, Jiawei Shen East China Normal University, Lei Ma The University of Tokyo & University of Alberta, Ting Su East China Normal University, Chengcheng Wan East China Normal University
11:30
15m
Talk
Patch Synthesis for Property Repair of Deep Neural Networks [SE for AI, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Zhiming Chi Institute of Software, Chinese Academy of Sciences, Jianan Ma Hangzhou Dianzi University, China; Zhejiang University, Hangzhou, China, Pengfei Yang Institute of Software at Chinese Academy of Sciences, China, Cheng-Chao Huang Nanjing Institute of Software Technology, ISCAS, Renjue Li Institute of Software at Chinese Academy of Sciences, China, Jingyi Wang Zhejiang University, Xiaowei Huang University of Liverpool, Lijun Zhang Institute of Software, Chinese Academy of Sciences
11:45
15m
Talk
Optimizing Experiment Configurations for LLM Applications Through Exploratory Analysis [SE for AI]
New Ideas and Emerging Results (NIER)
Nimrod Busany Accenture Labs, Israel, Hananel Hadad Accenture Labs, Israel, Zofia Maszlanka Avanade, Poland, Rohit Shelke University of Ottawa, Canada, Gregory Price University of Ottawa, Canada, Okhaide Akhigbe University of Ottawa, Daniel Amyot University of Ottawa
12:00
15m
Talk
AI-Assisted SQL Authoring at Industry Scale [SE for AI]
SE In Practice (SEIP)
Chandra Sekhar Maddila Meta Platforms, Inc., Negar Ghorbani Meta Platforms Inc., Kosay Jabre Meta Platforms, Inc., Vijayaraghavan Murali Meta Platforms Inc., Edwin Kim Meta Platforms, Inc., Parth Thakkar Meta Platforms, Inc., Nikolay Pavlovich Laptev Meta Platforms, Inc., Olivia Harman Meta Platforms, Inc., Diana Hsu Meta Platforms, Inc., Rui Abreu Meta, Peter C Rigby Meta / Concordia University
12:15
15m
Talk
Automating ML Model Development at Scale [SE for AI]
SE In Practice (SEIP)
Kaiyuan Wang Google, Yang Li Google Inc, Junyang Shen Google Inc, Kaikai Sheng Google Inc, Yiwei You Google Inc, Jiaqi Zhang Google Inc, Srikar Ayyalasomayajula Google Inc, Julian Grady Google Inc, Martin Wicke Google Inc
11:00 - 12:30
Analysis 2 [SE In Practice (SEIP) / Journal-first Papers / Demonstrations] at 205
Chair(s): Mahmoud Alfadel University of Calgary
11:00
15m
Talk
SIT: An accurate, compliant SBOM generator with incremental construction
Demonstrations
Changguo Jia Peking University, Nianyu Li ZGC Lab, China, Minghui Zhou Peking University, Kai Yang
11:15
15m
Talk
Towards Better Static Analysis Bug Reports in the Clang Static Analyzer
SE In Practice (SEIP)
Kristóf Umann Eötvös Loránd University, Faculty of Informatics, Dept. of Programming Languages and Compilers, Zoltán Porkoláb Ericsson
11:30
15m
Talk
Automatic Identification of Game Stuttering via Gameplay Videos Analysis
Journal-first Papers
Emanuela Guglielmi University of Molise, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
11:45
15m
Talk
LLM Driven Smart Assistant for Data Mapping
SE In Practice (SEIP)
Arihant Bedagkar Tata Consultancy Services, Sayandeep Mitra Tata Consultancy Services, Raveendra Kumar Medicherla TCS Research, Tata Consultancy Services, Ravindra Naik TCS Research, TRDDC, India, Samiran Pal Tata Consultancy Services
12:00
15m
Talk
On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories [Artifact-Available, Artifact-Functional, Artifact-Reusable]
SE In Practice (SEIP)
Henri Aïdasso École de technologie supérieure (ÉTS), Francis Bordeleau École de Technologie Supérieure (ETS), Ali Tizghadam TELUS
Pre-print
12:15
7m
Talk
AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes
Journal-first Papers
Aniruddhan Murali University of Waterloo, Mahmoud Alfadel University of Calgary, Mei Nagappan University of Waterloo, Meng Xu University of Waterloo, Chengnian Sun University of Waterloo
11:00 - 12:30
Human and Social 2 [Research Track / Journal-first Papers] at 206 plus 208
Chair(s): Alexander Serebrenik Eindhoven University of Technology
11:00
15m
Talk
Code Today, Deadline Tomorrow: Procrastination Among Software Developers
Research Track
Zeinabsadat Saghi University of Southern California, Thomas Zimmermann University of California, Irvine, Souti Chattopadhyay University of Southern California
11:15
15m
Talk
"Get Me In The Groove": A Mixed Methods Study on Supporting ADHD Professional Programmers
Research Track
Kaia Newman Carnegie Mellon University, Sarah Snay University of Michigan, Madeline Endres University of Massachusetts Amherst, Manasvi Parikh University of Michigan, Andrew Begel Carnegie Mellon University
Pre-print
11:30
15m
Talk
Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations
Research Track
Ruchit Rawal Max Planck Institute for Software Systems, Victor-Alexandru Padurean Max Planck Institute for Software Systems, Sven Apel Saarland University, Adish Singla Max Planck Institute for Software Systems, Mariya Toneva Max Planck Institute for Software Systems
Pre-print
11:45
15m
Talk
How Scientists Use Jupyter Notebooks: Goals, Quality Attributes, and Opportunities [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Ruanqianqian (Lisa) Huang University of California, San Diego, Savitha Ravi UC San Diego, Michael He UCSD, Boyu Tian University of California, San Diego, Sorin Lerner University of California at San Diego, Michael Coblenz University of California, San Diego
Pre-print
12:00
15m
Talk
Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations
Journal-first Papers
Miguel Setúbal Federal University of Ceará, Tayana Conte Universidade Federal do Amazonas, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Allysson Allex Araújo Federal University of Cariri
Link to publication Pre-print
12:15
15m
Talk
Reputation Gaming in Crowd Technical Knowledge Sharing
Journal-first Papers
Iren Mazloomzadeh École Polytechnique de Montréal, Gias Uddin York University, Canada, Foutse Khomh Polytechnique Montréal, Ashkan Sami Edinburgh Napier University
11:00 - 12:30
Security and Analysis 1 [Research Track / SE In Practice (SEIP)] at 210
Chair(s): Akond Rahman Auburn University
11:00
15m
Talk
Accounting for Missing Events in Statistical Information Leakage Analysis [Security, Artifact-Functional, Artifact-Available]
Research Track
Seongmin Lee Max Planck Institute for Security and Privacy (MPI-SP), Shreyas Minocha Georgia Tech, Marcel Böhme MPI for Security and Privacy
11:15
15m
Talk
AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts [Security]
Research Track
Setu Kumar Basak North Carolina State University, K. Virgil English North Carolina State University, Ken Ogura North Carolina State University, Vitesh Kambara North Carolina State University, Bradley Reaves North Carolina State University, Laurie Williams North Carolina State University
11:30
15m
Talk
Enhancing The Open Network: Definition and Automated Detection of Smart Contract Defects [Blockchain, Security, Award Winner]
Research Track
Hao Song, Teng Li University of Electronic Science and Technology of China, Jiachi Chen Sun Yat-sen University, Ting Chen University of Electronic Science and Technology of China, Beibei Li Sichuan University, Zhangyan Lin University of Electronic Science and Technology of China, Yi Lu BitsLab, Pan Li MoveBit, Xihan Zhou TonBit
11:45
15m
Talk
Detecting Python Malware in the Software Supply Chain with Program Analysis [Artifact-Available, Artifact-Functional, Artifact-Reusable, Security]
SE In Practice (SEIP)
Ridwan Salihin Shariffdeen National University of Singapore, Behnaz Hassanshahi Oracle Labs, Australia, Martin Mirchev National University of Singapore, Ali El Husseini National University of Singapore, Abhik Roychoudhury National University of Singapore
12:00
15m
Talk
$ZTD_{JAVA}$: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies [Security]
Research Track
Paschal Amusuo Purdue University, Kyle A. Robinson Purdue University, Tanmay Singla Purdue University, Huiyun Peng Mount Holyoke College, Aravind Machiry Purdue University, Santiago Torres-Arias Purdue University, Laurent Simon Google, James C. Davis Purdue University
Pre-print
12:15
15m
Talk
FairChecker: Detecting Fund-stealing Bugs in DeFi Protocols via Fairness Validation [Blockchain, Security]
Research Track
Yi Sun Purdue University, USA, Zhuo Zhang Purdue University, Xiangyu Zhang Purdue University
11:00 - 12:30
AI for Design and Architecture [Demonstrations / SE In Practice (SEIP) / Research Track] at 211
Chair(s): Sarah Nadi New York University Abu Dhabi
11:00
15m
Talk
An LLM-Based Agent-Oriented Approach for Automated Code Design Issue Localization [Artifact-Available]
Research Track
Fraol Batole Tulane University, David OBrien Iowa State University, Tien N. Nguyen University of Texas at Dallas, Robert Dyer University of Nebraska-Lincoln, Hridesh Rajan Tulane University
11:15
15m
Talk
Distilled Lifelong Self-Adaptation for Configurable Systems [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Yulong Ye University of Birmingham, Tao Chen University of Birmingham, Miqing Li University of Birmingham
Pre-print
11:30
15m
Talk
The Software Librarian: Python Package Insights for Copilot
Demonstrations
Jasmine Latendresse Concordia University, Nawres Day ISSAT Sousse, SayedHassan Khatoonabadi Concordia University, Montreal, Emad Shihab Concordia University, Montreal
11:45
15m
Talk
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
SE In Practice (SEIP)
Siyuan Jiang, Jia Li Peking University, He Zong aiXcoder, Huanyu Liu Peking University, Hao Zhu Peking University, Shukai Hu aiXcoder, Erlu Li aiXcoder, Jiazheng Ding aiXcoder, Ge Li Peking University
Pre-print
12:00
15m
Talk
Leveraging MLOps: Developing a Sequential Classification System for RFQ Documents in Electrical Engineering
SE In Practice (SEIP)
Claudio Martens Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Hammam Abdelwahab Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Katharina Beckh Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Birgit Kirsch Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Vishwani Gupta Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dennis Wegener Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Steffen Hoh Schneider Electric
12:15
15m
Talk
On Mitigating Code LLM Hallucinations with API Documentation
SE In Practice (SEIP)
Nihal Jain Amazon Web Services, Robert Kwiatkowski, Baishakhi Ray Columbia University, Murali Krishna Ramanathan AWS AI Labs, Varun Kumar AWS AI Labs
11:00 - 12:30
AI for Analysis 3 [SE In Practice (SEIP) / Research Track] at 212
Chair(s): Gias Uddin York University, Canada
11:00
15m
Talk
COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge
Research Track
Yichen Li The Chinese University of Hong Kong, Yulun Wu The Chinese University of Hong Kong, Jinyang Liu Chinese University of Hong Kong, Zhihan Jiang The Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Guangba Yu The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong
11:15
15m
Talk
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
Research Track
Yifeng Di Purdue University, Tianyi Zhang Purdue University
11:30
15m
Talk
HumanEvo: An Evolution-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation
Research Track
Dewu Zheng Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Ensheng Shi Xi’an Jiaotong University, Ruikai Zhang Huawei Cloud Computing Technologies, Yuchi Ma Huawei Cloud Computing Technologies, Hongyu Zhang Chongqing University, Zibin Zheng Sun Yat-sen University
11:45
15m
Talk
SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases
SE In Practice (SEIP)
Daeha Ryu Innovation Center, Samsung Electronics, Seokjun Ko Samsung Electronics Co., Eunbi Jang Innovation Center, Samsung Electronics, Jinyoung Park Innovation Center, Samsung Electronics, Myunggwan Kim Innovation Center, Samsung Electronics, Changseo Park Innovation Center, Samsung Electronics
12:00
15m
Talk
Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
SE In Practice (SEIP)
Tri Minh-Triet Pham Concordia University, Karthikeyan Premkumar Ericsson, Mohamed Naili Ericsson, Jinqiu Yang Concordia University
12:15
15m
Talk
UML Sequence Diagram Generation: A Multi-Model, Multi-Domain Evaluation
SE In Practice (SEIP)
Chi Xiao Ericsson AB, Daniel Ståhl Ericsson AB, Jan Bosch Chalmers University of Technology
11:00 - 12:30
AI for Requirements [Research Track / SE In Practice (SEIP) / Journal-first Papers / New Ideas and Emerging Results (NIER)] at 213
Chair(s): Jennifer Horkoff Chalmers and the University of Gothenburg
11:00
15m
Talk
From Bugs to Benefits: Improving User Stories by Leveraging Crowd Knowledge with CrUISE-AC
Research Track
Stefan Schwedt Heriot-Watt University, UK, Thomas Ströder FHDW Mettmann
11:15
15m
Talk
LiSSA: Toward Generic Traceability Link Recovery through Retrieval-Augmented Generation [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Dominik Fuchß Karlsruhe Institute of Technology (KIT), Tobias Hey Karlsruhe Institute of Technology (KIT), Jan Keim Karlsruhe Institute of Technology (KIT), Haoyu Liu Karlsruhe Institute of Technology (KIT), Niklas Ewald Karlsruhe Institute of Technology (KIT), Tobias Thirolf Karlsruhe Institute of Technology (KIT), Anne Koziolek Karlsruhe Institute of Technology
Pre-print Media Attached
11:30
15m
Talk
Replication in Requirements Engineering: the NLP for RE Case
Journal-first Papers
Sallam Abualhaija University of Luxembourg, Fatma Başak Aydemir Utrecht University, Fabiano Dalpiaz Utrecht University, Davide Dell'Anna Utrecht University, Alessio Ferrari CNR-ISTI, Xavier Franch Universitat Politècnica de Catalunya, Davide Fucci Blekinge Institute of Technology
11:45
15m
Talk
On the Impact of Requirements Smells in Prompts: The Case of Automated Traceability [Artifact-Functional, Artifact-Reusable, Artifact-Available]
New Ideas and Emerging Results (NIER)
Andreas Vogelsang paluno, University of Duisburg-Essen, Alexander Korn University of Duisburg-Essen, Giovanna Broccia ISTI-CNR, FMT Lab, Alessio Ferrari Consiglio Nazionale delle Ricerche (CNR) and University College Dublin (UCD), Jannik Fischbach Netlight Consulting GmbH and fortiss GmbH, Chetan Arora Monash University
12:00
15m
Talk
NICE: Non-Functional Requirements Identification, Classification, and Explanation Using Small Language Models [Artifact-Available, Award Winner]
SE In Practice (SEIP)
Gokul Rejithkumar TCS Research, Preethu Rose Anish TCS Research
Pre-print
11:00 - 12:30
AI for Testing and QA 3 [Research Track] at 214
Chair(s): Mike Papadakis University of Luxembourg
11:00
15m
Talk
A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs
Research Track
Myeongsoo Kim Georgia Institute of Technology, Tyler Stennett Georgia Institute of Technology, Saurabh Sinha IBM Research, Alessandro Orso Georgia Institute of Technology
11:15
15m
Talk
ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs [Artifact-Available]
Research Track
Hongyan Gao State Key Laboratory for Novel Software Technology, Nanjing University, Yibiao Yang Nanjing University, Maolin Sun Nanjing University, Jiangchang Wu State Key Laboratory for Novel Software Technology, Nanjing University, Yuming Zhou Nanjing University, Baowen Xu State Key Laboratory for Novel Software Technology, Nanjing University
11:30
15m
Talk
LLM Based Input Space Partitioning Testing for Library APIs [Artifact-Functional, Artifact-Available]
Research Track
Jiageng Li Fudan University, Zhen Dong Fudan University, Chong Wang Nanyang Technological University, Haozhen You Fudan University, Cen Zhang Georgia Institute of Technology, Yang Liu Nanyang Technological University, Xin Peng Fudan University
11:45
15m
Talk
Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [Artifact-Functional, Artifact-Available]
Research Track
Amirhossein Deljouyi Delft University of Technology, Roham Koohestani Delft University of Technology, Maliheh Izadi Delft University of Technology, Andy Zaidman TU Delft
DOI Pre-print
12:00
15m
Talk
exLong: Generating Exceptional Behavior Tests with Large Language Models [Artifact-Available]
Research Track
Jiyang Zhang University of Texas at Austin, Yu Liu Meta, Pengyu Nie University of Waterloo, Junyi Jessy Li University of Texas at Austin, USA, Milos Gligoric The University of Texas at Austin
12:15
15m
Talk
TOGLL: Correct and Strong Test Oracle Generation with LLMs [Artifact-Available]
Research Track
Soneya Binta Hossain University of Virginia, Matthew B Dwyer University of Virginia
11:00 - 12:30
SE for AI 2 [New Ideas and Emerging Results (NIER) / Research Track] at 215
Chair(s): Grace Lewis Carnegie Mellon Software Engineering Institute
11:00
15m
Talk
Answering User Questions about Machine Learning Models through Standardized Model Cards [SE for AI]
Research Track
Tajkia Rahman Toma University of Alberta, Balreet Grewal University of Alberta, Cor-Paul Bezemer University of Alberta
Pre-print
11:15
15m
Talk
Fairness Testing through Extreme Value Theory [SE for AI]
Research Track
Verya Monjezi University of Texas at El Paso, Ashutosh Trivedi University of Colorado Boulder, Vladik Kreinovich University of Texas at El Paso, Saeid Tizpaz-Niari University of Illinois Chicago
11:30
15m
Talk
Fixing Large Language Models' Specification Misunderstanding for Better Code Generation [SE for AI]
Research Track
Zhao Tian Tianjin University, Junjie Chen Tianjin University, Xiangyu Zhang Purdue University
Pre-print
11:45
15m
Talk
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents [SE for AI]
Research Track
Feng Lin Concordia University, Dong Jae Kim DePaul University, Tse-Hsun (Peter) Chen Concordia University
12:00
15m
Talk
The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products [SE for AI]
Research Track
Nadia Nahar Carnegie Mellon University, Haoran Zhang Carnegie Mellon University, Grace Lewis Carnegie Mellon Software Engineering Institute, Shurui Zhou University of Toronto, Christian Kästner Carnegie Mellon University
12:15
15m
Talk
Towards Trustworthy LLMs for Code: A Data-Centric Synergistic Auditing Framework [SE for AI]
New Ideas and Emerging Results (NIER)
Chong Wang Nanyang Technological University, Zhenpeng Chen Nanyang Technological University, Li Tianlin NTU, Yilun Zhang AIXpert, Yang Liu Nanyang Technological University
13:00 - 13:30
13:00
30m
Talk
BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks
Research Track
Yisong Xiao Beihang University, Aishan Liu Beihang University; Institute of Dataspace, Xinwei Zhang Beihang University, Tianyuan Zhang Beihang University, Li Tianlin NTU, Siyuan Liang National University of Singapore, Xianglong Liu Beihang University; Institute of Dataspace; Zhongguancun Laboratory, Yang Liu Nanyang Technological University, Dacheng Tao Nanyang Technological University
13:00
30m
Talk
Ethical Issues in Video Games: Insights from Reddit Discussions
SE in Society (SEIS)
Yeqian Li Vrije Universiteit Amsterdam, Kousar Aslam Vrije Universiteit Amsterdam
13:00
30m
Talk
An Empirical Study on Developers' Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Journal-first Papers
Huizi Hao Queen's University, Canada, Kazi Amit Hasan Queen's University, Canada, Hong Qin Queen's University, Marcos Macedo Queen's University, Yuan Tian Queen's University, Kingston, Ontario, Steven H. H. Ding Queen’s University at Kingston, Ahmed E. Hassan Queen’s University
13:00
30m
Talk
QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems [Quantum]
Journal-first Papers
Jinjing Shi Central South University, Zimeng Xiao Central South University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Xuelong Li China Telecom
Link to publication
13:00
30m
Poster
FlatD: Protecting Deep Neural Network Program from Reversing Attacks
SE In Practice (SEIP)
Jinquan Zhang The Pennsylvania State University, Zihao Wang Penn State University, Pei Wang Independent Researcher, Rui Zhong Palo Alto Networks, Dinghao Wu Pennsylvania State University
13:00
30m
Talk
Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice [SE for AI]
Journal-first Papers
Bentley Oakes Polytechnique Montréal, Michalis Famelis Université de Montréal, Houari Sahraoui DIRO, Université de Montréal
DOI Pre-print File Attached
13:00
30m
Talk
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools. [Security]
Journal-first Papers
Aurora Papotti Vrije Universiteit Amsterdam, Ranindya Paramitha University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
13:00
30m
Talk
Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry
SE In Practice (SEIP)
Martin Obaidi Leibniz Universität Hannover, Nicolas Voß Graphmasters GmbH, Hannah Deters Leibniz University Hannover, Jakob Droste Leibniz Universität Hannover, Marc Herrmann Leibniz University Hannover, Jannik Fischbach Netlight Consulting GmbH and fortiss GmbH, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group
13:30 - 14:00
13:30
30m
Poster
Non-Autoregressive Line-Level Code Completion
Journal-first Papers
Fang Liu Beihang University, Zhiyi Fu Peking University, Ge Li Peking University, Zhi Jin Peking University, Hui Liu Beijing Institute of Technology, Yiyang Hao Silicon Heart Tech Co., Li Zhang Beihang University
13:30
30m
Talk
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
Journal-first Papers
Sarah Fakhoury Microsoft Research, Aaditya Naik University of Pennsylvania, Georgios Sakkas University of California at San Diego, Saikat Chakraborty Microsoft Research, Shuvendu K. Lahiri Microsoft Research
Link to publication
13:30
30m
Talk
SusDevOps: Promoting Sustainability to a First Principle in Software Delivery
New Ideas and Emerging Results (NIER)
Istvan David McMaster University / McMaster Centre for Software Certification (McSCert)
13:30
30m
Poster
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
Journal-first Papers
SayedHassan Khatoonabadi Concordia University, Montreal, Ahmad Abdellatif University of Calgary, Diego Elias Costa Concordia University, Canada, Emad Shihab Concordia University, Montreal
13:30
30m
Poster
RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code
Research Track
Pantazis Deligiannis Microsoft Research, Akash Lal Microsoft Research, Nikita Mehrotra Microsoft Research, Rishi Poddar Microsoft Research, Aseem Rastogi Microsoft Research
13:30
30m
Talk
Relevant information in TDD experiment reporting
Journal-first Papers
Fernando Uyaguari Instituto Superior Tecnológico Wissen, Silvia Teresita Acuña Castillo Universidad Autónoma de Madrid, John W. Castro Universidad de Atacama, Davide Fucci Blekinge Institute of Technology, Oscar Dieste Universidad Politécnica de Madrid, Sira Vegas Universidad Politecnica de Madrid
14:00 - 15:30
Testing and QA 3 [Research Track / Journal-first Papers] at 205
Chair(s): Michael Pradel University of Stuttgart
14:00
15m
Talk
Increasing the Effectiveness of Automatically Generated Tests by Improving Class Observability [Award Winner]
Research Track
Geraldine Galindo-Gutierrez Centro de Investigación en Ciencias Exactas e Ingenierías, Universidad Católica Boliviana, Juan Pablo Sandoval Alcocer Pontificia Universidad Católica de Chile, Nicolas Jimenez-Fuentes Pontificia Universidad Católica de Chile, Alexandre Bergel University of Chile, Gordon Fraser University of Passau
14:15
15m
Talk
Invivo Fuzzing by Amplifying Actual Executions [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Octavio Galland Canonical, Marcel Böhme MPI for Security and Privacy
14:30
15m
Talk
Towards High-strength Combinatorial Interaction Testing for Highly Configurable Software Systems [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Chuan Luo Beihang University, Shuangyu Lyu Beihang University, Wei Wu Central South University; Xiangjiang Laboratory, Hongyu Zhang Chongqing University, Dianhui Chu Harbin Institute of Technology, Chunming Hu Beihang University
14:45
15m
Talk
WDD: Weighted Delta Debugging [Artifact-Functional, Artifact-Available]
Research Track
Xintong Zhou University of Waterloo, Zhenyang Xu University of Waterloo, Mengxiao Zhang University of Waterloo, Yongqiang Tian, Chengnian Sun University of Waterloo
15:00
15m
Talk
TopSeed: Learning Seed Selection Strategies for Symbolic Execution from Scratch [Artifact-Functional, Artifact-Available]
Research Track
Jaehyeok Lee Sungkyunkwan University, Sooyoung Cha Sungkyunkwan University
15:15
15m
Talk
Hunting bugs: Towards an automated approach to identifying which change caused a bug through regression testing
Journal-first Papers
Michel Maes Bermejo Universidad Rey Juan Carlos, Alexander Serebrenik Eindhoven University of Technology, Micael Gallego Universidad Rey Juan Carlos, Francisco Gortázar Universidad Rey Juan Carlos, Gregorio Robles Universidad Rey Juan Carlos, Jesus M. Gonzalez-Barahona Universidad Rey Juan Carlos
14:00 - 15:30
AI for Testing and QA 4 [Journal-first Papers / Demonstrations / Research Track] at 206 plus 208
Chair(s): Andreas Jedlitschka Fraunhofer IESE
14:00
15m
Talk
The Seeds of the FUTURE Sprout from History: Fuzzing for Unveiling Vulnerabilities in Prospective Deep-Learning Libraries [Security, Award Winner]
Research Track
Zhiyuan Li, Jingzheng Wu Institute of Software, The Chinese Academy of Sciences, Xiang Ling Institute of Software, Chinese Academy of Sciences, Tianyue Luo Institute of Software, Chinese Academy of Sciences, Zhiqing Rui Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yanjun Wu Institute of Software, Chinese Academy of Sciences
14:15
15m
Talk
AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL
Demonstrations
Tyler Stennett Georgia Institute of Technology, Myeongsoo Kim Georgia Institute of Technology, Saurabh Sinha IBM Research, Alessandro Orso Georgia Institute of Technology
14:30
15m
Talk
FairBalance: How to Achieve Equalized Odds With Data Pre-processing
Journal-first Papers
Zhe Yu Rochester Institute of Technology, Joymallya Chakraborty Amazon.com, Tim Menzies North Carolina State University
14:45
15m
Talk
RLocator: Reinforcement Learning for Bug Localization
Journal-first Papers
Partha Chakraborty University of Waterloo, Mahmoud Alfadel University of Calgary, Mei Nagappan University of Waterloo
15:00
15m
Talk
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP
Journal-first Papers
Lukas Schulte University of Passau, Benjamin Ledel Digital Learning GmbH, Steffen Herbold University of Passau
15:15
15m
Talk
Test Generation Strategies for Building Failure Models and Explaining Spurious Failures
Journal-first Papers
Baharin Aliashrafi Jodat University of Ottawa, Abhishek Chandar University of Ottawa, Shiva Nejati University of Ottawa, Mehrdad Sabetzadeh University of Ottawa
Pre-print
14:00 - 15:30
Human and Social using AI 1 [Research Track] at 207
Chair(s): Romain Robbes CNRS, LaBRI, University of Bordeaux
14:00
15m
Talk
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
Research Track
Yuling Shi Shanghai Jiao Tong University, Hongyu Zhang Chongqing University, Chengcheng Wan East China Normal University, Xiaodong Gu Shanghai Jiao Tong University
14:15
15m
Talk
Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?
Research Track
Rosalia Tufano Università della Svizzera Italiana, Alberto Martin-Lopez Software Institute - USI, Lugano, Ahmad Tayeb, Ozren Dabic Software Institute, Università della Svizzera italiana (USI), Switzerland, Sonia Haiduc, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
14:30
15m
Talk
An Exploratory Study of ML Sketches and Visual Code Assistants [Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Luis F. Gomes Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University, Jonathan Aldrich Carnegie Mellon University, Rui Abreu Faculty of Engineering of the University of Porto, Portugal
14:45
15m
Talk
LiCoEval: Evaluating LLMs on License Compliance in Code Generation
Research Track
Weiwei Xu Peking University, Kai Gao Peking University, Hao He Carnegie Mellon University, Minghui Zhou Peking University
Pre-print
15:00
15m
Talk
Trust Dynamics in AI-Assisted Development: Definitions, Factors, and Implications
Research Track
Sadra Sabouri University of Southern California, Philipp Eibl University of Southern California, Xinyi Zhou University of Southern California, Morteza Ziyadi Amazon AGI, Nenad Medvidović University of Southern California, Lars Lindemann University of Southern California, Souti Chattopadhyay University of Southern California
Pre-print
15:15
15m
Talk
What Guides Our Choices? Modeling Developers' Trust and Behavioral Intentions Towards GenAI
Research Track
Rudrajit Choudhuri Oregon State University, Bianca Trinkenreich Colorado State University, Rahul Pandita GitHub, Inc., Eirini Kalliamvakou GitHub, Igor Steinmacher NAU RESHAPE LAB, Marco Gerosa Northern Arizona University, Christopher Sanchez Oregon State University, Anita Sarma Oregon State University
Pre-print
14:00 - 15:30
AI for Security 1 [Research Track] at 210
Chair(s): Tao Chen University of Birmingham
14:00
15m
Talk
Large Language Models as Configuration Validators [Security, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Xinyu Lian University of Illinois at Urbana-Champaign, Yinfang Chen University of Illinois at Urbana-Champaign, Runxiang Cheng University of Illinois at Urbana-Champaign, Jie Huang University of Illinois at Urbana-Champaign, Parth Thakkar Meta Platforms, Inc., Minjia Zhang UIUC, Tianyin Xu University of Illinois at Urbana-Champaign
14:15
15m
Talk
LLM Assistance for Memory Safety [Security]
Research Track
Nausheen Mohammed Microsoft Research, Akash Lal Microsoft Research, Aseem Rastogi Microsoft Research, Subhajit Roy IIT Kanpur, Rahul Sharma Microsoft Research
14:30
15m
Talk
Vulnerability Detection with Code Language Models: How Far Are We? [Security]
Research Track
Yangruibo Ding Columbia University, Yanjun Fu University of Maryland, Omniyyah Ibrahim King Abdulaziz City for Science and Technology, Chawin Sitawarin University of California, Berkeley, Xinyun Chen, Basel Alomair King Abdulaziz City for Science and Technology, David Wagner UC Berkeley, Baishakhi Ray Columbia University, Yizheng Chen University of Maryland
14:45
15m
Talk
Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications [Blockchain, Security]
Research Track
Wei Ma, Daoyuan Wu Hong Kong University of Science and Technology, Yuqiang Sun Nanyang Technological University, Tianwen Wang National University of Singapore, Shangqing Liu Nanyang Technological University, Jian Zhang Nanyang Technological University, Yue Xue, Yang Liu Nanyang Technological University
15:00
15m
Talk
Towards Neural Synthesis for SMT-assisted Proof-Oriented Programming [Security, Formal Methods, Award Winner]
Research Track
Saikat Chakraborty Microsoft Research, Gabriel Ebner Microsoft Research, Siddharth Bhat University of Cambridge, Sarah Fakhoury Microsoft Research, Sakina Fatima University of Ottawa, Shuvendu K. Lahiri Microsoft Research, Nikhil Swamy Microsoft Research
15:15
15m
Talk
Prompt-to-SQL Injections in LLM-Integrated Web Applications: Risks and Defenses [Security, SE for AI, Artifact-Functional, Artifact-Available, Artifact-Reusable]
Research Track
Rodrigo Resendes Pedro INESC-ID / IST, Universidade de Lisboa, Miguel E. Coimbra INESC-ID; Instituto Superior Técnico - University of Lisbon, Daniel Castro INESC-ID / IST, Universidade de Lisboa, Paulo Carreira INESC-ID / IST, Universidade de Lisboa, Nuno Santos INESC-ID; Instituto Superior Técnico - University of Lisbon
14:00 - 15:30
Analysis 3 [Research Track / Journal-first Papers] at 212
Chair(s): Shaowei Wang University of Manitoba
14:00
15m
Talk
Boosting Path-Sensitive Value Flow Analysis via Removal of Redundant Summaries
Research Track
Yongchao Wang Hong Kong University of Science and Technology, Yuandao Cai Hong Kong University of Science and Technology, Charles Zhang Hong Kong University of Science and Technology
Pre-print
14:15
15m
Talk
Dockerfile Flakiness: Characterization and Repair
Research Track
Taha Shabani University of British Columbia, Noor Nashid University of British Columbia, Parsa Alian University of British Columbia, Ali Mesbah University of British Columbia
Pre-print
14:30
15m
Talk
Evaluating Garbage Collection Performance Across Managed Language Runtimes
Research Track
Yicheng Wang Institute of Software Chinese Academy of Sciences, Wensheng Dou Institute of Software Chinese Academy of Sciences, Yu Liang Institute of Software Chinese Academy of Sciences, Yi Wang Institute of Software Chinese Academy of Sciences, Wei Wang Institute of Software at Chinese Academy of Sciences, Jun Wei Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Tao Huang Institute of Software Chinese Academy of Sciences
14:45
15m
Talk
Module-Aware Context Sensitive Pointer AnalysisArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Haofeng Li SKLP, Institute of Computing Technology, CAS, Chenghang Shi SKLP, Institute of Computing Technology, CAS, Jie Lu SKLP, Institute of Computing Technology, CAS, Lian Li Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Zixuan Zhao Huawei Technologies Co. Ltd
File Attached
15:00
15m
Talk
An Empirical Study on Reproducible Packaging in Open-Source EcosystemsArtifact-Available
Research Track
Giacomo Benedetti University of Genoa, Oreofe Solarin Case Western Reserve University, Courtney Miller Carnegie Mellon University, Greg Tystahl NCSU, William Enck North Carolina State University, Christian Kästner Carnegie Mellon University, Alexandros Kapravelos NCSU, Alessio Merlo CASD - School of Advanced Defense Studies, Luca Verderame University of Genoa
15:15
15m
Talk
T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax
Journal-first Papers
Zhenyang Xu University of Waterloo, Yongqiang Tian, Mengxiao Zhang, Jiarui Zhang University of Waterloo, Puzhuo Liu Ant Group & Tsinghua University, Yu Jiang Tsinghua University, Chengnian Sun University of Waterloo
14:00 - 15:30
AI for Program Comprehension 2 - Research Track at 213
Chair(s): Oscar Chaparro William & Mary
14:00
15m
Talk
Code Comment Inconsistency Detection and Rectification Using a Large Language Model
Research Track
Guoping Rong Nanjing University, Yongda Yu Nanjing University, Song Liu Nanjing University, Xin Tan Nanjing University, Tianyi Zhang Nanjing University, Haifeng Shen Southern Cross University, Jidong Hu Zhongxing Telecom Equipment
14:15
15m
Talk
Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation
Research Track
Aaron Imani University of California, Irvine, Iftekhar Ahmed University of California at Irvine, Mohammad Moshirpour University of California, Irvine
14:30
15m
Talk
HedgeCode: A Multi-Task Hedging Contrastive Learning Framework for Code Search
Research Track
Gong Chen Wuhan University, Xiaoyuan Xie Wuhan University, Xunzhu Tang University of Luxembourg, Qi Xin Wuhan University, Wenjie Liu Wuhan University
14:45
15m
Talk
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?
Research Track
Junkai Chen Zhejiang University, Zhiyuan Pan Zhejiang University, Xing Hu Zhejiang University, Zhenhao Li York University, Ge Li Peking University, Xin Xia Huawei
15:00
15m
Talk
Source Code Summarization in the Era of Large Language Models
Research Track
Weisong Sun Nanjing University, Yun Miao Nanjing University, Yuekang Li UNSW, Hongyu Zhang Chongqing University, Chunrong Fang Nanjing University, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Yang Liu Nanyang Technological University, Zhenyu Chen Nanjing University
Media Attached
15:15
15m
Talk
Template-Guided Program Repair in the Era of Large Language Models
Research Track
Kai Huang, Jian Zhang Nanyang Technological University, Xiangxin Meng Beihang University, Beijing, China, Yang Liu Nanyang Technological University
File Attached
14:00 - 15:30
SE for AI 3 - Research Track / SE in Society (SEIS) / Journal-first Papers at 215
Chair(s): Lina Marsso École Polytechnique de Montréal
14:00
15m
Talk
Dissecting Global Search: A Simple yet Effective Method to Boost Individual Discrimination Testing and RepairSE for AI
Research Track
Lili Quan Tianjin University, Li Tianlin NTU, Xiaofei Xie Singapore Management University, Zhenpeng Chen Nanyang Technological University, Sen Chen Nankai University, Lingxiao Jiang Singapore Management University, Xiaohong Li Tianjin University
Pre-print
14:15
15m
Talk
FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per ViolationSE for AI
Research Track
Yang Sun Singapore Management University, Chris Poskitt Singapore Management University, Kun Wang Zhejiang University, Jun Sun Singapore Management University
Pre-print File Attached
14:30
15m
Talk
MARQ: Engineering Mission-Critical AI-based Software with Automated Result Quality AdaptationSE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Uwe Gropengießer Technical University of Darmstadt, Elias Dietz Technical University of Darmstadt, Florian Brandherm Technical University of Darmstadt, Achref Doula Technical University of Darmstadt, Osama Abboud Munich Research Center, Huawei, Xun Xiao Munich Research Center, Huawei, Max Mühlhäuser Technical University of Darmstadt
14:45
15m
Talk
An Empirical Study of Challenges in Machine Learning Asset ManagementSE for AI
Journal-first Papers
Zhimin Zhao Queen's University, Yihao Chen Queen's University, Abdul Ali Bangash Software Analysis and Intelligence Lab (SAIL), Queen's University, Canada, Bram Adams Queen's University, Ahmed E. Hassan Queen’s University
15:00
15m
Talk
A Reference Model for Empirically Comparing LLMs with HumansSE for AI
SE in Society (SEIS)
Kurt Schneider Leibniz Universität Hannover, Software Engineering Group, Farnaz Fotrousi Chalmers University of Technology and University of Gothenburg, Rebekka Wohlrab Chalmers University of Technology
15:15
7m
Talk
Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-PracticeSE for AI
Journal-first Papers
Bentley Oakes Polytechnique Montréal, Michalis Famelis Université de Montréal, Houari Sahraoui DIRO, Université de Montréal
DOI Pre-print File Attached
15:30 - 16:00
15:30
30m
Talk
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Maryam Masoudian Sharif University of Technology, Hong Kong University of Science and Technology (HKUST), Heqing Huang City University of Hong Kong, Morteza Amini Sharif University of Technology, Charles Zhang Hong Kong University of Science and Technology
15:30
30m
Talk
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Neelam Tjikhoeri Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
15:30
30m
Talk
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
Carolin Brandt TU Delft, Ali Khatami Delft University of Technology, Mairieli Wessel Radboud University, Andy Zaidman TU Delft
Pre-print
15:30
30m
Poster
BSan: A Powerful Identifier-Based Hardware-Independent Memory Error Detector for COTS BinariesArtifact-FunctionalArtifact-Available
Research Track
Wen Zhang University of Georgia, Botang Xiao University of Georgia, Qingchen Kong University of Georgia, Le Guan University of Georgia, Wenwen Wang University of Georgia
15:30
30m
Talk
Towards Early Warning and Migration of High-Risk Dormant Open-Source Software DependenciesSecurity
New Ideas and Emerging Results (NIER)
Zijie Huang Shanghai Key Laboratory of Computer Software Testing and Evaluation, Lizhi Cai Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center, Xuan Mao Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China, Kang Yang Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology
15:30
30m
Talk
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
Costanza Alfieri Università degli Studi dell'Aquila, Juri Di Rocco University of L'Aquila, Paola Inverardi Gran Sasso Science Institute, Phuong T. Nguyen University of L’Aquila
15:30
30m
Poster
SimClone: Detecting Tabular Data Clones using Value Similarity
Journal-first Papers
Xu Yang University of Manitoba, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Shaowei Wang University of Manitoba, Zhen Ming (Jack) Jiang York University
15:30
30m
Talk
Strategies to Embed Human Values in Mobile Apps: What do End-Users and Practitioners Think?
SE in Society (SEIS)
Rifat Ara Shams CSIRO's Data61, Mojtaba Shahin RMIT University, Gillian Oliver Monash University, Jon Whittle CSIRO's Data61 and Monash University, Waqar Hussain Data61, CSIRO, Harsha Perera CSIRO's Data61, Arif Nurwidyantoro Universitas Gadjah Mada

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

10:30 - 11:00
10:30
30m
Talk
An Empirical Study on Developers' Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Journal-first Papers
Huizi Hao Queen's University, Canada, Kazi Amit Hasan Queen's University, Canada, Hong Qin Queen's University, Marcos Macedo Queen's University, Yuan Tian Queen's University, Kingston, Ontario, Steven H. H. Ding Queen's University at Kingston, Ahmed E. Hassan Queen’s University
10:30
30m
Talk
Automating Explanation Need Management in App Reviews: A Case Study from the Navigation App Industry
SE In Practice (SEIP)
Martin Obaidi Leibniz Universität Hannover, Nicolas Voß Graphmasters GmbH, Hannah Deters Leibniz University Hannover, Jakob Droste Leibniz Universität Hannover, Marc Herrmann Leibniz University Hannover, Jannik Fischbach Netlight Consulting GmbH and fortiss GmbH, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group
10:30
30m
Talk
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools.Security
Journal-first Papers
Aurora Papotti Vrije Universiteit Amsterdam, Ranindya Paramitha University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
10:30
30m
Talk
Relevant information in TDD experiment reporting
Journal-first Papers
Fernando Uyaguari Instituto Superior Tecnológico Wissen, Silvia Teresita Acuña Castillo Universidad Autónoma de Madrid, John W. Castro Universidad de Atacama, Davide Fucci Blekinge Institute of Technology, Oscar Dieste Universidad Politécnica de Madrid, Sira Vegas Universidad Politecnica de Madrid
10:30
30m
Talk
BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks
Research Track
Yisong Xiao Beihang University, Aishan Liu Beihang University; Institute of Dataspace, Xinwei Zhang Beihang University, Tianyuan Zhang Beihang University, Li Tianlin NTU, Siyuan Liang National University of Singapore, Xianglong Liu Beihang University; Institute of Dataspace; Zhongguancun Laboratory, Yang Liu Nanyang Technological University, Dacheng Tao Nanyang Technological University
10:30
30m
Talk
Ethical Issues in Video Games: Insights from Reddit Discussions
SE in Society (SEIS)
Yeqian Li Vrije Universiteit Amsterdam, Kousar Aslam Vrije Universiteit Amsterdam
10:30
30m
Talk
SusDevOps: Promoting Sustainability to a First Principle in Software Delivery
New Ideas and Emerging Results (NIER)
Istvan David McMaster University / McMaster Centre for Software Certification (McSCert)
11:00 - 12:30
Program Comprehension 3 - Research Track / Journal-first Papers at 204
Chair(s): Arie van Deursen TU Delft
11:00
15m
Talk
Automated Test Generation For Smart Contracts via On-Chain Test Case Augmentation and MigrationBlockchain
Research Track
Jiashuo Zhang Peking University, China, Jiachi Chen Sun Yat-sen University, John Grundy Monash University, Jianbo Gao Peking University, Yanlin Wang Sun Yat-sen University, Ting Chen University of Electronic Science and Technology of China, Zhi Guan Peking University, Zhong Chen
Pre-print
11:15
15m
Talk
Boosting Code-line-level Defect Prediction with Spectrum Information and Causality Analysis
Research Track
Shiyu Sun, Yanhui Li Nanjing University, Lin Chen Nanjing University, Yuming Zhou Nanjing University, Jianhua Zhao Nanjing University, China
11:30
15m
Talk
BatFix: Repairing language model-based transpilation
Journal-first Papers
Daniel Ramos Carnegie Mellon University, Ines Lynce INESC-ID/IST, Universidade de Lisboa, Vasco Manquinho INESC-ID; Universidade de Lisboa, Ruben Martins Carnegie Mellon University, Claire Le Goues Carnegie Mellon University
11:45
15m
Talk
Tracking the Evolution of Static Code Warnings: The State-of-the-Art and a Better Approach
Journal-first Papers
Junjie Li, Jinqiu Yang Concordia University
12:00
15m
Talk
PACE: A Program Analysis Framework for Continuous Performance Prediction
Journal-first Papers
Chidera Biringa University of Massachusetts, Gokhan Kul University of Massachusetts Dartmouth
12:15
15m
Talk
Mimicking Production Behavior With Generated Mocks
Journal-first Papers
Deepika Tiwari KTH Royal Institute of Technology, Martin Monperrus KTH Royal Institute of Technology, Benoit Baudry Université de Montréal
11:00 - 12:30
Testing and QA 4 - Research Track at 205
Chair(s): Matteo Camilli Politecnico di Milano
11:00
15m
Talk
DPFuzzer: Discovering Safety Critical Vulnerabilities for Drone Path PlannersSecurity
Research Track
Yue Wang, Chao Yang Xidian University, Xiaodong Zhang, Yuwanqi Deng Xidian University, Jianfeng Ma Xidian University
11:15
15m
Talk
IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation
Research Track
Yuyang Rong University of California, Davis, Zhanghan Yu University of California, Davis, Zhenkai Weng University of California, Davis, Stephen Neuendorffer Advanced Micro Devices, Inc., Hao Chen University of California at Davis
11:30
15m
Talk
Ranking Relevant Tests for Order-Dependent Flaky Tests
Research Track
Shanto Rahman The University of Texas at Austin, Bala Naren Chanumolu George Mason University, Suzzana Rafi George Mason University, August Shi The University of Texas at Austin, Wing Lam George Mason University
11:45
15m
Talk
Selecting Initial Seeds for Better JVM Fuzzing
Research Track
Tianchang Gao Tianjin University, Junjie Chen Tianjin University, Dong Wang Tianjin University, Yile Guo College of Intelligence and Computing, Tianjin University, Yingquan Zhao Tianjin University, Zan Wang Tianjin University
12:00
15m
Talk
Toward a Better Understanding of Probabilistic Delta DebuggingArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Mengxiao Zhang, Zhenyang Xu University of Waterloo, Yongqiang Tian, Xinru Cheng University of Waterloo, Chengnian Sun University of Waterloo
12:15
15m
Talk
Tumbling Down the Rabbit Hole: How do Assisting Exploration Strategies Facilitate Grey-box Fuzzing?Award Winner
Research Track
Mingyuan Wu Southern University of Science and Technology, Jiahong Xiang Southern University of Science and Technology, Kunqiu Chen Southern University of Science and Technology, Peng Di Ant Group & UNSW Sydney, Shin Hwei Tan Concordia University, Heming Cui University of Hong Kong, Yuqun Zhang Southern University of Science and Technology
11:00 - 12:30
Human and Social 3 - SE In Practice (SEIP) / Journal-first Papers / Research Track / New Ideas and Emerging Results (NIER) at 206 plus 208
Chair(s): Yuan Tian Queen's University, Kingston, Ontario
11:00
15m
Talk
Relationship Status: “It’s complicated” Developer-Security Expert Dynamics in ScrumSecurity
Research Track
Houda Naji Ruhr University Bochum, Marco Gutfleisch Ruhr University Bochum, Alena Naiakshina Ruhr University Bochum
11:15
15m
Talk
Soft Skills in Software Engineering: Insights from the Trenches
SE In Practice (SEIP)
Sanna Malinen University of Canterbury, Matthias Galster University of Canterbury, Antonija Mitrovic University of Canterbury, New Zealand, Sreedevi Sankara Iyer University of Canterbury, Pasan Peiris University of Canterbury, New Zealand, April Clarke University of Canterbury
11:30
15m
Talk
A Unified Browser-Based Consent Management Framework
New Ideas and Emerging Results (NIER)
Gayatri Priyadarsini Indian Institute of Technology Gandhinagar, Abhishek Bichhawat Indian Institute of Technology Gandhinagar
11:45
15m
Talk
Predicting Attrition among Software Professionals: Antecedents and Consequences of Burnout and Engagement
Journal-first Papers
Bianca Trinkenreich Colorado State University, Fabio Marcos De Abreu Santos Colorado State University, USA, Klaas-Jan Stol Lero; University College Cork; SINTEF Digital
12:00
7m
Talk
A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering
Journal-first Papers
Anda Liang Vanderbilt University, Emerson Murphy-Hill Microsoft, Westley Weimer University of Michigan, Yu Huang Vanderbilt University
12:07
7m
Talk
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Neelam Tjikhoeri Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
12:14
7m
Talk
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
Carolin Brandt TU Delft, Ali Khatami Delft University of Technology, Mairieli Wessel Radboud University, Andy Zaidman TU Delft
Pre-print
12:21
7m
Talk
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
Costanza Alfieri Università degli Studi dell'Aquila, Juri Di Rocco University of L'Aquila, Paola Inverardi Gran Sasso Science Institute, Phuong T. Nguyen University of L’Aquila
11:00 - 12:30
Human and Social using AI 2 - Research Track / SE In Practice (SEIP) / Demonstrations at 207
Chair(s): Sebastian Baltes University of Bayreuth
11:00
15m
Talk
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation ModelsArtifact-AvailableArtifact-FunctionalArtifact-Reusable
SE In Practice (SEIP)
Hao Li Queen's University, Cor-Paul Bezemer University of Alberta, Ahmed E. Hassan Queen’s University
Pre-print
11:15
15m
Talk
FairLay-ML: Intuitive Debugging of Fairness in Data-Driven Social-Critical Software
Demonstrations
Normen Yu Penn State, Luciana Carreon University of Texas at El Paso, Gang (Gary) Tan Pennsylvania State University, Saeid Tizpaz-Niari University of Illinois Chicago
11:30
15m
Talk
Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace
SE In Practice (SEIP)
Jenna L. Butler Microsoft Research, Jina Suh Microsoft Research, Sankeerti Haniyur Microsoft Corporation, Constance Hadley Institute for Work Life
11:45
15m
Talk
Exploring GenAI in Software Development: Insights from a Case Study in a Large Brazilian Company
SE In Practice (SEIP)
Guilherme Vaz Pereira School of Technology, PUCRS, Brazil, Victoria Jackson University of California, Irvine, Rafael Prikladnicki School of Technology at PUCRS University, Andre van der Hoek University of California, Irvine, Luciane Fortes Globo, Carolina Araújo Globo, André Coelho Globo, Ligia Chelli Globo, Diego Ramos Globo
Pre-print
12:00
15m
Talk
Human-In-the-Loop Software Development Agents
SE In Practice (SEIP)
Wannita Takerngsaksiri Monash University, Jirat Pasuksmit Atlassian, Patanamon Thongtanunam University of Melbourne, Kla Tantithamthavorn Monash University, Ruixiong Zhang Atlassian, Fan Jiang Atlassian, Jing Li Atlassian, Evan Cook Atlassian, Kun Chen Atlassian, Ming Wu Atlassian
12:15
15m
Talk
Measuring the Runtime Performance of C++ Code Written by Humans using GitHub CopilotArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Daniel Erhabor University of Waterloo, Sreeharsha Udayashankar University of Waterloo, Mei Nagappan University of Waterloo, Samer Al-Kiswany University of Waterloo
DOI Pre-print File Attached
11:00 - 12:30
Security and Analysis 2 - Research Track at 210
Chair(s): Jordan Samhi University of Luxembourg, Luxembourg
11:00
15m
Talk
A Study of Undefined Behavior Across Foreign Function Boundaries in Rust LibrariesSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Ian McCormack Carnegie Mellon University, Joshua Sunshine Carnegie Mellon University, Jonathan Aldrich Carnegie Mellon University
Pre-print
11:15
15m
Talk
Cooperative Software Verification via Dynamic Program SplittingSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Cedric Richter University of Oldenburg, Marek Chalupa Institute of Science and Technology Austria, Marie-Christine Jakobs LMU Munich, Germany, Heike Wehrheim University of Oldenburg
11:30
15m
Talk
Exposing the Hidden Layer: Software Repositories in the Service of SEO ManipulationSecurityArtifact-FunctionalArtifact-Available
Research Track
Mengying Wu Fudan University, Geng Hong Fudan University, Wuyuao Mai Fudan University, Xinyi Wu Fudan University, Lei Zhang Fudan University, Yingyuan Pu QI-ANXIN Technology Research Institute, Huajun Chai QI-ANXIN Technology Research Institute, Lingyun Ying Qi An Xin Group Corp., Haixin Duan Institute for Network Science and Cyberspace, Tsinghua University; Qi An Xin Group Corp., Min Yang Fudan University
11:45
15m
Talk
Hetrify: Efficient Verification of Heterogeneous Programs on RISC-VSecurityAward Winner
Research Track
Yiwei Li School of Computer, National University of Defense Technology, Liangze Yin School of Computer, National University of Defense Technology, Wei Dong National University of Defense Technology, Jiaxin Liu National University of Defense Technology, Yanfeng Hu School of Computer, National University of Defense Technology, Shanshan Li National University of Defense Technology
12:00
15m
Talk
Hyperion: Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic ExecutionSecurity
Research Track
Shuo Yang Sun Yat-sen University, Xingwei Lin Ant Group, Jiachi Chen Sun Yat-sen University, Qingyuan Zhong Sun Yat-sen University, Lei Xiao Sun Yat-sen University, Renke Huang Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Zibin Zheng Sun Yat-sen University
12:15
15m
Talk
SmartReco: Detecting Read-Only Reentrancy via Fine-Grained Cross-DApp AnalysisSecurity
Research Track
Jingwen Zhang School of Software Engineering, Sun Yat-sen University, Zibin Zheng Sun Yat-sen University, Yuhong Nan Sun Yat-sen University, Mingxi Ye Sun Yat-sen University, Kaiwen Ning Sun Yat-sen University, Yu Zhang Harbin Institute of Technology, Weizhe Zhang Harbin Institute of Technology
11:00 - 12:30
Design and Architecture 1 - Research Track / SE In Practice (SEIP) / Journal-first Papers at 211
Chair(s): Tushar Sharma Dalhousie University
11:00
15m
Talk
A Catalog of Micro Frontends Anti-patternsArtifact-Available
Research Track
Nabson Silva UFAM - Federal University of Amazonas, Eriky Rodrigues UFAM - Federal University of Amazonas Brazil, Tayana Conte Universidade Federal do Amazonas
11:15
15m
Talk
PairSmell: A Novel Perspective Inspecting Software Modular StructureArtifact-FunctionalArtifact-AvailableAward Winner
Research Track
Chenxing Zhong Nanjing University, Daniel Feitosa University of Groningen, Paris Avgeriou University of Groningen, Huang Huang State Grid Nanjing Power Supply Company, Yue Li Nanjing University, He Zhang Nanjing University
Pre-print
11:30
15m
Talk
Understanding Architectural Complexity, Maintenance Burden, and Developer Sentiment---a Large-Scale Study
Research Track
Yuanfang Cai Drexel University, Lanting He Google, Yony Kochinski Google, Jun Qian Google, Ciera Jaspan Google, Nan Zhang Google, Antonio Bianco Google
11:45
15m
Talk
A Large-Scale Exploratory Study on the Proxy Pattern in EthereumBlockchain
Journal-first Papers
Amir Ebrahimi Queen's University, Bram Adams Queen's University, Gustavo A. Oliva Queen's University, Ahmed E. Hassan Queen’s University
12:00
15m
Talk
Video Game Procedural Content Generation Through Software Transplantation
SE In Practice (SEIP)
Mar Zamorano López University College London, Daniel Blasco SVIT Research Group. Universidad San Jorge, Carlos Cetina, Federica Sarro University College London
11:00 - 12:30
AI for Analysis 4 - Research Track / New Ideas and Emerging Results (NIER) / SE In Practice (SEIP) at 212
Chair(s): Maliheh Izadi Delft University of Technology, Ali Al-Kaswan Delft University of Technology, Netherlands, Jonathan Katzy Delft University of Technology
11:00
15m
Talk
RepairAgent: An Autonomous, LLM-Based Agent for Program RepairArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Islem Bouzenia University of Stuttgart, Prem Devanbu University of California at Davis, Michael Pradel University of Stuttgart
Pre-print
11:15
15m
Talk
Evaluating Agent-based Program Repair at Google
SE In Practice (SEIP)
Patrick Rondon Google, Renyao Wei Google, José Pablo Cambronero Google, USA, Jürgen Cito TU Wien, Aaron Sun Google, Siddhant Sanyam Google, Michele Tufano Google, Satish Chandra Google, Inc
11:30
15m
Talk
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and DatasetArtifact-AvailableArtifact-FunctionalArtifact-Reusable
SE In Practice (SEIP)
Mohammad Saiful Islam Toronto Metropolitan University, Toronto, Canada, Mohamed Sami Rakha Toronto Metropolitan University, Toronto, Canada, William Pourmajidi Toronto Metropolitan University, Toronto, Canada, Janakan Sivaloganathan Toronto Metropolitan University, Toronto, Canada, John Steinbacher IBM, Andriy Miranskyy Toronto Metropolitan University (formerly Ryerson University)
Pre-print
11:45
15m
Talk
Crash Report Prioritization for Large-Scale Scheduled Launches
SE In Practice (SEIP)
Nimmi Rashinika Weeraddana University of Waterloo, Sarra Habchi Ubisoft Montréal, Shane McIntosh University of Waterloo
12:00
15m
Talk
LogLM: From Task-based to Instruction-based Automated Log Analysis
SE In Practice (SEIP)
Yilun Liu Huawei Co. Ltd, Yuhe Ji Huawei Co. Ltd, Shimin Tao University of Science and Technology of China; Huawei Co. Ltd, Minggui He Huawei Co. Ltd, Weibin Meng Huawei Co. Ltd, Shenglin Zhang Nankai University, Yongqian Sun Nankai University, Yuming Xie Huawei Co. Ltd, Boxing Chen Huawei Canada, Hao Yang Huawei Co. Ltd
Pre-print
12:15
7m
Talk
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’tSecurity
New Ideas and Emerging Results (NIER)
Maria Camporese University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
Pre-print
11:00 - 12:30
AI for Testing and QA 5 - SE In Practice (SEIP) at 214
Chair(s): Chunyang Chen TU Munich
11:00
15m
Talk
ASTER: Natural and Multi-language Unit Test Generation with LLMsAward Winner
SE In Practice (SEIP)
Rangeet Pan IBM Research, Myeongsoo Kim Georgia Institute of Technology, Rahul Krishna IBM Research, Raju Pavuluri IBM T.J. Watson Research Center, Saurabh Sinha IBM Research
Pre-print
11:15
15m
Talk
Automated Code Review In Practice
SE In Practice (SEIP)
Umut Cihan Bilkent University, Vahid Haratian Bilkent University, Arda İçöz Bilkent University, Mert Kaan Gül Beko, Ömercan Devran Beko, Emircan Furkan Bayendur Beko, Baykal Mehmet Ucar Beko, Eray Tüzün Bilkent University
Pre-print
11:30
15m
Talk
CI at Scale: Lean, Green, and Fast
SE In Practice (SEIP)
Dhruva Juloori Uber Technologies, Inc, Zhongpeng Lin Uber Technologies Inc., Matthew Williams Uber Technologies, Inc, Eddy Shin Uber Technologies, Inc, Sonal Mahajan Uber Technologies Inc.
11:45
15m
Talk
Moving Faster and Reducing Risk: Using LLMs in Release DeploymentAward Winner
SE In Practice (SEIP)
Rui Abreu Meta, Vijayaraghavan Murali Meta Platforms Inc., Peter C Rigby Meta / Concordia University, Chandra Sekhar Maddila Meta Platforms, Inc., Weiyan Sun Meta Platforms, Inc., Jun Ge Meta Platforms, Inc., Kaavya Chinniah Meta Platforms, Inc., Audris Mockus University of Tennessee, Megh Mehta Meta Platforms, Inc., Nachiappan Nagappan Meta Platforms, Inc.
12:00
15m
Talk
Prioritizing Large-scale Natural Language Test Cases at OPPO
SE In Practice (SEIP)
Haoran Xu, Chen Zhi Zhejiang University, Tianyu Xiang Guangdong Oppo Mobile Telecommunications Corp., Ltd., Zixuan Wu Zhejiang University, Gaorong Zhang Zhejiang University, Xinkui Zhao Zhejiang University, Jianwei Yin Zhejiang University, Shuiguang Deng Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies
12:15
15m
Talk
Search+LLM-based Testing for ARM SimulatorsArtifact-AvailableArtifact-FunctionalArtifact-Reusable
SE In Practice (SEIP)
Bobby Bruce University of California at Davis, USA, Aidan Dakhama King's College London, Karine Even-Mendoza King’s College London, William B. Langdon University College London, Hector Menendez King’s College London, Justyna Petke University College London
11:00 - 12:30
SE for AI with Quality 1 - Research Track at 215
Chair(s): Chris Poskitt Singapore Management University
11:00
15m
Talk
A Tale of Two DL Cities: When Library Tests Meet CompilerSE for AI
Research Track
Qingchao Shen Tianjin University, Yongqiang Tian, Haoyang Ma Hong Kong University of Science and Technology, Junjie Chen Tianjin University, Lili Huang College of Intelligence and Computing, Tianjin University, Ruifeng Fu Tianjin University, Shing-Chi Cheung Hong Kong University of Science and Technology, Zan Wang Tianjin University
11:15
15m
Talk
Iterative Generation of Adversarial Example for Deep Code ModelsSE for AIAward Winner
Research Track
Li Huang, Weifeng Sun, Meng Yan Chongqing University
11:30
15m
Talk
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning ImplementationsSE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Rajdeep Singh Hundal National University of Singapore, Yan Xiao Sun Yat-sen University, Xiaochun Cao Sun Yat-Sen University, Jin Song Dong National University of Singapore, Manuel Rigger National University of Singapore
Pre-print Media Attached File Attached
11:45
15m
Talk
µPRL: a Mutation Testing Pipeline for Deep Reinforcement Learning based on Real FaultsSE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Deepak-George Thomas Tulane University, Matteo Biagiola Università della Svizzera italiana, Nargiz Humbatova Università della Svizzera italiana, Mohammad Wardat Oakland University, USA, Gunel Jahangirova King's College London, Hridesh Rajan Tulane University, Paolo Tonella USI Lugano
Pre-print
12:00
15m
Talk
Testing and Understanding Deviation Behaviors in FHE-hardened Machine Learning ModelsSE for AI
Research Track
Yiteng Peng Hong Kong University of Science and Technology, Daoyuan Wu Hong Kong University of Science and Technology, Zhibo Liu Hong Kong University of Science and Technology, Dongwei Xiao Hong Kong University of Science and Technology, Zhenlan Ji The Hong Kong University of Science and Technology, Juergen Rahmel HSBC, Shuai Wang Hong Kong University of Science and Technology
12:15
15m
Talk
TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron ProvenanceSE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Waris Gill Virginia Tech, Ali Anwar University of Minnesota, Muhammad Ali Gulzar Virginia Tech
Pre-print
11:00 - 12:30
11:00
15m
Talk
A First Look at Conventional Commits Classification
Research Track
Qunhong Zeng Beijing Institute of Technology, Yuxia Zhang Beijing Institute of Technology, Zhiqing Qiu Beijing Institute of Technology, Hui Liu Beijing Institute of Technology
11:15
15m
Talk
ChatGPT-Based Test Generation for Refactoring Engines Enhanced by Feature Analysis on Examples
Research Track
Chunhao Dong Beijing Institute of Technology, Yanjie Jiang Peking University, Yuxia Zhang Beijing Institute of Technology, Yang Zhang Hebei University of Science and Technology, Hui Liu Beijing Institute of Technology
11:30
15m
Talk
SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing
Research Track
Wenchao Gu The Chinese University of Hong Kong, Ensheng Shi Xi’an Jiaotong University, Yanlin Wang Sun Yat-sen University, Lun Du Microsoft Research, Shi Han Microsoft Research, Hongyu Zhang Chongqing University, Dongmei Zhang Microsoft Research, Michael Lyu The Chinese University of Hong Kong
11:45
15m
Talk
UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation
New Ideas and Emerging Results (NIER)
Liangying Shao School of Informatics, Xiamen University, China, Yanfu Yan William & Mary, Denys Poshyvanyk William & Mary, Jinsong Su School of Informatics, Xiamen University, China
12:00
15m
Talk
How is Google using AI for internal code migrations?
SE In Practice (SEIP)
Stoyan Nikolov Google, Inc., Daniele Codecasa Google, Inc., Anna Sjovall Google, Inc., Maxim Tabachnyk Google, Siddharth Taneja Google, Inc., Celal Ziftci Google, Satish Chandra Google, Inc
12:15
7m
Talk
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
Journal-first Papers
Sarah Fakhoury Microsoft Research, Aaditya Naik University of Pennsylvania, Georgios Sakkas University of California at San Diego, Saikat Chakraborty Microsoft Research, Shuvendu K. Lahiri Microsoft Research
Link to publication
12:22
7m
Talk
The impact of Concept drift and Data leakage on Log Level Prediction Models
Journal-first Papers
Youssef Esseddiq Ouatiti Queen's University, Mohammed Sayagh ETS Montreal, University of Quebec, Noureddine Kerzazi Ensias-Rabat, Bram Adams Queen's University, Ahmed E. Hassan Queen’s University
13:00 - 13:30
13:00
30m
Talk
Strategies to Embed Human Values in Mobile Apps: What do End-Users and Practitioners Think?
SE in Society (SEIS)
Rifat Ara Shams CSIRO's Data61, Mojtaba Shahin RMIT University, Gillian Oliver Monash University, Jon Whittle CSIRO's Data61 and Monash University, Waqar Hussain Data61, CSIRO, Harsha Perera CSIRO's Data61, Arif Nurwidyantoro Universitas Gadjah Mada
13:00
30m
Talk
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Neelam Tjikhoeri Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
13:00
30m
Poster
HyperCRX 2.0: A Comprehensive and Automated Tool for Empowering GitHub Insights
Demonstrations
Yantong Wang East China Normal University, Shengyu Zhao Tongji University, Will Wang, Fenglin Bi East China Normal University
13:00
30m
Poster
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models
Research Track
Kunpeng Zhang The Hong Kong University of Science and Technology, Shuai Wang Hong Kong University of Science and Technology, Jitao Han Central University of Finance and Economics, Xiaogang Zhu The University of Adelaide, Xian Li Swinburne University of Technology, Shaohua Wang Central University of Finance and Economics, Sheng Wen Swinburne University of Technology
13:00
30m
Talk
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’tSecurity
New Ideas and Emerging Results (NIER)
Maria Camporese University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
Pre-print
13:00
30m
Talk
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
Carolin Brandt TU Delft, Ali Khatami Delft University of Technology, Mairieli Wessel Radboud University, Andy Zaidman TU Delft
Pre-print
13:00
30m
Talk
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
Costanza Alfieri Università degli Studi dell'Aquila, Juri Di Rocco University of L'Aquila, Paola Inverardi Gran Sasso Science Institute, Phuong T. Nguyen University of L’Aquila
14:00 - 15:30
14:00
15m
Talk
Closing the Gap between Sensor Inputs and Driving Properties: A Scene Graph Generator for CARLAArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Demonstrations
Trey Woodlief University of Virginia, Felipe Toledo, Sebastian Elbaum University of Virginia, Matthew B Dwyer University of Virginia
14:15
15m
Talk
LEGOS-SLEEC: Tool for Formalizing and Analyzing Normative RequirementsArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Demonstrations
Kevin Kolyakov University of Toronto, Lina Marsso École Polytechnique de Montréal, Nick Feng University of Toronto, Junwei Quan University of Toronto, Marsha Chechik University of Toronto
14:30
15m
Talk
MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems
Journal-first Papers
Jon Ayerdi Mondragon University, Asier Iriarte Mondragon University, Pablo Valle Mondragon University, Ibai Roman Mondragon University, Miren Illarramendi Mondragon University, Aitor Arrieta Mondragon University
14:45
15m
Talk
Automatically Generating Content for Testing Autonomous Vehicles from User Descriptions
New Ideas and Emerging Results (NIER)
Benedikt Steininger IMC FH Krems, Chrysanthi Papamichail BeamNG GmbH, David Stark BeamNG GmbH, Dejan Nickovic Austrian Institute of Technology, Alessio Gambi Austrian Institute of Technology (AIT)
15:00
15m
Talk
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
SE In Practice (SEIP)
Tao Duan Xi'an Jiaotong University, Runqing Chen Alibaba, Pinghui Wang Xi'an Jiaotong University, Junzhou Zhao Xi'an Jiaotong University, Jiongzhou Liu Alibaba, Shujie Han Northwestern Polytechnical University, Yi Liu Alibaba, Fan Xu Alibaba
15:15
15m
Talk
On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet?Artifact-Available
SE In Practice (SEIP)
Matteo Esposito University of Oulu, Francesco Palagiano Multitel di Lerede Alessandro & C. s.a.s., Valentina Lenarduzzi University of Oulu, Davide Taibi University of Oulu
DOI Pre-print
14:00 - 15:30
Program Comprehension 4 - Research Track at 204
Chair(s): Simone Scalabrino University of Molise
14:00
15m
Talk
Decoding the Issue Resolution Process In Practice via Issue Report Analysis: A Case Study of FirefoxArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Antu Saha William & Mary, Oscar Chaparro William & Mary
Pre-print
14:15
15m
Talk
Preserving Privacy in Software Composition Analysis: A Study of Technical Solutions and Enhancements
Research Track
Huaijin Wang Ohio State University, Zhibo Liu Hong Kong University of Science and Technology, Yanbo Dai The Hong Kong University of Science and Technology (Guangzhou), Shuai Wang Hong Kong University of Science and Technology, Qiyi Tang Tencent Security Keen Lab, Sen Nie Tencent Security Keen Lab, Shi Wu Tencent Security Keen Lab
14:30
15m
Talk
UML is Back. Or is it? Investigating the Past, Present, and Future of UML in Open Source Software
Research Track
Joseph Romeo Software Institute - USI, Lugano, Switzerland, Marco Raglianti Software Institute - USI, Lugano, Csaba Nagy, Michele Lanza Software Institute - USI, Lugano
Pre-print
14:45
15m
Talk
Understanding the Response to Open-Source Dependency Abandonment in the npm EcosystemAward Winner
Research Track
Courtney Miller Carnegie Mellon University, Mahmoud Jahanshahi University of Tennessee, Audris Mockus University of Tennessee, Bogdan Vasilescu Raj Reddy Associate Professor of Software and Societal Systems, Carnegie Mellon University, USA, Christian Kästner Carnegie Mellon University
15:00
15m
Talk
Understanding Compiler Bugs in Real Development
Research Track
Hao Zhong Shanghai Jiao Tong University
15:15
15m
Talk
Studying Programmers Without Programming: Investigating Expertise Using Resting State fMRI
Research Track
Zachary Karas Vanderbilt University, Benjamin Gold Vanderbilt University, Violet Zhou University of Michigan, Noah Reardon University of Michigan, Thad Polk University of Michigan, Catie Chang Vanderbilt University, Yu Huang Vanderbilt University
14:00 - 15:30
Testing and QA 5 - Research Track / Journal-first Papers / New Ideas and Emerging Results (NIER) / Demonstrations at 205
Chair(s): Giovanni Denaro University of Milano - Bicocca
14:00
15m
Talk
Leveraging Propagated Infection to Crossfire MutantsArtifact-FunctionalArtifact-Available
Research Track
Hang Du University of California at Irvine, Vijay Krishna Palepu Microsoft, James Jones University of California at Irvine
File Attached
14:15
15m
Talk
IFSE: Taming Closed-box Functions in Symbolic Execution via Fuzz Solving
Demonstrations
Qichang Wang East China Normal University, Chuyang Chen The Ohio State University, Ruiyang Xu East China Normal University, Haiying Sun East China Normal University, Chengcheng Wan East China Normal University, Ting Su East China Normal University, Yueling Zhang East China Normal University, Geguang Pu East China Normal University, China
14:30
15m
Talk
Takuan: Using Dynamic Invariants To Debug Order-Dependent Flaky Tests
New Ideas and Emerging Results (NIER)
Nate Levin Yorktown High School, Chengpeng Li University of Texas at Austin, Yule Zhang George Mason University, August Shi The University of Texas at Austin, Wing Lam George Mason University
14:45
15m
Talk
Vision Transformer Inspired Automated Vulnerability RepairSecurity
Journal-first Papers
Michael Fu The University of Melbourne, Van Nguyen Monash University, Kla Tantithamthavorn Monash University, Dinh Phung Monash University, Australia, Trung Le Monash University, Australia
15:00
15m
Talk
ZigZagFuzz: Interleaved Fuzzing of Program Options and Files
Journal-first Papers
Ahcheong Lee KAIST, Youngseok Choi KAIST, Shin Hong Chungbuk National University, Yunho Kim Hanyang University, Kyutae Cho LIG Nex1 AI R&D, Moonzoo Kim KAIST / VPlusLab Inc.
15:15
15m
Talk
Reducing the Length of Field-replay Based Load Testing
Journal-first Papers
Yuanjie Xia University of Waterloo, Lizhi Liao Memorial University of Newfoundland, Jinfu Chen Wuhan University, Heng Li Polytechnique Montréal, Weiyi Shang University of Waterloo
14:00 - 15:30
Human and Social 4 - Journal-first Papers / SE in Society (SEIS) / SE In Practice (SEIP) / Research Track at 206 plus 208
Chair(s): Liliana Pasquale University College Dublin & Lero
14:00
15m
Talk
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
SE In Practice (SEIP)
Nadia Nahar Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Jenna L. Butler Microsoft Research, Chris Parnin Microsoft, Thomas Zimmermann University of California, Irvine, Christian Bird Microsoft Research
14:15
15m
Talk
Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration
Journal-first Papers
Matteo Paltenghi University of Stuttgart, Rahul Pandita GitHub, Inc., Austin Henley Carnegie Mellon University, Albert Ziegler XBow
14:30
15m
Talk
Do Developers Adopt Green Architectural Tactics for ML-Enabled Systems? A Mining Software Repository StudyArtifact-ReusableArtifact-AvailableArtifact-Functional
SE in Society (SEIS)
Vincenzo De Martino University of Salerno, Silverio Martínez-Fernández UPC-BarcelonaTech, Fabio Palomba University of Salerno
Pre-print
14:45
15m
Talk
Accessibility Issues in Ad-Driven Web ApplicationsArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Abdul Haddi Amjad Virginia Tech, Muhammad Danish Virginia Tech, Bless Jah Virginia Tech, Muhammad Ali Gulzar Virginia Tech
15:00
15m
Talk
A Bot-based Approach to Manage Codes of Conduct in Open-Source Projects
SE in Society (SEIS)
Sergio Cobos IN3 - UOC, Javier Luis Cánovas Izquierdo Universitat Oberta de Catalunya
Pre-print
15:15
7m
Talk
Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding WeaknessesSecurity
Journal-first Papers
Wachiraphan (Ping) Charoenwet University of Melbourne, Patanamon Thongtanunam University of Melbourne, Thuan Pham University of Melbourne, Christoph Treude Singapore Management University
14:00 - 15:30
User Experience - Journal-first Papers / Research Track / SE In Practice (SEIP) / SE in Society (SEIS) at 207
Chair(s): Ramiro Liscano Ontario Tech University
14:00
15m
Talk
A Tale of Two Comprehensions? Analyzing Student Programmer Attention During Code Summarization
Journal-first Papers
Zachary Karas Vanderbilt University, Aakash Bansal University of Notre Dame, Yifan Zhang Vanderbilt University, Toby Jia-Jun Li University of Notre Dame, Collin McMillan University of Notre Dame, Yu Huang Vanderbilt University
14:15
15m
Talk
Asking and Answering Questions During Memory Profiling
Journal-first Papers
Alison Fernandez Blanco University of Chile, Araceli Queirolo Cordova ISCLab, Department of Computer Science (DCC), University of Chile, Alexandre Bergel University of Chile, Juan Pablo Sandoval Alcocer Pontificia Universidad Católica de Chile
14:30
15m
Talk
Unveiling the Energy Vampires: A Methodology for Debugging Software Energy ConsumptionArtifact-FunctionalArtifact-AvailableArtifact-ReusableAward Winner
Research Track
Enrique Barba Roque TU Delft, Luís Cruz TU Delft, Thomas Durieux TU Delft
Pre-print
14:45
15m
Talk
Designing a Tool for Evacuation Plan Validation: Multi-Agent Simulations with Persona-Based UI
SE in Society (SEIS)
Gennaro Zanfardino University of L'Aquila, Antinisca Di Marco University of L'Aquila, Michele Tucci University of L'Aquila
15:00
15m
Talk
Testing False Recalls in E-commerce Apps: a User-perspective Blackbox Approach
SE In Practice (SEIP)
Shengnan Wu School of Computer Science, Fudan University, Yongxiang Hu Fudan University, Jiazhen Gu Fudan University, China, Penglei Mao School of Computer Science, Fudan University, Jin Meng Meituan Inc., Liujie Fan Meituan Inc., Zhongshi Luan Meituan Inc., Xin Wang Fudan University, Yangfan Zhou Fudan University
15:15
7m
Talk
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools.Security
Journal-first Papers
Aurora Papotti Vrije Universiteit Amsterdam, Ranindya Paramitha University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
15:22
7m
Talk
On Effectiveness and Efficiency of Gamified Exploratory GUI Testing
Journal-first Papers
Riccardo Coppola Politecnico di Torino, Tommaso Fulcini Politecnico di Torino, Luca Ardito Politecnico di Torino, Marco Torchiano Politecnico di Torino, Emil Alégroth Blekinge Institute of Technology
14:00 - 15:30
Security and Analysis 3 - Research Track / SE In Practice (SEIP) at 210
Chair(s): Adriana Sejfia University of Edinburgh
14:00
15m
Talk
Automated, Unsupervised, and Auto-parameterized Inference of Data Patterns and Anomaly DetectionSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Qiaolin Qin Polytechnique Montréal, Heng Li Polytechnique Montréal, Ettore Merlo Polytechnique Montreal, Maxime Lamothe Polytechnique Montreal
Pre-print
14:15
15m
Talk
On Prescription or Off Prescription? An Empirical Study of Community-prescribed Security Configurations for KubernetesSecurityArtifact-Available
Research Track
Shazibul Islam Shamim Auburn University, Hanyang Hu Company A, Akond Rahman Auburn University
Pre-print File Attached
14:30
15m
Talk
Similar but Patched Code Considered Harmful -- The Impact of Similar but Patched Code on Recurring Vulnerability Detection and How to Remove ThemSecurity
Research Track
Zixuan Tan Zhejiang University, Jiayuan Zhou Huawei, Xing Hu Zhejiang University, Shengyi Pan Zhejiang University, Kui Liu Huawei, Xin Xia Huawei
Pre-print
14:45
15m
Talk
TIVER: Identifying Adaptive Versions of C/C++ Third-Party Open-Source Components Using a Code Clustering TechniqueSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Youngjae Choi Korea University, Seunghoon Woo Korea University
15:00
15m
Talk
A scalable, effective and simple Vulnerability Tracking approach for heterogeneous SAST setups based on Scope+OffsetSecurity
SE In Practice (SEIP)
James Johnson, Julian Thome GitLab Inc., Lucas Charles GitLab Inc., Hua Yan GitLab Inc., Jason Leasure GitLab Inc.
Pre-print
15:15
15m
Talk
“ImmediateShortTerm3MthsAfterThatLOL”: Developer Secure-Coding Sentiment, Practice and Culture in OrganisationsArtifact-AvailableArtifact-FunctionalArtifact-ReusableSecurity
SE In Practice (SEIP)
Ita Ryan University College Cork, Utz Roedig University College Cork, Klaas-Jan Stol Lero; University College Cork; SINTEF Digital
14:00 - 15:30
Design and Architecture 2 - Journal-first Papers / Research Track at 211
Chair(s): Yuanfang Cai Drexel University, Jan Keim Karlsruhe Institute of Technology (KIT)
14:00
15m
Talk
An Exploratory Study on the Engineering of Security FeaturesSecurityArtifact-FunctionalArtifact-Available
Research Track
Kevin Hermann Ruhr University Bochum, Sven Peldszus Ruhr University Bochum, Jan-Philipp Steghöfer XITASO GmbH IT & Software Solutions, Thorsten Berger Ruhr University Bochum
Pre-print
14:15
15m
Talk
DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models
Research Track
Mingyue Yuan The University of New South Wales, Jieshan Chen CSIRO's Data61, Zhenchang Xing CSIRO's Data61, Aaron Quigley CSIRO's Data61, Yuyu Luo HKUST (GZ), Tianqi Luo HKUST (GZ), Gelareh Mohammadi The University of New South Wales, Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61
14:30
15m
Talk
Fidelity of Cloud Emulators: The Imitation Game of Testing Cloud-based Software
Research Track
Anna Mazhar Cornell University, Saad Sher Alam University of Illinois Urbana-Champaign, William Zheng University of Illinois Urbana-Champaign, Yinfang Chen University of Illinois at Urbana-Champaign, Suman Nath Microsoft Research, Tianyin Xu University of Illinois at Urbana-Champaign
14:45
15m
Talk
Formally Verified Cloud-Scale AuthorizationAward Winner
Research Track
Aleks Chakarov Amazon Web Services, Jaco Geldenhuys Amazon Web Services, Matthew Heck Amazon Web Services, Michael Hicks Amazon, Samuel Huang Amazon Web Services, Georges-Axel Jaloyan Amazon Web Services, Anjali Joshi Amazon, K. Rustan M. Leino Amazon, Mikael Mayer Automated Reasoning Group, Amazon Web Services, Sean McLaughlin Amazon Web Services, Akhilesh Mritunjai Amazon.com, Clement Pit-Claudel EPFL, Sorawee Porncharoenwase Amazon Web Services, Florian Rabe Amazon Web Services, Marianna Rapoport Amazon Web Services, Giles Reger Amazon Web Services, Cody Roux Amazon Web Services, Neha Rungta Amazon Web Services, Robin Salkeld Amazon Web Services, Matthias Schlaipfer Amazon Web Services, Daniel Schoepe Amazon, Johanna Schwartzentruber Amazon Web Services, Serdar Tasiran Amazon, Aaron Tomb Amazon, Emina Torlak Amazon Web Services, USA, Jean-Baptiste Tristan Amazon, Lucas Wagner Amazon Web Services, Michael Whalen Amazon Web Services and the University of Minnesota, Remy Willems Amazon, Tongtong Xiang Amazon Web Services, Taejoon Byun University of Minnesota, Joshua M. Cohen Princeton University, Ruijie Fang University of Texas at Austin, Junyoung Jang McGill University, Jakob Rath TU Wien, Hira Taqdees Syeda, Dominik Wagner University of Oxford, Yongwei Yuan Purdue University
15:00
15m
Talk
The Same Only Different: On Information Modality for Configuration Performance AnalysisArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Hongyuan Liang University of Electronic Science and Technology of China, Yue Huang University of Electronic Science and Technology of China, Tao Chen University of Birmingham
Pre-print
15:15
7m
Talk
Identifying Performance Issues in Cloud Service Systems Based on Relational-Temporal Features
Journal-first Papers
Wenwei Gu The Chinese University of Hong Kong, Jinyang Liu Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Jianping Zhang The Chinese University of Hong Kong, Yuxin Su Sun Yat-sen University, Jiazhen Gu Chinese University of Hong Kong, Cong Feng Huawei Cloud Computing Technology, Zengyin Yang Computing and Networking Innovation Lab, Huawei Cloud Computing Technology Co., Ltd, Yongqiang Yang Huawei Cloud Computing Technology, Michael Lyu The Chinese University of Hong Kong
14:00 - 15:30
AI for Analysis 5 - Research Track / New Ideas and Emerging Results (NIER) at 212
Chair(s): Tien N. Nguyen University of Texas at Dallas
14:00
15m
Talk
3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
Research Track
Sarah Fakhoury Microsoft Research, Markus Kuppe Microsoft Research, Shuvendu K. Lahiri Microsoft Research, Tahina Ramananandro Microsoft Research, Nikhil Swamy Microsoft Research
Pre-print
14:15
15m
Talk
Aligning the Objective of LLM-based Program Repair
Research Track
Junjielong Xu The Chinese University of Hong Kong, Shenzhen, Ying Fu Chongqing University, Shin Hwei Tan Concordia University, Pinjia He Chinese University of Hong Kong, Shenzhen
Pre-print
14:30
15m
Talk
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language ModelsArtifact-Available
Research Track
Aidan Z.H. Yang Carnegie Mellon University, Sophia Kolak Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University, Ruben Martins Carnegie Mellon University, Claire Le Goues Carnegie Mellon University
14:45
15m
Talk
The Fact Selection Problem in LLM-Based Program Repair
Research Track
Nikhil Parasaram Uber Amsterdam, Huijie Yan University College London, Boyu Yang University College London, Zineb Flahy University College London, Abriele Qudsi University College London, Damian Ziaber University College London, Earl T. Barr University College London, Sergey Mechtaev Peking University
15:00
15m
Talk
Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models
Research Track
Zhijie Wang University of Alberta, Zijie Zhou University of Illinois Urbana-Champaign, Da Song University of Alberta, Yuheng Huang University of Alberta, Canada, Shengmai Chen Purdue University, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University
Pre-print
15:15
15m
Talk
Beyond Syntax: How Do LLMs Understand Code?
New Ideas and Emerging Results (NIER)
Marc North Durham University, Amir Atapour-Abarghouei Durham University, Nelly Bencomo Durham University
14:00 - 15:30
AI for Security 2 - Research Track at 213
Chair(s): Gias Uddin York University, Canada
14:00
15m
Talk
Repository-Level Graph Representation Learning for Enhanced Security Patch DetectionSecurity
Research Track
Xin-Cheng Wen Harbin Institute of Technology, Zirui Lin Harbin Institute of Technology, Shenzhen, Cuiyun Gao Harbin Institute of Technology, Hongyu Zhang Chongqing University, Yong Wang Anhui Polytechnic University, Qing Liao Harbin Institute of Technology
14:15
15m
Talk
FAMOS: Fault diagnosis for Microservice Systems through Effective Multi-modal Data FusionSecurity
Research Track
Chiming Duan Peking University, Yong Yang Peking University, Tong Jia Institute for Artificial Intelligence, Peking University, Beijing, China, Guiyang Liu Alibaba, Jinbu Liu Alibaba, Huxing Zhang Alibaba Group, Qi Zhou Alibaba, Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Gang Huang Peking University
14:30
15m
Talk
Leveraging Large Language Models to Detect npm Malicious PackagesSecurity
Research Track
Nusrat Zahan North Carolina State University, Philipp Burckhardt Socket, Inc, Mikola Lysenko Socket, Inc, Feross Aboukhadijeh Socket, Inc, Laurie Williams North Carolina State University
14:45
15m
Talk
Magika: AI-Powered Content-Type DetectionSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
15:00
15m
Talk
Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDESecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Benjamin Steenhoek Microsoft, Siva Sivaraman Microsoft, Renata Saldivar Gonzalez Microsoft, Yevhen Mohylevskyy Microsoft, Roshanak Zilouchian Moghaddam Microsoft, Wei Le Iowa State University
15:15
15m
Talk
Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code NaturalnessSecurity
Research Track
Weisong Sun Nanjing University, Yuchen Chen Nanjing University, Mengzhe Yuan Nanjing University, Chunrong Fang Nanjing University, Zhenpeng Chen Nanyang Technological University, Chong Wang Nanyang Technological University, Yang Liu Nanyang Technological University, Baowen Xu State Key Laboratory for Novel Software Technology, Nanjing University, Zhenyu Chen Nanjing University
Pre-print Media Attached
14:00 - 15:30
AI for Testing and QA 6 - Journal-first Papers / Research Track / New Ideas and Emerging Results (NIER) at 214
Chair(s): Ladan Tahvildari University of Waterloo
14:00
15m
Talk
Treefix: Enabling Execution with a Tree of PrefixesArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Beatriz Souza Universität Stuttgart, Michael Pradel University of Stuttgart
Pre-print
14:15
15m
Talk
Assessing Evaluation Metrics for Neural Test Oracle Generation
Journal-first Papers
Jiho Shin York University, Hadi Hemmati York University, Moshi Wei York University, Song Wang York University
14:30
15m
Talk
Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
Journal-first Papers
Saurabhsingh Rajput Dalhousie University, Tim Widmayer University College London (UCL), Ziyuan Shang Nanyang Technological University, Maria Kechagia National and Kapodistrian University of Athens, Federica Sarro University College London, Tushar Sharma Dalhousie University
14:45
15m
Talk
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Journal-first Papers
Hao Li Queen's University, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Cor-Paul Bezemer University of Alberta
Link to publication DOI Pre-print
15:00
15m
Talk
Evaluating the Generalizability of LLMs in Automated Program Repair
New Ideas and Emerging Results (NIER)
Fengjie Li Tianjin University, Jiajun Jiang Tianjin University, Jiajun Sun Tianjin University, Hongyu Zhang Chongqing University
Pre-print
15:15
15m
Talk
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
New Ideas and Emerging Results (NIER)
Alejandro Velasco William & Mary, Daniel Rodriguez-Cardenas William & Mary, David Nader Palacio William & Mary, Lutfar Rahman Alif University of Dhaka, Denys Poshyvanyk William & Mary
Pre-print
14:00 - 15:30
SE for AI with Quality 2Journal-first Papers at 215
Chair(s): Romina Spalazzese Malmö University
14:00
15m
Talk
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning ProjectsSE for AI
Journal-first Papers
Han Wang Monash University, Sijia Yu Jilin University, Chunyang Chen TU Munich, Burak Turhan University of Oulu, Xiaodong Zhu Jilin University
Link to publication DOI Pre-print
14:15
15m
Talk
Boundary State Generation for Testing and Improvement of Autonomous Driving SystemsSE for AI
Journal-first Papers
Matteo Biagiola Università della Svizzera italiana, Paolo Tonella USI Lugano
DOI Pre-print
14:30
15m
Talk
D3: Differential Testing of Distributed Deep Learning with Model GenerationSE for AI
Journal-first Papers
Jiannan Wang Purdue University, Hung Viet Pham York University, Qi Li, Lin Tan Purdue University, Yu Guo Meta Inc., Adnan Aziz Meta Inc., Erik Meijer
14:45
15m
Talk
Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving SystemsSE for AI
Journal-first Papers
Mohammad Hossein Amini University of Ottawa, Shervin Naseri University of Ottawa, Shiva Nejati University of Ottawa
15:00
15m
Talk
Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension StudySE for AI
Journal-first Papers
Luca Giamattei Università di Napoli Federico II, Matteo Biagiola Università della Svizzera italiana, Roberto Pietrantuono Università di Napoli Federico II, Stefano Russo Università di Napoli Federico II, Paolo Tonella USI Lugano
DOI Pre-print
15:15
15m
Talk
Two is Better Than One: Digital Siblings to Improve Autonomous Driving TestingSE for AI
Journal-first Papers
Matteo Biagiola Università della Svizzera italiana, Andrea Stocco Technical University of Munich, fortiss, Vincenzo Riccio University of Udine, Paolo Tonella USI Lugano
DOI Pre-print
16:00 - 17:30
16:00
15m
Talk
Full Line Code Completion: Bringing AI to Desktop
SE In Practice (SEIP)
Anton Semenkin JetBrains, Vitaliy Bibaev JetBrains, Yaroslav Sokolov JetBrains, Kirill Krylov JetBrains, Alexey Kalina JetBrains, Anna Khannanova JetBrains, Danila Savenkov JetBrains, Darya Rovdo JetBrains, Igor Davidenko JetBrains, Kirill Karnaukhov JetBrains, Maxim Vakhrushev JetBrains, Mikhail Kostyukov JetBrains, Mikhail Podvitskii JetBrains, Petr Surkov JetBrains, Yaroslav Golubev JetBrains Research, Nikita Povarov JetBrains, Timofey Bryksin JetBrains Research
Pre-print
16:15
15m
Talk
Automated Accessibility Analysis of Dynamic Content Changes on Mobile Apps
Research Track
Forough Mehralian University of California at Irvine, Ziyao He University of California, Irvine, Sam Malek University of California at Irvine
16:30
15m
Talk
Qualitative Surveys in Software Engineering Research: Definition, Critical Review, and GuidelinesResearch Methods
Journal-first Papers
Jorge Melegati Free University of Bozen-Bolzano, Kieran Conboy University of Galway, Daniel Graziotin University of Hohenheim
Link to publication DOI
16:45
15m
Talk
VulNet: Towards improving vulnerability management in the Maven ecosystemSecurity
Journal-first Papers
Zeyang Ma Concordia University, Shouvick Mondal IIT Gandhinagar, Tse-Hsun (Peter) Chen Concordia University, Haoxiang Zhang Centre for Software Excellence at Huawei Canada, Ahmed E. Hassan Queen’s University
17:00
15m
Talk
Energy-Aware Software Testing
New Ideas and Emerging Results (NIER)
Roberto Verdecchia University of Florence, Emilio Cruciani European University of Rome, Antonia Bertolino Gran Sasso Science Institute, Breno Miranda Centro de Informática at Universidade Federal de Pernambuco
Pre-print
17:15
7m
Talk
SusDevOps: Promoting Sustainability to a First Principle in Software Delivery
New Ideas and Emerging Results (NIER)
Istvan David McMaster University / McMaster Centre for Software Certification (McSCert)
16:00 - 17:30
Testing and QA 6Journal-first Papers / Research Track / Demonstrations at 205
Chair(s): Majid Babaei McGill University
16:00
15m
Talk
Characterizing Timeout Builds in Continuous Integration
Journal-first Papers
Nimmi Weeraddana University of Waterloo, Mahmoud Alfadel University of Calgary, Shane McIntosh University of Waterloo
16:15
15m
Talk
GeMTest: A General Metamorphic Testing FrameworkArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Demonstrations
Simon Speth Technical University of Munich, Alexander Pretschner TU Munich
Pre-print
16:30
15m
Talk
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Maryam Masoudian Sharif University of Technology, Hong Kong University of Science and Technology (HKUST), Heqing Huang City University of Hong Kong, Morteza Amini Sharif University of Technology, Charles Zhang Hong Kong University of Science and Technology
16:45
15m
Talk
History-Driven Fuzzing for Deep Learning Libraries
Journal-first Papers
Nima Shiri Harzevili York University, Mohammad Mahdi Mohajer York University, Moshi Wei York University, Hung Viet Pham York University, Song Wang York University
17:00
15m
Talk
Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?
Journal-first Papers
Danniell Hu University of Michigan, Priscila Santiesteban University of Michigan, Madeline Endres University of Massachusetts Amherst, Westley Weimer University of Michigan
17:15
15m
Talk
Janus: Detecting Rendering Bugs in Web Browsers via Visual Delta Consistency
Research Track
Chijin Zhou Tsinghua University, Quan Zhang Tsinghua University, Bingzhou Qian National University of Defense Technology, Yu Jiang Tsinghua University
16:00 - 17:30
Human and Social for AIResearch Track / SE in Society (SEIS) / SE In Practice (SEIP) at 206 plus 208
Chair(s): Ramiro Liscano Ontario Tech University
16:00
15m
Talk
ChatGPT Inaccuracy Mitigation during Technical Report Understanding: Are We There Yet?
Research Track
Salma Begum Tamanna University of Calgary, Canada, Gias Uddin York University, Canada, Song Wang York University, Lan Xia IBM, Canada, Longyu Zhang IBM, Canada
16:15
15m
Talk
Navigating the Testing of Evolving Deep Learning Systems: An Exploratory Interview Study
Research Track
Hanmo You Tianjin University, Zan Wang Tianjin University, Bin Lin Hangzhou Dianzi University, Junjie Chen Tianjin University
16:30
15m
Talk
An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AIArtifact-Available
SE In Practice (SEIP)
Lekshmi Murali Rani Chalmers University of Technology and University of Gothenburg, Sweden, Faezeh Mohammadi Chalmers University of Technology and University of Gothenburg, Sweden, Robert Feldt Chalmers | University of Gothenburg, Richard Berntsson Svensson Chalmers | University of Gothenburg
Pre-print
16:45
15m
Talk
Curious, Critical Thinker, Empathetic, and Ethically Responsible: Essential Soft Skills for Data Scientists in Software Engineering
SE in Society (SEIS)
Matheus de Morais Leça University of Calgary, Ronnie de Souza Santos University of Calgary
17:00
15m
Talk
Multi-Modal LLM-based Fully-Automated Training Dataset Generation Software Platform for Mathematics Education
SE in Society (SEIS)
Minjoo Kim Sookmyung Women's University, Tae-Hyun Kim Sookmyung Women's University, Jaehyun Chung Korea University, Hyunseok Choi Korea University, Seokhyeon Min Korea University, Joon-Ho Lim Tutorus Labs, Soohyun Park Sookmyung Women's University
17:15
15m
Talk
What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs
SE in Society (SEIS)
Muneera Bano CSIRO's Data61, Hashini Gunatilake Monash University, Rashina Hoda Monash University
16:00 - 17:30
Mobile SoftwareResearch Track at 207
Chair(s): Mattia Fazzini University of Minnesota
16:00
15m
Talk
EP-Detector: Automatic Detection of Error-prone Operation Anomalies in Android ApplicationsSecurity
Research Track
Chenkai Guo Nankai University, China, Qianlu Wang College of Cyber Science, Nankai University, Naipeng Dong The University of Queensland, Australia, Lingling Fan Nankai University, Tianhong Wang College of Computer Science, Nankai University, Weijie Zhang College of Computer Science, Nankai University, EnBao Chen College of Cyber Science, Nankai University, Zheli Liu Nankai University, Lu Yu National University of Defense Technology; Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation
16:15
15m
Talk
Mobile Application Coverage: The 30% Curse and Ways ForwardArtifact-Available
Research Track
Faridah Akinotcho University of British Columbia, Canada, Lili Wei McGill University, Julia Rubin The University of British Columbia
Pre-print
16:30
15m
Talk
The Design Smells Breaking the Boundary between Android Variants and AOSP
Research Track
Wuxia Jin Xi'an Jiaotong University, Jiaowei Shang Xi'an Jiaotong University, Jianguo Zheng Xi'an Jiaotong University, Mengjie Sun Xi’an Jiaotong University, Zhenyu Huang Honor Device Co., Ltd., Ming Fan Xi'an Jiaotong University, Ting Liu Xi'an Jiaotong University
16:45
15m
Talk
Scenario-Driven and Context-Aware Automated Accessibility Testing for Android Apps
Research Track
Yuxin Zhang Tianjin University, Sen Chen Nankai University, Xiaofei Xie Singapore Management University, Zibo Liu College of Intelligence and Computing, Tianjin University, Lingling Fan Nankai University
17:00
15m
Talk
TacDroid: Detection of Illicit Apps through Hybrid Analysis of UI-based Transition Graphs
Research Track
Yanchen Lu Zhejiang University, Hongyu Lin Zhejiang University, Zehua He Zhejiang University, Haitao Xu Zhejiang University, Zhao Li Hangzhou Yugu Technology, Shuai Hao Old Dominion University, Liu Wang Beijing University of Posts and Telecommunications, Haoyu Wang Huazhong University of Science and Technology, Kui Ren Zhejiang University
17:15
15m
Talk
PacDroid: A Pointer-Analysis-Centric Framework for Security Vulnerabilities in Android AppsSecurityArtifact-FunctionalArtifact-AvailableArtifact-ReusableAward Winner Best Artifact
Research Track
Menglong Chen Nanjing University, Tian Tan Nanjing University, Minxue Pan Nanjing University, Yue Li Nanjing University
16:00 - 17:30
Security and QAResearch Track / Journal-first Papers / SE In Practice (SEIP) at 210
Chair(s): Nafiseh Kahani Carleton University
16:00
15m
Talk
ROSA: Finding Backdoors with FuzzingSecurityArtifact-FunctionalArtifact-AvailableArtifact-ReusableAward Winner Best Artifact
Research Track
Dimitri Kokkonis Université Paris-Saclay, CEA, List, Michaël Marcozzi Université Paris-Saclay, CEA, List, Emilien Decoux Université Paris-Saclay, CEA List, Stefano Zacchiroli LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France
Link to publication DOI Pre-print Media Attached File Attached
16:15
15m
Talk
Analyzing the Feasibility of Adopting Google's Nonce-Based CSP Solutions on WebsitesSecurityArtifact-Available
Research Track
Mengxia Ren Colorado School of Mines, Anhao Xiang Colorado School of Mines, Chuan Yue Colorado School of Mines
16:30
15m
Talk
Early Detection of Performance Regressions by Bridging Local Performance Data and Architectural ModelsSecurityAward Winner
Research Track
Lizhi Liao Memorial University of Newfoundland, Simon Eismann University of Würzburg, Heng Li Polytechnique Montréal, Cor-Paul Bezemer University of Alberta, Diego Elias Costa Concordia University, Canada, André van Hoorn University of Hamburg, Germany, Weiyi Shang University of Waterloo
16:45
15m
Talk
Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic DatasetsSecurity
Journal-first Papers
Partha Chakraborty University of Waterloo, Krishna Kanth Arumugam University of Waterloo, Mahmoud Alfadel University of Calgary, Mei Nagappan University of Waterloo, Shane McIntosh University of Waterloo
17:00
15m
Talk
Sunflower: Enhancing Linux Kernel Fuzzing via Exploit-Driven Seed GenerationArtifact-AvailableArtifact-FunctionalArtifact-ReusableSecurity
SE In Practice (SEIP)
Qiang Zhang Hunan University, Yuheng Shen Tsinghua University, Jianzhong Liu Tsinghua University, Yiru Xu Tsinghua University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Wanli Chang College of Computer Science and Electronic Engineering, Hunan University
17:15
15m
Talk
Practical Object-Level Sanitizer With Aggregated Memory Access and Custom AllocatorSecurity
Research Track
Xiaolei Wang National University of Defense Technology, Ruilin Li National University of Defense Technology, Bin Zhang National University of Defense Technology, Chao Feng National University of Defense Technology, Chaojing Tang National University of Defense Technology
16:00 - 17:30
AI for ProcessSE In Practice (SEIP) / Demonstrations / New Ideas and Emerging Results (NIER) at 212
Chair(s): Keheliya Gallaba Centre for Software Excellence, Huawei Canada
16:00
15m
Talk
OptCD: Optimizing Continuous Development
Demonstrations
Talank Baral George Mason University, Emirhan Oğul Middle East Technical University, Shanto Rahman The University of Texas at Austin, August Shi The University of Texas at Austin, Wing Lam George Mason University
16:15
15m
Talk
LLMs as Evaluators: A Novel Approach to Commit Message Quality Assessment
New Ideas and Emerging Results (NIER)
Abhishek Kumar Indian Institute of Technology, Kharagpur, Sandhya Sankar Indian Institute of Technology, Kharagpur, Sonia Haiduc Florida State University, Partha Pratim Das Indian Institute of Technology, Kharagpur, Partha Pratim Chakrabarti Indian Institute of Technology, Kharagpur
16:30
15m
Talk
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings
SE In Practice (SEIP)
Petr Tsvetkov JetBrains Research, Aleksandra Eliseeva JetBrains Research, Danny Dig University of Colorado Boulder, JetBrains Research, Alexander Bezzubov JetBrains, Yaroslav Golubev JetBrains Research, Timofey Bryksin JetBrains Research, Yaroslav Zharov JetBrains Research
Pre-print
16:45
15m
Talk
Enhancing Differential Testing: LLM-Powered Automation in Release Engineering
SE In Practice (SEIP)
Ajay Krishna Vajjala George Mason University, Arun Krishna Vajjala George Mason University, Carmen Badea Microsoft Research, Christian Bird Microsoft Research, Robert DeLine Microsoft Research, Jason Entenmann Microsoft Research, Nicole Forsgren Microsoft Research, Aliaksandr Hramadski Microsoft, Sandeepan Sanyal Microsoft, Oleg Surmachev Microsoft, Thomas Zimmermann University of California, Irvine, Haris Mohammad Microsoft, Jade D'Souza Microsoft, Mikhail Demyanyuk Microsoft
17:00
15m
Talk
How much does AI impact development speed? An enterprise-based randomized controlled trial
SE In Practice (SEIP)
Elise Paradis Google, Inc, Kate Grey Google, Quinn Madison Google, Daye Nam Google, Andrew Macvean Google, Inc., Nan Zhang Google, Ben Ferrari-Church Google, Satish Chandra Google, Inc
17:15
15m
Talk
Using Reinforcement Learning to Sustain the Performance of Version Control Repositories
New Ideas and Emerging Results (NIER)
Shane McIntosh University of Waterloo, Luca Milanesio GerritForge Inc., Antonio Barone GerritForge Inc., Jacek Centkowski GerritForge Inc., Marcin Czech GerritForge Inc., Fabio Ponciroli GerritForge Inc.
Pre-print
16:00 - 17:30
AI for Security 3Research Track / New Ideas and Emerging Results (NIER) at 213
Chair(s): Tien N. Nguyen University of Texas at Dallas
16:00
15m
Talk
GVI: Guided Vulnerability Imagination for Boosting Deep Vulnerability DetectorsSecurity
Research Track
Heng Yong Nanjing University, Zhong Li, Minxue Pan Nanjing University, Tian Zhang Nanjing University, Jianhua Zhao Nanjing University, China, Xuandong Li Nanjing University
16:15
15m
Talk
Decoding Secret Memorization in Code LLMs Through Token-Level CharacterizationSecurity
Research Track
Yuqing Nie Beijing University of Posts and Telecommunications, Chong Wang Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Guoai Xu Harbin Institute of Technology, Shenzhen, Guosheng Xu Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, Haoyu Wang Huazhong University of Science and Technology
16:30
15m
Talk
Are We Learning the Right Features? A Framework for Evaluating DL-Based Software Vulnerability Detection SolutionsSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Satyaki Das University of Southern California, Syeda Tasnim Fabiha University of Southern California, Saad Shafiq University of Southern California, Nenad Medvidović University of Southern California
Pre-print Media Attached File Attached
16:45
15m
Talk
Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention InferenceSecurity
Research Track
Chong Wang Nanyang Technological University, Jianan Liu Fudan University, Xin Peng Fudan University, Yang Liu Nanyang Technological University, Yiling Lou Fudan University
17:00
15m
Talk
Weakly-supervised Log-based Anomaly Detection with Inexact Labels via Multi-instance LearningSecurity
Research Track
Minghua He Peking University, Tong Jia Institute for Artificial Intelligence, Peking University, Beijing, China, Chiming Duan Peking University, Huaqian Cai Peking University, Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Gang Huang Peking University
17:15
7m
Talk
Towards Early Warning and Migration of High-Risk Dormant Open-Source Software DependenciesSecurity
New Ideas and Emerging Results (NIER)
Zijie Huang Shanghai Key Laboratory of Computer Software Testing and Evaluation, Lizhi Cai Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center, Xuan Mao Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China, Kang Yang Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology
16:00 - 17:30
Quantum SEJournal-first Papers / Research Track at 214
Chair(s): Dennis Mancl MSWX Software Experts
16:00
15m
Talk
QuanTest: Entanglement-Guided Testing of Quantum Neural Network SystemsQuantum
Journal-first Papers
Jinjing Shi Central South University, Zimeng Xiao Central South University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Xuelong LI China Telecom
Link to publication
16:15
15m
Talk
Quantum Approximate Optimization Algorithm for Test Case OptimizationQuantum
Journal-first Papers
Xinyi Wang Simula Research Laboratory; University of Oslo, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Tao Yue Beihang University, Paolo Arcaini National Institute of Informatics
16:30
15m
Talk
Testing Multi-Subroutine Quantum Programs: From Unit Testing to Integration TestingQuantum
Journal-first Papers
Peixun Long Institute of High Energy Physics, Chinese Academy of Science, Jianjun Zhao Kyushu University
16:45
15m
Talk
Mitigating Noise in Quantum Software Testing Using Machine LearningQuantum
Journal-first Papers
Asmar Muqeet Simula Research Laboratory and University of Oslo, Tao Yue Beihang University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Paolo Arcaini National Institute of Informatics
17:00
15m
Talk
Test Case Minimization with Quantum AnnealingQuantum
Journal-first Papers
Xinyi Wang Simula Research Laboratory; University of Oslo, Asmar Muqeet Simula Research Laboratory and University of Oslo, Tao Yue Beihang University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Paolo Arcaini National Institute of Informatics
17:15
7m
Talk
When Quantum Meets Classical: Characterizing Hybrid Quantum-Classical Issues Discussed in Developer ForumsQuantumArtifact-Available
Research Track
Jake Zappin William and Mary, Trevor Stalnaker William & Mary, Oscar Chaparro William & Mary, Denys Poshyvanyk William & Mary
16:00 - 17:30
SE for AI with Quality 3Research Track / SE In Practice (SEIP) at 215
Chair(s): Sumon Biswas Case Western Reserve University
16:00
15m
Talk
Improved Detection and Diagnosis of Faults in Deep Neural Networks Using Hierarchical and Explainable ClassificationSE for AIArtifact-Available
Research Track
Sigma Jahan Dalhousie University, Mehil Shah Dalhousie University, Parvez Mahbub Dalhousie University, Masud Rahman Dalhousie University
Pre-print
16:15
15m
Talk
Lightweight Concolic Testing via Path-Condition Synthesis for Deep Learning LibrariesSE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
16:30
15m
Talk
Mock Deep Testing: Toward Separate Development of Data and Models for Deep LearningSE for AI
Research Track
Ruchira Manke Tulane University, USA, Mohammad Wardat Oakland University, USA, Foutse Khomh Polytechnique Montréal, Hridesh Rajan Tulane University
16:45
15m
Talk
RUG: Turbo LLM for Rust Unit Test GenerationSE for AI
Research Track
Xiang Cheng Georgia Institute of Technology, Fan Sang Georgia Institute of Technology, Yizhuo Zhai Georgia Institute of Technology, Xiaokuan Zhang George Mason University, Taesoo Kim Georgia Institute of Technology
Pre-print Media Attached File Attached
17:00
15m
Talk
Test Input Validation for Vision-based DL Systems: An Active Learning ApproachArtifact-AvailableArtifact-FunctionalArtifact-ReusableSE for AI
SE In Practice (SEIP)
Delaram Ghobari University of Ottawa, Mohammad Hossein Amini University of Ottawa, Dai Quoc Tran SmartInsideAI Company Ltd. and Sungkyunkwan University, Seunghee Park SmartInsideAI Company Ltd. and Sungkyunkwan University, Shiva Nejati University of Ottawa, Mehrdad Sabetzadeh University of Ottawa
Pre-print
17:15
15m
Talk
SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases
SE In Practice (SEIP)
Daeha Ryu Innovation Center, Samsung Electronics, Seokjun Ko Samsung Electronics Co., Eunbi Jang Innovation Center, Samsung Electronics, Jinyoung Park Innovation Center, Samsung Electronics, Myunggwan Kim Innovation Center, Samsung Electronics, Changseo Park Innovation Center, Samsung Electronics
16:00 - 17:30
BlockchainResearch Track at Canada Hall 1 and 2
Chair(s): Daniel Amyot University of Ottawa
16:00
15m
Talk
An Empirical Study of Proxy Smart Contracts at the Ethereum Ecosystem ScaleBlockchainArtifact-Available
Research Track
Mengya Zhang The Ohio State University, Preksha Shukla George Mason University, Wuqi Zhang Mega Labs, Zhuo Zhang Purdue University, Pranav Agrawal George Mason University, Zhiqiang Lin The Ohio State University, Xiangyu Zhang Purdue University, Xiaokuan Zhang George Mason University
16:15
15m
Talk
Demystifying and Detecting Cryptographic Defects in Ethereum Smart ContractsBlockchainAward Winner
Research Track
Jiashuo Zhang Peking University, China, Yiming Shen Sun Yat-sen University, Jiachi Chen Sun Yat-sen University, Jianzhong Su Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Ting Chen University of Electronic Science and Technology of China, Jianbo Gao Peking University, Zhong Chen
16:30
15m
Talk
Chord: Towards a Unified Detection of Blockchain Transaction Parallelism BugsBlockchain
Research Track
Yuanhang Zhou Tsinghua University, Zhen Yan Tsinghua University, Yuanliang Chen Tsinghua University, Fuchen Ma Tsinghua University, Ting Chen University of Electronic Science and Technology of China, Yu Jiang Tsinghua University
16:45
15m
Talk
Definition and Detection of Centralization Defects in Smart ContractsBlockchain
Research Track
Zewei Lin Sun Yat-sen University, Jiachi Chen Sun Yat-sen University, Jiajing Wu Sun Yat-sen University, Weizhe Zhang Harbin Institute of Technology, Zibin Zheng Sun Yat-sen University
17:00
15m
Talk
Fork State-Aware Differential Fuzzing for Blockchain Consensus ImplementationsBlockchainArtifact-FunctionalArtifact-Available
Research Track
Won Hoi Kim KAIST, Hocheol Nam KAIST, Muoi Tran ETH Zurich, Amin Jalilov KAIST, Zhenkai Liang National University of Singapore, Sang Kil Cha KAIST, Min Suk Kang KAIST
DOI Pre-print
17:15
15m
Talk
Code Cloning in Solidity Smart Contracts: Prevalence, Evolution, and Impact on DevelopmentBlockchain
Research Track
Ran Mo Central China Normal University, Haopeng Song Central China Normal University, Wei Ding Central China Normal University, Chaochao Wu Central China Normal University

The following papers published in journals have been accepted to be presented in the ICSE 2025 Journal First Track, subject to an author registering to attend the conference.

Hao Li, Gopi Krishnan Rajbahadur, Cor-Paul Bezemer, "Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality"

Abstract: Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework’s functionality using a programming language different from the framework’s default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python, and JavaScript on software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.

 Tags: "Testing and Quality", "AI for SE"  
 
Jaeseong Lee, Simin Chen, Austin Mordahl, Cong Liu, Wei Yang, Shiyi Wei, "Automated Testing Linguistic Capabilities of NLP Models"

Abstract: Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing works use a single, aggregated metric (i.e., accuracy), which makes it difficult for users to assess NLP model performance on fine-grained aspects, such as linguistic capabilities (LCs). To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as inputs and produces a diverse test suite with test oracles for each given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 LCs over the two NLP applications.

 Tags: "Formal methods", "Testing and Quality"  
 
SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, Emad Shihab, "Predicting the First Response Latency of Maintainers and Contributors in Pull Requests"

Abstract: The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also conduct permutation feature importance and SHAP analyses to understand the importance and the impact of different features on the predicted response latencies. We find that our CatBoost models are the most effective for predicting the first response latencies of both maintainers and contributors. Compared to a dummy classifier that always returns the majority class, these models achieved an average improvement of 29% in AUC-ROC and 51% in AUC-PR for maintainers, as well as 39% in AUC-ROC and 89% in AUC-PR for contributors across the studied projects. The results indicate that our models can aptly predict the first response latencies using the selected features. We also observe that PRs submitted earlier in the week, containing an average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses. Moreover, we show the effectiveness of our approach in a cross-project setting. Finally, we discuss key guidelines for maintainers, contributors, and researchers to help facilitate the PR review process.

 Tags: "Prog Comprehension/Reeng/Maint", "AI for SE", "Open Source"  
 
Zachary Karas, Aakash Bansal, Yifan Zhang, Toby Jia-Jun Li, Collin McMillan, Yu Huang, "A Tale of Two Comprehensions? Analyzing Student Programmer Attention During Code Summarization"

Abstract: Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is currently unknown how these two forms of code comprehension compare: Reading and Writing. Also, there is a limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n = 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. This can help guide code comprehension in both computer science education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code.

 Tags: "User experience", "Education"  
 
Miguel Setúbal, Tayana Conte, Marcos Kalinowski, Allysson Allex Araújo, "Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations"

Abstract: The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high-quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved, this complex process of R&S poses a challenge for recent graduates seeking to enter the market. This paper aims to identify a set of anti-patterns and recommendations for early career SE professionals concerning R&S processes. Under an exploratory and qualitative methodological approach, we conducted six online Focus Groups with 18 recruiters with experience in R&S in the software industry. After completing our qualitative analysis, we identified 12 anti-patterns and 31 actionable recommendations regarding the hiring process focused on entry-level SE professionals. The identified anti-patterns encompass behavioral and technical dimensions innate to R&S processes. These findings provide a rich opportunity for reflection in the SE industry and offer valuable guidance for early-career candidates and organizations. From an academic perspective, this work also raises awareness of the intersection of Human Resources and SE, an area with considerable potential to be expanded in the context of cooperative and human aspects of SE.

 Tags: "Human/Social"  
 
Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael Lyu, "Identifying Performance Issues in Cloud Service Systems Based on Relational-Temporal Features"

Abstract: Cloud systems, typically comprised of various components (e.g., microservices), are susceptible to performance issues, which may cause service-level agreement violations and financial losses. Identifying performance issues is thus of paramount importance for cloud vendors. In current practice, crucial metrics, i.e., key performance indicators (KPIs), are monitored periodically to provide insight into the operational status of components. Identifying performance issues is often formulated as an anomaly detection problem, which is tackled by analyzing each metric independently. However, this approach overlooks the complex dependencies existing among cloud components. Some graph neural network-based methods take both temporal and relational information into account; however, the correlation violations in the metrics that serve as indicators of underlying performance issues are difficult for them to identify. Furthermore, a large volume of components in a cloud system results in a vast array of noisy metrics. This complexity renders it impractical for engineers to fully comprehend the correlations, making it challenging to identify performance issues accurately. To address these limitations, we propose Identifying Performance Issues based on Relational-Temporal Features (ISOLATE), a learning-based approach that leverages both the relational and temporal features of metrics to identify performance issues. In particular, it adopts a graph neural network with attention to characterize the relations among metrics and extracts long-term and multi-scale temporal patterns using a GRU and a convolution network, respectively. The learned graph attention weights can be further used to localize the correlation-violated metrics. Moreover, to relieve the impact of noisy data, ISOLATE utilizes a positive-unlabeled learning strategy that tags pseudo labels based on a small portion of confirmed negative examples. Extensive evaluation on both public and industrial datasets shows that ISOLATE outperforms all baseline models with a 0.945 F1-score and a 0.920 Hit rate@3. The ablation study also proves the effectiveness of the relational-temporal features and the PU-learning strategy. Furthermore, we share the success stories of leveraging ISOLATE to identify performance issues in Huawei Cloud, which demonstrates its superiority in practice.

 Tags: "Testing and Quality", "Design/Architecture"  
 
Saurabhsingh Rajput, Tim Widmayer, Ziyuan Shang, Maria Kechagia, Federica Sarro, Tushar Sharma, "Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement"

Abstract: With the increasing usage, scale, and complexity of Deep Learning (DL) models, their rapidly growing energy consumption has become a critical concern. Promoting green development and energy awareness at different granularities is the need of the hour to limit the carbon emissions of DL systems. However, the lack of standard and repeatable tools to accurately measure and optimize energy consumption at fine granularity (e.g., at the API level) hinders progress in this area. This paper introduces FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained DL energy consumption measurement. FECoM enables researchers and developers to profile DL APIs from an energy perspective. FECoM addresses the challenges of fine-grained energy measurement using static instrumentation while considering factors such as computational load and temperature stability. We assess FECoM’s capability for fine-grained energy measurement for one of the most popular open-source DL frameworks, namely TensorFlow. Using FECoM, we also investigate the impact of parameter size and execution time on energy consumption, enriching our understanding of TensorFlow APIs’ energy profiles. Furthermore, we elaborate on the considerations and challenges while designing and implementing a fine-grained energy measurement tool. This work will facilitate further advances in DL energy measurement and the development of energy-aware practices for DL systems.

 Tags: "Testing and Quality", "AI for SE"  
 
Emanuela Guglielmi, Gabriele Bavota, Rocco Oliveto, Simone Scalabrino, "Automatic Identification of Game Stuttering via Gameplay Videos Analysis"

Abstract: Modern video games are extremely complex software systems and, as such, they might suffer from several types of post-release issues. A particularly insidious issue is constituted by drops in the frame rate (i.e., stuttering events), which might have a negative impact on the user experience. Stuttering events are frequently documented in the millions of hours of gameplay videos shared by players on platforms such as Twitch or YouTube. From the developers’ perspective, these videos represent a free source of documented “testing activities”. However, especially for popular games, the quantity and length of these videos make their manual inspection impractical. We introduce HASTE, an approach for the automatic detection of stuttering events in gameplay videos that can be exploited to generate candidate bug reports. HASTE first splits a given video into visually coherent slices, with the goal of filtering out those not representing actual gameplay (e.g., navigating the game settings). Then, it identifies the subset of pixels in the video frames that actually show the game in action, excluding additional on-screen elements such as the logo of the YouTube channel, on-screen chats, etc. In this way, HASTE can exploit state-of-the-art image similarity metrics to identify candidate stuttering events, namely subsequent frames being almost identical in the pixels depicting the game. We evaluate the different steps behind HASTE on a total of 105 videos, showing that it can correctly extract video slices with 76% precision, and can correctly identify the slices related to gameplay with a recall and precision higher than 77%. Overall, HASTE achieves 71% recall and 89% precision for the identification of stuttering events in gameplay videos.

 Tags: "Analysis", "Testing and Quality", "Games"  
 
Jiho Shin, Hadi Hemmati, Moshi Wei, Song Wang, "Assessing Evaluation Metrics for Neural Test Oracle Generation"

Abstract: Recently, deep learning models have shown promising results in test oracle generation. Neural Oracle Generation (NOG) models are commonly evaluated using static (automatic) metrics which are mainly based on textual similarity of the output, e.g., BLEU, ROUGE-L, METEOR, and Accuracy. However, these textual similarity metrics may not reflect the testing effectiveness of the generated oracle within a test suite, which is often measured by dynamic (execution-based) test adequacy metrics such as code coverage and mutation score. In this work, we revisit existing oracle generation studies plus GPT-3.5 to empirically investigate the current standing of their performance in textual similarity and test adequacy metrics. Specifically, we train and run four state-of-the-art test oracle generation models on seven textual similarity and two test adequacy metrics for our analysis. We apply two different correlation analyses between these two different sets of metrics. Surprisingly, we found no significant correlation between the textual similarity metrics and test adequacy metrics. For instance, GPT-3.5 on the jackrabbit-oak project had the highest performance on all seven textual similarity metrics among the studied NOGs. However, it had the lowest test adequacy metrics compared to all the studied NOGs. We further conducted a qualitative analysis to explore the reasons behind our observations. We found that oracles with high textual similarity metrics but low test adequacy metrics tend to have complex or multiple chained method invocations within the oracle's parameters, making them hard for the model to generate completely and affecting the test adequacy metrics. On the other hand, oracles with low textual similarity metrics but high test adequacy metrics tend to call different assertion types or a different method that functions similarly to the ones in the ground truth. Overall, this work complements prior studies on test oracle generation with an extensive performance evaluation on textual similarity and test adequacy metrics and provides guidelines for better assessment of deep learning applications in software test generation in the future.

 Tags: "Testing and Quality", "AI for SE"  
 
Zhe Yu, Joymallya Chakraborty, Tim Menzies, "FairBalance: How to Achieve Equalized Odds With Data Pre-processing"

Abstract: This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that the machine learning software does not perform differently on different sensitive demographic groups, thereby satisfying equalized odds. Different from prior works, which either optimize for an equalized-odds-related metric during the learning process like a black box or manipulate the training data following some intuition, this work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any, damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.

 Tags: "Testing and Quality", "AI for SE"  
 
Jon Ayerdi, Asier Iriarte, Pablo Valle, Ibai Roman, Miren Illarramendi, Aitor Arrieta, "MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems"

Abstract: Autonomous Driving Systems (ADSs) are complex Cyber-Physical Systems (CPSs) that must ensure safety even in uncertain conditions. Modern ADSs often employ Deep Neural Networks (DNNs), which may not produce correct results in every possible driving scenario. Thus, an approach to estimate the confidence of an ADS at runtime is necessary to prevent potentially dangerous situations. In this paper we propose MarMot, an online monitoring approach for ADSs based on Metamorphic Relations (MRs), which are properties of a system that hold among multiple inputs and the corresponding outputs. Using domain-specific MRs, MarMot estimates the uncertainty of the ADS at runtime, allowing the identification of anomalous situations that are likely to cause a faulty behavior of the ADS, such as driving off the road. We perform an empirical assessment of MarMot with five different MRs, using two different subject ADSs, including a small-scale physical ADS and a simulated ADS. Our evaluation encompasses the identification of both external anomalies, e.g., fog, as well as internal anomalies, e.g., faulty DNNs due to mislabeled training data. Our results show that MarMot can identify up to 65% of the external anomalies and 100% of the internal anomalies in the physical ADS, and up to 54% of the external anomalies and 88% of the internal anomalies in the simulated ADS. With these results, MarMot outperforms or is comparable to other state-of-the-art approaches, including SelfOracle, Ensemble, and MC Dropout-based ADS monitors.

 Tags: "Real-Time", "Analysis"  
 
Baharin A. Jodat, Abhishek Chandar, Shiva Nejati, Mehrdad Sabetzadeh, "Test Generation Strategies for Building Failure Models and Explaining Spurious Failures"

Abstract: Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic. Failures resulting from invalid or unrealistic test inputs are spurious. Avoiding spurious failures improves the effectiveness of testing in exercising the main functions of a system, particularly for compute-intensive (CI) systems where a single test execution takes significant time. In this article, we propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures. We examine two alternative strategies for building failure models: (1) machine learning (ML)-guided test generation and (2) surrogate-assisted test generation. ML-guided test generation infers boundary regions that separate passing and failing test inputs and samples test inputs from those regions. Surrogate-assisted test generation relies on surrogate models to predict labels for test inputs instead of exercising all the inputs. We propose a novel surrogate-assisted algorithm that uses multiple surrogate models simultaneously, and dynamically selects the prediction from the most accurate model. We empirically evaluate the accuracy of failure models inferred based on surrogate-assisted and ML-guided test generation algorithms. Using case studies from the domains of cyber-physical systems and networks, we show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%, significantly outperforming ML-guided test generation and two baselines. Further, our approach learns failure-inducing rules that identify genuine spurious failures as validated against domain knowledge.

 Tags: "Testing and Quality", "AI for SE"  
 
Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude, "Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses"

Abstract: Identifying security issues early is encouraged to reduce the latent negative impacts on the software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.

 Tags: "Security", "User experience"  
 
Fang Liu, Zhiyi Fu, Ge Li, Zhi Jin, Hui Liu, Yiyang Hao, Li Zhang, "Non-Autoregressive Line-Level Code Completion"

Abstract: Software developers frequently use code completion tools to accelerate software development by suggesting the following code elements. Researchers usually employ AutoRegressive (AR) decoders to complete code sequences in a left-to-right, token-by-token fashion. To improve the accuracy and efficiency of code completion, we argue that tokens within a code statement have the potential to be predicted concurrently. In this article, we first conduct an empirical study to analyze the dependency among the target tokens in line-level code completion. The results suggest that it is potentially practical to generate all statement tokens in parallel. To this end, we introduce SANAR, a simple and effective syntax-aware non-autoregressive model for line-level code completion. To further improve the quality of the generated code, we propose an adaptive and syntax-aware sampling strategy to boost the model’s performance. The experimental results obtained from two widely used datasets indicate that our model outperforms state-of-the-art code completion approaches of similar model size by a considerable margin, and is faster than these models with up to 9× speed-up. Moreover, the extensive results additionally demonstrate that the enhancements achieved by SANAR become even more pronounced with larger model sizes, highlighting their significance.

 Tags: "IDEs", "AI for SE"  
 
Huizi Hao, Kazi Amit Hasan, Hong Qin, Marcos Macedo, Yuan Tian, Steven Ding, Ahmed E. Hassan, "An Empirical Study on Developers’ Shared Conversations with ChatGPT in GitHub Pull Requests and Issues"

Abstract: ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in various tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers’ shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively. We manually examined the content of the conversations and characterized the dynamics of the sharing behavior, i.e., understanding the rationale behind the sharing, identifying the locations where the conversations were shared, and determining the roles of the developers who shared them. Our main observations are: (1) Developers seek ChatGPT’s assistance across 16 types of software engineering inquiries. In both conversations shared in PRs and issues, the most frequently encountered inquiry categories include code generation, conceptual questions, how-to guides, issue resolution, and code review. (2) Developers frequently engage with ChatGPT via multi-turn conversations where each prompt can fulfill various roles, such as unveiling initial or new tasks, iterative follow-up, and prompt refinement. Multi-turn conversations account for 33.2% of the conversations shared in PRs and 36.9% in issues. (3) In collaborative coding, developers leverage shared conversations with ChatGPT to facilitate their role-specific contributions, whether as authors of PRs or issues, code reviewers, or collaborators on issues. Our work serves as the first step towards understanding the dynamics between developers and ChatGPT in collaborative software development and opens up new directions for future research on the topic.

 Tags: "Human/Social", "Process", "AI for SE"  
 
Sallam Abualhaija, Fatma Basak Aydemir, Fabiano Dalpiaz, Davide Dell'Anna, Alessio Ferrari, Xavier Franch, Davide Fucci, "Replication in Requirements Engineering: the NLP for RE Case"

Abstract: Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks’ inherent hairiness, and, in turn, the heterogeneous reporting structure. To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. In this article: (i) we report on hands-on experiences of replication; (ii) we review the state-of-the-art and extract replication-relevant information; (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction; and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. This study aims to create awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication but can also be used in other contexts, e.g., for educational purposes.

 Tags: "Requirements", "AI for SE"  
 
Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Mei Nagappan, Shane McIntosh, "Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets"

Abstract: The impact of software vulnerabilities on everyday software systems is concerning. Although deep learning-based models have been proposed for vulnerability detection, their reliability remains a significant concern. While prior evaluation of such models reports impressive recall/F1 scores of up to 99%, we find that these models underperform in practical scenarios, particularly when evaluated on entire codebases rather than only the fixing commit. In this paper, we introduce a comprehensive dataset (Real-Vul) designed to accurately represent real-world scenarios for evaluating vulnerability detection models. We evaluate DeepWukong, LineVul, ReVeal, and IVDetect vulnerability detection approaches and observe a surprisingly significant drop in performance, with precision declining by up to 95 percentage points and F1 scores dropping by up to 91 percentage points. A closer inspection reveals a substantial overlap in the embeddings generated by the models for vulnerable and uncertain samples (non-vulnerable or vulnerability not reported yet), which likely explains why we observe such a large increase in the quantity and rate of false positives. Additionally, we observe fluctuations in model performance based on vulnerability characteristics (e.g., vulnerability types and severity). For example, the studied models achieve 26 percentage points better F1 scores when vulnerabilities are related to information leaks or code injection rather than when vulnerabilities are related to path resolution or predictable return values. Our results highlight the substantial performance gap that still needs to be bridged before deep learning-based vulnerability detection is ready for deployment in practical settings. We dive deeper into why models underperform in realistic settings, and our investigation reveals overfitting as a key issue. We address this by introducing an augmentation technique, potentially improving performance by up to 30%. We contribute (a) an approach to creating a dataset that future research can use to improve the practicality of model evaluation; (b) Real-Vul – a comprehensive dataset that adheres to this approach; and (c) empirical evidence that the deep learning-based models struggle to perform in a real-world setting.

 Tags: "Security", "Testing and Quality", "AI for SE"  
 
Lukas Schulte, Benjamin Ledel, Steffen Herbold, "Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP"

Abstract: [Context] The identification of bugs within issues reported to an issue tracking system is crucial for triage. Machine learning models have shown promising results for this task. However, we have only limited knowledge of how such models identify bugs. Explainable AI methods like LIME and SHAP can be used to increase this knowledge. [Objective] We want to understand if explainable AI provides explanations that are reasonable to us as humans and align with our assumptions about the model’s decision-making. We also want to know if the quality of predictions is correlated with the quality of explanations. [Methods] We conduct a study where we rate LIME and SHAP explanations based on their quality of explaining the outcome of an issue type prediction model. For this, we rate the quality of the explanations, i.e., if they align with our expectations and help us understand the underlying machine learning model. [Results] We found that both LIME and SHAP give reasonable explanations and that correct predictions are well explained. Further, we found that SHAP outperforms LIME due to a lower ambiguity and a higher contextuality that can be attributed to the ability of the deep SHAP variant to capture sentence fragments. [Conclusion] We conclude that the model finds explainable signals for both bugs and non-bugs. Also, we recommend that research dealing with the quality of explanations for classification tasks reports and investigates rater agreement, since the rating of explanations is highly subjective.
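
For readers unfamiliar with these explainers, a minimal sketch of generating a LIME explanation for an issue-type classifier is shown below. It is illustrative only, not the study's setup: the tiny training set is invented, and it assumes scikit-learn and the lime package are installed.

    from lime.lime_text import LimeTextExplainer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy issue titles; 1 = bug, 0 = non-bug (hypothetical data).
    titles = [
        "app crashes with NullPointerException on startup",
        "add dark mode to the settings screen",
        "segfault when opening large files",
        "feature request: export report as PDF",
    ]
    labels = [1, 0, 1, 0]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(titles, labels)

    explainer = LimeTextExplainer(class_names=["non-bug", "bug"])
    exp = explainer.explain_instance(
        "crash with stack trace after update",
        clf.predict_proba,  # LIME perturbs the text and queries the model
        num_features=5,
    )
    # Each (token, weight) pair is what human raters would judge for quality.
    print(exp.as_list())

Raters in the study judged whether such token weights align with human expectations; the SHAP side of the comparison works analogously, with an explainer attributing the prediction to input fragments.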

 Tags: "Testing and Quality", "AI for SE"  
 
Asmar Muqeet, Tao Yue, Shaukat Ali, Paolo Arcaini, "Mitigating Noise in Quantum Software Testing Using Machine Learning"

Abstract: Quantum Computing (QC) promises computational speedup over classical computing. However, noise exists in near-term quantum computers. Quantum software testing (for gaining confidence in quantum software’s correctness) is inevitably impacted by noise, i.e., it is impossible to know if a test case failed due to noise or real faults. Existing testing techniques test quantum programs without considering noise, i.e., by executing tests on ideal quantum computer simulators. Consequently, they are not directly applicable to testing quantum software on real quantum computers or noisy simulators. Thus, we propose a noise-aware approach (named QOIN) to alleviate the noise effect on test results of quantum programs. QOIN employs machine learning techniques (e.g., transfer learning) to learn the noise effect of a quantum computer and filter it from a program’s outputs. Such filtered outputs are then used as the input to perform test case assessments (determining the passing or failing of a test case execution against a test oracle). We evaluated QOIN on IBM’s 23 noise models, Google’s two available noise models, and Rigetti’s Quantum Virtual Machine, with six real-world and 800 artificial programs. We also generated faulty versions of these programs to check if a failing test case execution can be determined under noise. Results show that QOIN can reduce the noise effect by more than 80% on most noise models. We used an existing test oracle to evaluate QOIN’s effectiveness in quantum software testing. The results showed that QOIN attained scores of 99%, 75%, and 86% for precision, recall, and F1-score, respectively, for the test oracle across six real-world programs. For artificial programs, QOIN achieved scores of 93%, 79%, and 86% for precision, recall, and F1-score, respectively. This highlights QOIN’s effectiveness in learning noise patterns for noise-aware quantum software testing.
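
QOIN itself learns the noise effect with transfer learning, and the abstract does not spell out the model. As a much simpler illustration of the underlying idea, filtering a device's noise out of an output distribution, the NumPy sketch below estimates a linear read-out noise matrix from calibration data and inverts it. All numbers are invented.

    import numpy as np

    # Column j of M: the distribution observed on the noisy device when the
    # ideal output is basis state j, estimated via calibration circuits.
    M = np.array([
        [0.92, 0.06, 0.03, 0.02],
        [0.04, 0.90, 0.02, 0.03],
        [0.02, 0.02, 0.91, 0.04],
        [0.02, 0.02, 0.04, 0.91],
    ])

    observed = np.array([0.50, 0.08, 0.06, 0.36])  # noisy histogram of the program

    # Recover an estimate of the ideal distribution: solve M @ p = observed,
    # then clip negatives and renormalize so p is a valid distribution.
    p, *_ = np.linalg.lstsq(M, observed, rcond=None)
    p = np.clip(p, 0, None)
    p /= p.sum()
    print(p)  # filtered output, used for test assessment against the oracle

Real device noise is not purely linear, which is precisely why a learned model such as QOIN's can outperform this kind of baseline.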

 Tags: "Testing and Quality", "Quantum"  
 
Markus Borg, Leif Jonsson, Emelie Engström, Béla Bartalos, Attila Szabo, "Adopting Automated Bug Assignment in Practice - A Longitudinal Case Study at Ericsson"

Abstract: [Context] The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR’s first bug assignment without human intervention happened in April 2019. [Objective] Our study evaluates the adoption of TRR within its industrial context at Ericsson, i.e., we provide lessons learned related to the productization of a research prototype within a company. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. [Method] We conduct a preregistered industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug-tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. [Results] TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, high-level modules are more positive while low-level modules experienced some drawbacks. Most importantly, some bug reports directly reach low-level modules without first having passed through fundamental root-cause analysis steps at higher levels. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. [Conclusions] TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis.

 Tags: "Testing and Quality"  
 
Marc Miltenberger, Steven Arzt, "Precisely Extracting Complex Variable Values from Android Apps"

Abstract: Millions of users nowadays rely on their smartphones to process sensitive data through apps from various vendors and sources. Therefore, it is vital to assess these apps for security vulnerabilities and privacy violations. Information such as to which server an app connects through which protocol, and which algorithm it applies for encryption, are usually encoded as variable values and arguments of API calls. However, extracting these values from an app is not trivial. The source code of an app is usually not available, and manual reverse engineering is cumbersome with binary sizes in the tens of megabytes. Current automated tools, however, cannot retrieve values that are computed at runtime through complex transformations. In this article, we present ValDroid, a novel static analysis tool for automatically extracting the set of possible values for a given variable at a given statement in the Dalvik byte code of an Android app. We evaluate ValDroid against existing approaches (JSA, Violist, DroidRA, Harvester, BlueSeal, StringHound, IC3, and COAL) on benchmarks and 794 real-world apps. ValDroid greatly outperforms existing tools. It provides an average F1 score of more than 90%, while only requiring 0.1 s per value on average. For many data types including Network Connections and Dynamic Code Loading, its recall is more than twice the recall of the best existing approaches.

 Tags: "Formal methods"  
 
Riccardo Coppola, Tommaso Fulcini, Luca Ardito, Marco Torchiano, Emil Alégroth, "On Effectiveness and Efficiency of Gamified Exploratory GUI Testing"

Abstract: [Context] Gamification appears to improve enjoyment and quality of execution of software engineering activities, including software testing. Though commonly employed in industry, manual exploratory testing of web application GUIs has proven to be mundane and expensive. Gamification applied to that kind of testing activity has the potential to overcome its limitations, though no empirical research has explored this area yet. [Goal] Collect preliminary insights on how gamification, when performed by novice testers, affects the effectiveness, efficiency, test case realism, and user experience in exploratory testing of web applications. [Method] Common gamification features augment an existing exploratory testing tool: Final Score with Leaderboard, Injected Bugs, Progress Bar, and Exploration Highlights. The original tool and the gamified version are then compared in an experiment involving 144 participants. User experience is elicited using the Technology Acceptance Model (TAM) questionnaire instrument. [Results] Statistical analysis identified several significant differences in metrics that represent the effectiveness and efficiency of tests, showing an improvement in coverage when tests were developed with gamification. Additionally, user experience is improved with gamification. [Conclusions] Gamification of exploratory testing has a tangible effect on how testers create test cases for web applications. While the results are mixed, the effects are mostly beneficial and interesting, and warrant more research in the future. Further research shall be aimed at confirming the presented results in the context of state-of-the-art testing tools and real-world development environments.

 Tags: "Testing and Quality", "User experience"  
 
Danniell Hu, Priscila Santiesteban, Madeline Endres, Westley Weimer, "Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?"

Abstract: Debugging is a vital and time-consuming process in software engineering. Recently, researchers have begun using neuroimaging to understand the cognitive bases of programming tasks by measuring patterns of neural activity. While exciting, prior studies have only examined small sub-steps in isolation, such as comprehending a method without writing any code or writing a method from scratch without reading any already-existing code. We propose a simple multi-stage debugging model in which programmers transition between Task Comprehension, Fault Localization, Code Editing, Compiling, and Output Comprehension activities. We conduct a human study of n=28 participants using a combination of functional near-infrared spectroscopy and standard coding measurements (e.g., time taken, tests passed, etc.). Critically, we find that our proposed debugging stages are both neurally and behaviorally distinct. To the best of our knowledge, this is the first neurally-justified cognitive model of debugging. At the same time, there is significant interest in understanding how programmers from different backgrounds, such as those grappling with challenges in English prose comprehension, are impacted by code features when debugging. We use our cognitive model of debugging to investigate the role of one such feature: identifier construction. Specifically, we investigate how features of identifier construction impact neural activity while debugging by participants with and without reading difficulties. While we find significant differences in cognitive load as a function of morphology and expertise, we do not find significant differences in end-to-end programming outcomes (e.g., time, correctness, etc.). This nuanced result suggests that prior findings on the cognitive importance of identifier naming in isolated sub-steps may not generalize to end-to-end debugging. Finally, in a result relevant to broadening participation in computing, we find no behavioral outcome differences for participants with reading difficulties.

 Tags: "Testing and Quality"  
 
Emanuele Iannone, Giulia Sellitto, Emanuele Iaccarino, Filomena Ferrucci, Andrea De Lucia, Fabio Palomba, "Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be?"

Abstract: With the rate of discovered and disclosed vulnerabilities escalating, researchers have been experimenting with machine learning to predict whether a vulnerability will be exploited. Existing solutions leverage information unavailable when a CVE is created, making them unsuitable just after the disclosure. This paper experiments with early exploitability prediction models driven exclusively by the initial CVE record, i.e., the original description and the linked online discussions. Leveraging NVD and Exploit Database, we evaluate 72 prediction models trained using six traditional machine learning classifiers, four feature representation schemas, and three data balancing algorithms. We also experiment with five pre-trained large language models (LLMs). The models leverage seven different corpora made by combining three data sources, i.e., CVE description, Security Focus, and BugTraq. The models are evaluated in a realistic, time-aware fashion by removing the training and test instances that cannot be labeled “neutral” with sufficient confidence. The validation reveals that CVE descriptions and Security Focus discussions are the best data to train on. Pre-trained LLMs do not show the expected performance, requiring further pre-training in the security domain. We distill new research directions, identify possible room for improvement, and envision automated systems assisting security experts in assessing the exploitability.

 Tags: "Testing and Quality", "Security"  
 
Taijara Santana, Paulo Silveira Neto, Eduardo Almeida, Iftekhar Ahmed, "Bug Analysis in Jupyter Notebook Projects: An Empirical Study"

Abstract: Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, few studies have investigated Jupyter development challenges from the practitioners’ point of view. This article presents a systematic study of bugs and challenges that Jupyter practitioners face through a large-scale empirical investigation. We mined 14,740 commits from 105 GitHub open source projects with Jupyter Notebook code. Next, we analyzed 30,416 Stack Overflow posts, which gave us insights into bugs that practitioners face when developing Jupyter Notebook projects. We then conducted 19 interviews with data scientists to uncover more details about Jupyter bugs and to gain insight into Jupyter developers’ challenges. Finally, to validate the study results and proposed taxonomy, we conducted a survey with 91 data scientists. We highlight bug categories, their root causes, and the challenges that Jupyter practitioners face.

 Tags: "Testing and Quality", "AI for SE", "User experience"  
 
Peixun Long, Jianjun Zhao, "Testing Multi-Subroutine Quantum Programs: From Unit Testing to Integration Testing"

Abstract: Quantum computing has emerged as a promising field with the potential to revolutionize various domains by harnessing the principles of quantum mechanics. As quantum hardware and algorithms continue to advance, developing high-quality quantum software has become crucial. However, testing quantum programs poses unique challenges due to the distinctive characteristics of quantum systems and the complexity of multi-subroutine programs. This article addresses the specific testing requirements of multi-subroutine quantum programs. We begin by investigating critical properties by surveying existing quantum libraries and providing insights into the challenges of testing these programs. Building upon this understanding, we focus on testing criteria and techniques based on the whole testing process perspective, spanning from unit testing to integration testing. We delve into various aspects, including IO analysis, quantum relation checking, structural testing, behavior testing, integration of subroutine pairs, and test case generation. We also introduce novel testing principles and criteria to guide the testing process. We conduct comprehensive testing on typical quantum subroutines, including diverse mutants and randomized inputs, to evaluate our proposed approach. The analysis of failures provides valuable insights into the effectiveness of our testing methodology. Additionally, we present case studies on representative multi-subroutine quantum programs, demonstrating the practical application and effectiveness of our proposed testing principles and criteria.

 Tags: "Testing and Quality", "Quantum"  
 
Da Song, Xuan Xie, Jiayang Song, Derui Zhu, Yuheng Huang, Felix Juefei-Xu, Lei Ma, "LUNA: A Model-Based Universal Analysis Framework for Large Language Models"

Abstract: Over the past decade, Artificial Intelligence (AI) has seen great success and is being used in a wide range of academic and industrial fields. More recently, Large Language Models (LLMs) have made rapid advancements that have propelled AI to a new level, enabling and empowering even more diverse applications and industrial domains with intelligence, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns exhibited in LLMs, e.g., robustness and hallucination, have recently received much attention; without properly addressing them, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large neural network scale, and autoregressive generation usage contexts, differ from classic AI software based on Convolutional Neural Networks and Recurrent Neural Networks and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking, despite urgent industrial demand across diverse domains. Towards bridging this gap, we initiate an early exploratory study and propose a universal analysis framework for LLMs, named LUNA, which is designed to be general and extensible and enables versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage data from the desired trustworthiness perspective to construct an abstract model as an auxiliary analysis asset and proxy, empowered by the various abstract model construction methods built into LUNA. To assess the quality of the abstract model, we collect and define a number of evaluation metrics, aimed at both the abstract model level and the semantics level. Then, the semantics, i.e., the degree to which the LLM satisfies the trustworthiness perspective, is bound to the abstract model, enriching it and enabling more detailed analysis applications for diverse purposes, e.g., abnormal behavior detection. To better understand the potential usefulness of our analysis framework LUNA, we conduct a large-scale evaluation, whose results demonstrate that 1) the abstract model has the potential to distinguish normal and abnormal behavior in LLMs, 2) LUNA is effective for the real-world analysis of LLMs in practice, with hyperparameter settings influencing performance, and 3) different evaluation metrics correlate differently with the analysis performance. To encourage further studies on the quality assurance of LLMs, we made all of the code and more detailed experimental data available on the supplementary website of this paper: https://sites.google.com/view/llm-luna.
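
The abstract-model construction methods are built into LUNA and not detailed in the abstract. As a rough, hypothetical sketch of one plausible method, the code below clusters synthetic hidden-state vectors into abstract states with KMeans and estimates a discrete-time Markov chain over them; all shapes and data are invented.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Stand-in for per-token hidden states of an LLM over many generations:
    # a list of trajectories, each an array of shape (steps, hidden_dim).
    trajectories = [rng.normal(size=(20, 32)) for _ in range(50)]

    k = 8  # number of abstract states
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(np.vstack(trajectories))

    # Count transitions between abstract states along each trajectory.
    counts = np.zeros((k, k))
    for traj in trajectories:
        states = km.predict(traj)
        for s, t in zip(states[:-1], states[1:]):
            counts[s, t] += 1

    # Row-normalize into a Markov chain; unvisited rows stay uniform.
    row_sums = counts.sum(axis=1, keepdims=True)
    dtmc = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / k)
    print(dtmc.round(2))

Semantics from a trustworthiness perspective (e.g., a hallucination label per trajectory) would then be attached to the abstract states to support analyses such as abnormal behavior detection.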

 Tags: "AI for SE"  
 
Roman Haas, Raphael Nömmer, Elmar Juergens, Sven Apel, "Optimization of Automated and Manual Software Tests in Industrial Practice: A Survey and Historical Analysis"

Abstract: Context: Both automated and manual software testing are widely applied in practice. While being essential for project success and software quality, they are very resource-intensive, thus motivating the pursuit of optimization. Goal: We aim to understand to what extent test optimization techniques for automated testing from the field of test case selection, prioritization, and test suite minimization can be applied to manual testing processes in practice. Method: We have studied the automated and manual testing process of five industrial study subjects from five different domains with different technological backgrounds and assessed the costs and benefits of test optimization techniques in industrial practice. In particular, we have carried out a cost–benefit analysis of two language-agnostic optimization techniques (test impact analysis and Pareto testing, a technique we introduce in this paper) on 2,622 real-world failures from our subjects’ histories. Results: Both techniques maintain most of the fault detection capability while significantly reducing the test runtime. For automated testing, optimized test suites detect, on average, 80% of failures, while saving 66% of execution time, as compared to an 81% failure detection rate for manual test suites and an average time saving of 43%. We observe an average speedup of the time to first failure of around a factor of 49 compared to a random test ordering. Conclusion: Our results suggest that optimization techniques from automated testing can be transferred to manual testing in industrial practice, resulting in lower test execution time and much lower time-to-feedback, but coming with process-related limitations and requirements for a successful implementation. All study subjects implemented one of our test optimization techniques in their processes, which demonstrates the practical impact of our findings.

 Tags: "Testing and Quality", "Business"  
 
Maryam Masoudian, Heqing Huang, Morteza Amini, Charles Zhang, "Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events"

Abstract: To improve the quality of Android apps, developers use automated debugging and testing solutions to determine whether previously found crashes are reproducible. However, existing GUI fuzzing solutions for Android apps struggle to reproduce crashes efficiently based solely on a crash stack trace. This trace provides the location in the app where the crash occurs. GUI fuzzing solutions currently in use rely on heuristics to generate UI events. Unfortunately, these events often do not align with an exploration of the app’s UI event space that reaches a specific location in the code. Hence, they generate numerous events unrelated to the crash, leading to an event explosion. To address this issue, a precise static UI model of widgets and screens can greatly enhance the efficiency of a fuzzing tool in its search. Building such a model requires considering all possible combinations of event sequences on widgets, since the execution order of events is not statically determined. However, this approach presents scalability challenges in complex apps with several widgets. In this paper, we propose a directed fuzzing solution that reduces an app’s event domain to the events necessary to trigger a crash. Our insight is that the dependencies between widgets, in their visual presentation and attribute states, provide valuable information for precisely identifying events that trigger a crash. We propose an attribute-sensitive reachability analysis (ASRA) to track dependent widgets in reachable paths to the crash point and distinguish between events in terms of their relevance to the crash reproduction process. With instrumentation, we inject code to prune irrelevant events, reducing the event domain to search at run time. We used four well-known fuzzing tools, Monkey, Ape, Stoat, and FastBot2, to assess the impact of our solution on decreasing crash reproduction time and increasing the likelihood of reproducing a crash. Our results show that the success ratio of reproducing a crash increased for one-fourth of the crashes. In addition, the average reproduction time of a crash becomes at least 2x faster. The Wilcoxon-Mann-Whitney test shows that this enhancement is significant when our tool is used, compared to the baseline and to an attribute-insensitive reachability analysis.

 Tags: "Testing and Quality"  
 
Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan, "Using Knowledge Units of Programming Languages to Recommend Reviewers for Pull Requests: An Empirical Study"

Abstract: Determining the right code reviewer for a given code change requires understanding the characteristics of the changed code, identifying the skills of each potential reviewer (expertise profile), and finding a good match between the two. To facilitate this task, we design a code reviewer recommender that operates on the knowledge units (KUs) of a programming language. We define a KU as a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We operationalize our KUs using certification exams for the Java programming language. We detect KUs from 10 actively maintained Java projects from GitHub, spanning 290K commits and 65K pull requests (PRs). We generate developer expertise profiles based on the detected KUs. We use these KU-based expertise profiles to build a code reviewer recommender (KUREC). We compare KUREC’s performance to that of seven baseline recommenders. KUREC ranked first along with the top-performing baseline recommender (RF) in a Scott-Knott ESD analysis of recommendation accuracy (the top-5 accuracy of KUREC is 0.84 (median) and the MAP@5 is 0.51 (median)). From a practical standpoint, we highlight that KUREC’s performance is more stable (lower interquartile range) than that of RF, thus making it more consistent and potentially more trustworthy. We also design three new recommenders by combining KUREC with our baseline recommenders. These new combined recommenders outperform both KUREC and the individual baselines. Finally, we evaluate how reasonable the recommendations from KUREC and the combined recommenders are when those deviate from the ground truth. We observe that KUREC is the recommender with the highest percentage of reasonable recommendations (63.4%). Overall we conclude that KUREC and one of the combined recommenders (e.g., AD_HYBRID) are overall superior to the baseline recommenders that we studied. Future work in the area should thus (i) consider KU-based recommenders as baselines and (ii) experiment with combined recommenders.
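
The abstract leaves the matching step open; one natural reading is a similarity ranking between the KU vector of a pull request and each reviewer's KU-based expertise profile. The sketch below is a guess at that shape, with invented profiles and a plain cosine similarity.

    import numpy as np

    # Rows: how often each reviewer touched code exercising each knowledge unit
    # (KU categories and all counts are hypothetical).
    reviewer_profiles = {
        "alice": np.array([12, 0, 3, 7]),
        "bob":   np.array([1, 9, 0, 2]),
        "carol": np.array([4, 4, 8, 1]),
    }
    pr_kus = np.array([3, 0, 1, 2])  # KUs detected in the changed files of the PR

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    ranking = sorted(reviewer_profiles.items(),
                     key=lambda kv: cosine(kv[1], pr_kus), reverse=True)
    for name, profile in ranking:  # top of the list = recommended reviewer
        print(name, round(cosine(profile, pr_kus), 3))

KUREC's actual profiles are operationalized from Java certification-exam topics mined over 290K commits, and the paper's combined recommenders blend such scores with baseline signals.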

 Tags: "Testing and Quality"  
 
Bentley Oakes, Michalis Famelis, Houari Sahraoui, "Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice"

Abstract: Domain experts are increasingly employing machine learning to solve their domain-specific problems. This article presents to software engineering researchers the six key challenges that a domain expert faces in addressing their problem with a computational workflow, and the underlying executable implementation. These challenges arise out of our conceptual framework which presents the “route” of transformations that a domain expert may choose to take while developing their solution. To ground our conceptual framework in the state of the practice, this article discusses a selection of available textual and graphical workflow systems and their support for the transformations described in our framework. Example studies from the literature in various domains are also examined to highlight the tools used by the domain experts as well as a classification of the domain specificity and machine learning usage of their problem, workflow, and implementation. The state of the practice informs our discussion of the six key challenges, where we identify which challenges and transformations are not sufficiently addressed by available tools. We also suggest possible research directions for software engineering researchers to increase the automation of these tools and disseminate best-practice techniques between software engineering and various scientific domains.

 Tags: "SE for AI"  
 
Anda Liang, Emerson Murphy-Hill, Westley Weimer, Yu Huang, "A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering"

Abstract: Online platforms and communities are a critical part of modern software engineering, yet are often affected by human biases. While previous studies investigated human biases and their potential harms to the efficiency and fairness of online communities, they have mainly focused on open source and Q&A platforms, such as GitHub and Stack Overflow, but overlooked the audience-focused online platforms for delivering programming and SE-related technical articles, where millions of software engineering practitioners share, seek out, and learn from high-quality software engineering articles (i.e., technical articles for SE). Furthermore, most of the previous work has revealed gender and race bias, but we have little knowledge about the effect of age on software engineering practice. In this paper, we propose to investigate the effect of authors’ demographic information (gender and age) on the evaluation of technical articles on software engineering and potential behavioral differences among participants. We conducted a survey-based and controlled human study and collected responses from 540 participants to investigate developers’ evaluation of technical articles for software engineering. By controlling the gender and age of the author profiles of technical articles for SE, we found that raters tend to give more positive content depth evaluations to younger male authors when compared to older male authors, and that male participants conduct technical article evaluations faster than female participants, consistent with prior study findings. Surprisingly, unlike other software engineering evaluation activities (e.g., code review, pull requests, etc.), we did not find a significant effect of author gender on the evaluation outcome of technical articles in SE.

 Tags: "Human/Social"  
 
Chengjie Lu, Shaukat Ali, Tao Yue, "EpiTESTER: Testing Autonomous Vehicles with Epigenetic Algorithm and Attention Mechanism"

Abstract: Testing autonomous vehicles (AVs) under various environmental scenarios that lead the vehicles to unsafe situations is challenging. Given the infinite possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, by taking inspiration from epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism, which regulates gene expression to prevent the expression of a certain gene, and the probability of gene expression is dynamically computed as the environment changes. Given different data modalities (e.g., images, lidar point clouds) in the context of AVs, EpiTESTER benefits from a multi-modal fusion transformer to extract high-level feature representations from environmental factors. Next, it calculates probabilities based on these features with the attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a probabilistic search algorithm (Simulated Annealing, SA), a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism implemented), and EpiTESTER with equal probability for each gene. We evaluate EpiTESTER with six initial environments from CARLA, an open-source simulator for autonomous driving research, and two end-to-end AV controllers, Interfuser and TCP. Our results show that EpiTESTER achieved promising performance in identifying critical scenarios compared to the baselines, showing that applying epigenetic mechanisms is a good option for solving practical problems.
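
In EpiTESTER the silencing probabilities come from an attention-based fusion model; as a toy illustration of gene silencing alone, the sketch below uses hand-set expression probabilities and a made-up fitness over four hypothetical environmental genes.

    import random

    random.seed(42)
    GENES = ["rain", "fog", "time_of_day", "traffic"]  # illustrative factors
    DEFAULTS = {g: 0.0 for g in GENES}  # value used when a gene is silenced

    # Hand-set expression probabilities; EpiTESTER computes these dynamically
    # from environmental features via attention.
    EXPR_PROB = {"rain": 0.9, "fog": 0.8, "time_of_day": 0.3, "traffic": 0.6}

    def express(individual):
        # Gene silencing: a gene's value is expressed only with its probability;
        # otherwise the default environment value is used instead.
        return {g: individual[g] if random.random() < EXPR_PROB[g] else DEFAULTS[g]
                for g in GENES}

    def fitness(env):
        # Made-up "criticality" of the resulting scenario (higher = more critical).
        return sum(env.values())

    population = [{g: random.random() for g in GENES} for _ in range(20)]
    for _ in range(30):  # tiny GA loop: silence, evaluate, keep the fitter half
        population.sort(key=lambda ind: fitness(express(ind)), reverse=True)
        parents = population[:10]
        children = [{g: min(1.0, max(0.0, p[g] + random.gauss(0, 0.1)))
                     for g in GENES} for p in parents]
        population = parents + children

    print(round(max(fitness(express(ind)) for ind in population), 3))
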

 Tags: "Testing and Quality"  
 
Diego Clerissi, Giovanni Denaro, Marco Mobilio, Leonardo Mariani, "Guess the State: Exploiting Determinism to Improve GUI Exploration Efficiency"

Abstract: Many automatic Web testing techniques generate test cases by analyzing the GUI of the Web applications under test, aiming to exercise sequences of actions similar to the ones that testers could manually execute. However, the efficiency of the test generation process is severely limited by the cost of analyzing the content of the GUI screens after executing each action. In this paper, we introduce an inference component, Sibilla, which accumulates knowledge about the behavior of the GUI after each action. Sibilla enables test generators to reuse the results computed for GUI screens that recur multiple times during the test generation process, thus improving the efficiency of Web testing techniques. We evaluated Sibilla with Web testing techniques based on three different GUI exploration strategies (Random, Depth-first, and Q-learning) and nine target systems, observing reductions from 22% to 96% of the test generation time.
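
The core idea, reusing analysis results for screens that recur, can be conveyed with a small memoization sketch. Everything below is hypothetical (Sibilla's actual inference is richer than a content hash):

    import hashlib

    analysis_cache = {}  # screen fingerprint -> previously computed analysis

    def fingerprint(dom_html: str) -> str:
        # Deterministic apps render recurring screens identically, so a content
        # hash is enough to recognize a screen seen earlier in the exploration.
        return hashlib.sha256(dom_html.encode("utf-8")).hexdigest()

    def expensive_gui_analysis(dom_html: str):
        # Placeholder for the costly per-screen analysis a test generator does.
        return [line for line in dom_html.splitlines() if "<button" in line]

    def analyze_screen(dom_html: str):
        key = fingerprint(dom_html)
        if key in analysis_cache:          # screen already seen: reuse results
            return analysis_cache[key]
        actions = expensive_gui_analysis(dom_html)
        analysis_cache[key] = actions
        return actions

    page = "<html><button id='a'>Go</button></html>"
    analyze_screen(page)        # analysis computed once
    analyze_screen(page)        # cache hit: no re-analysis
    print(len(analysis_cache))  # -> 1
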

 Tags: "Testing and Quality"  
 
Zhimin Zhao, Yihao Chen, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan, "An Empirical Study of Challenges in Machine Learning Asset Management"

Abstract: [Context] In machine learning (ML) applications, assets include not only the ML models themselves, but also the datasets, algorithms, and deployment tools that are essential in the development, training, and implementation of these models. Efficient management of ML assets is critical to ensure optimal resource utilization, consistent model performance, and a streamlined ML development lifecycle. This practice contributes to faster iterations, adaptability, reduced time from model development to deployment, and the delivery of reliable and timely outputs. [Objective] Despite research on ML asset management, there is still a significant knowledge gap on operational challenges, such as model versioning, data traceability, and collaboration issues, faced by asset management tool users. These challenges are crucial because they could directly impact the efficiency, reproducibility, and overall success of machine learning projects. Our study aims to bridge this empirical gap by analyzing user experience, feedback, and needs from Q&A posts, shedding light on the real-world challenges they face and the solutions they have found. [Method] We examine 15,065 Q&A posts from multiple developer discussion platforms, including Stack Overflow, tool-specific forums, and GitHub/GitLab. Using a mixed-method approach, we classify the posts into knowledge inquiries and problem inquiries. We then apply BERTopic to extract challenge topics and compare their prevalence. Finally, we use the open card sorting approach to summarize solutions from solved inquiries, then cluster them with BERTopic, and analyze the relationship between challenges and solutions. [Results] We identify 133 distinct topics in ML asset management-related inquiries, grouped into 16 macro-topics, with software environment and dependency, model deployment and service, and model creation and training emerging as the most discussed. Additionally, we identify 79 distinct solution topics, classified under 18 macro-topics, with software environment and dependency, feature and component development, and file and directory management as the most proposed. [Conclusions] This study highlights critical areas within ML asset management that need further exploration, particularly around prevalent macro-topics identified as pain points for ML practitioners, emphasizing the need for collaborative efforts between academia, industry, and the broader research community.

 Tags: "SE for AI"  
 
Iren Mazloomzadeh, Gias Uddin, Foutse Khomh, Ashkan Sami, "Reputation Gaming in Crowd Technical Knowledge Sharing"

Abstract: Stack Overflow’s incentive system awards users with reputation scores to ensure quality. The decentralized nature of the forum may make the incentive system prone to manipulation. This paper offers, for the first time, a comprehensive study of the reported types of reputation manipulation scenarios that might be exercised on Stack Overflow, and of the prevalence of such reputation gaming, based on a qualitative study of 1,697 posts from meta Stack Exchange sites. We found four different types of reputation fraud scenarios, such as voting rings, where communities form to upvote each other repeatedly on similar posts. We developed algorithms that enable platform managers to automatically identify these suspicious reputation gaming scenarios for review. The first algorithm identifies isolated/semi-isolated communities where probable reputation frauds may occur, mostly through collaboration with each other. The second algorithm looks for sudden, unusually big jumps in the reputation scores of users. We evaluated the performance of our algorithms by examining the reputation history dashboards of Stack Overflow users on the Stack Overflow website. We observed that around 60-80% of users flagged as suspicious by our algorithms experienced reductions in their reputation scores by Stack Overflow.
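
The abstract does not specify how "sudden, unusually big jumps" are detected; a simple way to operationalize that phrase, shown purely as a sketch and not as the authors' exact rule, is a z-score over a user's daily reputation gains.

    import numpy as np

    def suspicious_days(daily_gains, z_threshold=3.0):
        # Flag days whose reputation gain is an outlier w.r.t. the user's history.
        gains = np.asarray(daily_gains, dtype=float)
        mu, sigma = gains.mean(), gains.std()
        if sigma == 0:
            return []
        z = (gains - mu) / sigma
        return [i for i, score in enumerate(z) if score > z_threshold]

    # A user with steady gains and one abnormal spike on day 7:
    history = [10, 12, 8, 11, 9, 10, 13, 420, 12, 9, 11, 10, 8, 12]
    print(suspicious_days(history))  # -> [7]

Days flagged this way would then be handed to moderators for manual review rather than acted on automatically.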

 Tags: "Human/Social"  
 
Hanying Shao, Zishuo Ding, Weiyi Shang, Jinqiu Yang, Nikolaos Tsantalis, "Towards Effectively Testing Machine Translation Systems from White-Box Perspectives"

Abstract: Neural Machine Translation (NMT) has experienced significant growth over the last decade. Despite these advancements, machine translation systems still face various issues. In response, metamorphic testing approaches have been introduced for testing machine translation systems. Such approaches involve token replacement, where a single token in the original source sentence is substituted to create mutants. By comparing the translations of mutants with the original translation, potential bugs in the translation systems can be detected. However, the selection of tokens for replacement in the original sentence remains an intriguing problem, deserving further exploration in testing approaches. To address this problem, we design two white-box approaches to identify vulnerable tokens in the source sentence, whose perturbation is most likely to induce translation bugs for a translation system. The first approach, named GRI, utilizes the GRadient Information to identify the vulnerable tokens for replacement, and our second approach, named WALI, uses Word ALignment Information to locate the vulnerable tokens. We evaluate the proposed approaches on a Transformer-based translation system with the News Commentary dataset and 200 English sentences extracted from CNN articles. The results show that both GRI and WALI can effectively generate high-quality test cases for revealing translation bugs. Specifically, our approaches can always outperform state-of-the-art automatic machine translation testing approaches from two aspects: (1) under a certain testing budget (i.e., number of executed test cases), both GRI and WALI can reveal a larger number of bugs than baseline approaches, and (2) when given a predefined testing goal (i.e., number of detected bugs), our approaches always require fewer testing resources (i.e., a reduced number of test cases to execute).
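
GRI's gradients come from the translation model under test; the toy PyTorch sketch below shows only the core computation, scoring each source token by the gradient norm at its embedding, using a stand-in model and invented data.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    vocab, dim = 100, 16
    embed = nn.Embedding(vocab, dim)
    model = nn.Sequential(nn.Flatten(), nn.Linear(5 * dim, 2))  # stand-in network

    tokens = torch.tensor([[3, 17, 42, 7, 99]])  # one 5-token source sentence
    emb = embed(tokens)   # (1, 5, dim)
    emb.retain_grad()     # keep gradients for this non-leaf tensor

    logits = model(emb)
    loss = logits[0].max()  # any scalar tied to the model's output works here
    loss.backward()

    # Per-token vulnerability score: L2 norm of the gradient at each embedding.
    scores = emb.grad.norm(dim=-1).squeeze(0)
    print(scores)                # higher score = more promising token to perturb
    print(int(scores.argmax())) # index of the most "vulnerable" token

WALI replaces the gradient signal with word-alignment information but feeds into the same replacement-based metamorphic testing loop.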

 Tags: "Testing and Quality"  
 
Belinda Schantong, Norbert Siegmund, Janet Siegmund, "Toward a Theory on Programmer’s Block Inspired by Writer’s Block"

Abstract: [Context] Programmer’s block, akin to writer’s block, is a phenomenon where capable programmers struggle to create code. Despite anecdotal evidence, no scientific studies have explored the relationship between programmer’s block and writer’s block. [Objective] The primary objective of this study is to study the presence of blocks during programming and their potential causes. [Method] We conducted semi-structured interviews with experienced programmers to capture their processes, the problems they face, and potential causes. Subsequently, we analyzed the responses through the lens of writing. [Results] We found that among the programmer’s problems during programming, several display strong similarities to writer’s block. Moreover, when investigating possible causes of such blocks, we found a strong relationship between programming and writing activities as well as typical writing strategies employed by programmers. [Conclusions] Strong similarities between programming and writing challenges, processes, and strategies confirm the existence of programmer’s block with similar causes to writer’s block. Thus, strategies from writing used to resolve blocks should be applicable in programming, helping developers to overcome phases of being stuck. Research at the intersection of both areas could lead to productivity gains through reduced developer downtimes.

 Tags: "Human/Social", "Process"  
 
Neelam Tjikhoeri, Lauren Olson, Emitza Guzman, "Best ends by the best means: ethical concerns in app reviews"

Abstract: This work analyzes ethical concerns found in users’ app store reviews. We performed this study because ethical concerns in mobile applications (apps) are widespread, pose severe threats to end users and society, and lack systematic analysis and methods for detection and classification. In addition, app store reviews allow practitioners to collect users’ perspectives, crucial for identifying software flaws, from a geographically distributed and large-scale audience. For our analysis, we collected five million user reviews, developed a set of ethical concerns representative of user preferences, and manually labeled a sample of these reviews. We found that (1) users most frequently report ethical concerns about censorship, identity theft, and safety, (2) user reviews with ethical concerns are longer, more popular, and receive lower ratings, and (3) there is high automation potential for the classification and filtering of these reviews. Our results highlight the relevance of using app store reviews for the systematic consideration of ethical concerns during software evolution.
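
The reported automation potential suggests a standard text-classification pipeline; as a hedged sketch (tiny invented sample, scikit-learn assumed, not the authors' exact setup), filtering reviews with ethical concerns might look like:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative sample; the study labels real app store reviews.
    reviews = [
        "this app censors any post criticizing the government",
        "great ui, love the new update",
        "someone stole my identity after signing up here",
        "crashes sometimes but overall fine",
    ]
    has_ethical_concern = [1, 0, 1, 0]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(reviews, has_ethical_concern)

    incoming = ["my account was hacked and my data sold", "nice dark mode"]
    flags = clf.predict(incoming)
    # Reviews routed to manual ethical-concern analysis:
    print([r for r, f in zip(incoming, flags) if f == 1])
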

 Tags: "Human/Social"  
 
Bianca Trinkenreich, Fabio Santos, Klaas-Jan Stol, "Predicting Attrition among Software Professionals: Antecedents and Consequences of Burnout and Engagement"

Abstract: In this study of burnout and engagement, we address three major themes. First, we offer a review of prior studies of burnout among IT professionals and link these studies to the Job Demands-Resources (JD-R) model. Informed by the JD-R model, we identify three factors that are organizational job resources, and posit that these (a) increase engagement, and (b) decrease burnout. Second, we extend the JD-R by considering software professionals’ intention to stay as a consequence of these two affective states, burnout and engagement. Third, we focus on the importance of factors for intention to stay, and actual retention behavior. We use a unique dataset of over 13,000 respondents at one global IT organization, enriched with employment status 90 days after the initial survey. Leveraging partial least squares structural equation modeling and machine learning, we find that the data mostly support our theoretical model, with some variation across different subgroups of respondents. An importance-performance map analysis suggests that managers may wish to focus on interventions regarding burnout as a predictor of intention to leave. The Machine Learning model suggests that engagement and opportunities to learn are the top two most important factors that explain whether software professionals leave an organization.

 Tags: "Human/Social"  
 
Ricardo Caldas, Juan Antonio Pinera Garcia, Matei Schiopu, Patrizio Pelliccione, Genaína Nunes Rodrigues, Thorsten Berger, "Runtime Verification and Field-based Testing for ROS-based Robotic Systems"

Abstract: Robotic systems are becoming pervasive and adopted in increasingly many domains, such as manufacturing, healthcare, and space exploration. To this end, engineering software has emerged as a crucial discipline for building maintainable and reusable robotic systems. The field of robotics software engineering research has received increasing attention, fostering autonomy as a fundamental goal. However, robotics developers still struggle to achieve this goal, given that simulation cannot realistically emulate real-world phenomena. Robots also need to operate in unpredictable and uncontrollable environments, which require safe and trustworthy self-adaptation capabilities implemented in software. Typical techniques to address these challenges are runtime verification, field-based testing, and mitigation techniques that enable fail-safe solutions. However, there is no clear guidance on architecting ROS-based systems to enable and facilitate runtime verification and field-based testing. This paper aims to fill this gap by providing guidelines that can help developers and quality assurance (QA) teams when developing, verifying, or testing their robots in the field. These guidelines are carefully tailored to address the challenges and requirements of testing robotic systems in real-world scenarios. We conducted (i) a literature review on studies addressing runtime verification and field-based testing for robotic systems, (ii) mining of ROS-based application repositories, and (iii) validation of the applicability, clarity, and usefulness of the guidelines via two questionnaires with 55 answers overall. We contribute 20 guidelines, 8 for developers and 12 for QA teams, formulated for researchers and practitioners in robotic software engineering. Finally, we map our guidelines to open challenges in runtime verification and field-based testing for ROS-based systems, and we outline promising research directions in the field. Guidelines website and replication package: https://ros-rvft.github.io.

 Tags: "Testing and Quality"  
 
Daniel Ramos, Ines Lynce, Vasco Manquinho, Ruben Martins, Claire Le Goues, "BatFix: Repairing language model-based transpilation"

Abstract: To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise in both the source and target language, making it highly laborious and costly. Language models for code generation and transpilation are becoming increasingly popular. However, despite capturing code structure well, code generated by language models is often spurious and contains subtle problems. We propose BatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models. BatFix takes as input the original program, the target program generated by the machine translation model, and a set of test cases, and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to language models and programming languages. BatFix can locate bugs spanning multiple lines and synthesize patches for syntax and semantic bugs for programs migrated from Java to C++ and Python to C++ from multiple language models, including OpenAI’s Codex.

 Tags: "Testing and Quality", "Prog Comprehension/Reeng/Maint"  
 
Nimmi Rsshinika Weeraddana, Mahmoud Alfadel, Shane McIntosh, "Characterizing Timeout Builds in Continuous Integration"

Abstract: Compute resources that enable Continuous Integration (CI, i.e., the automatic build and test cycle applied to the change sets that development teams produce) are a shared commodity that organizations need to manage. To prevent (erroneous) builds from consuming a large amount of resources, CI service providers often impose a time limit. CI builds that exceed the time limit are automatically terminated. While imposing a time limit helps to prevent abuse of the service, builds that timeout (a) consume the maximum amount of resources that a CI service is willing to provide and (b) leave CI users without an indication of whether the change set will pass or fail the CI process. Therefore, understanding timeout builds and the factors that contribute to them is important for improving the stability and quality of a CI service. In this paper, we investigate the prevalence of timeout builds and the characteristics associated with them. By analyzing a curated dataset of 936 projects that adopt the CircleCI service and report at least one timeout build, we find that the median duration of a timeout build (19.7 minutes) is more than five times that of a build that produces a pass or fail result (3.4 minutes). To better understand the factors contributing to timeout builds, we model timeout builds using characteristics of project build history, build queued time, timeout tendency, size, and author experience based on data collected from 105,663 CI builds. Our model demonstrates a discriminatory power that vastly surpasses that of a random predictor (Area Under the Receiver Operating characteristic Curve, i.e., AUROC = 0.939) and is highly stable in its performance (AUROC optimism = 0.0001). Moreover, our model reveals that the build history and timeout tendency features are strong indicators of timeout builds, with the timeout status of the most recent build accounting for the largest proportion of the explanatory power. A longitudinal analysis of the incidences of timeout builds (i.e., a study conducted over a period of time) indicates that 64.03% of timeout builds occur consecutively. In such cases, it takes a median of 24 hours before a build that passes or fails occurs. Our results imply that CI providers should exploit build history to anticipate timeout builds.

 Tags: "Process", "Testing and Quality"  
 
Xinyi Wang, Asmar Muqeet, Tao Yue, Shaukat Ali, Paolo Arcaini, "Test Case Minimization with Quantum Annealing"

Abstract: Quantum annealers are specialized quantum computers for solving combinatorial optimization problems with special quantum computing characteristics, e.g., superposition and entanglement. Theoretically, quantum annealers can outperform classical computers. However, current quantum annealers are constrained by a limited number of qubits and cannot demonstrate quantum advantages. Nonetheless, research is needed to develop novel mechanisms to formulate combinatorial optimization problems for quantum annealing (QA). However, QA applications in software engineering remain unexplored. Thus, we propose BootQA, the very first effort at solving test case minimization (TCM) problems on classical software with QA. We provide a novel TCM formulation for QA and utilize bootstrap sampling to optimize the qubit usage. We also implemented our TCM formulation in three other optimization processes: simulated annealing (SA), QA without problem decomposition, and QA with an existing D-Wave problem decomposition strategy, and conducted an empirical evaluation with three real-world TCM datasets. Results show that BootQA outperforms QA without problem decomposition and QA with the existing decomposition strategy regarding effectiveness. Moreover, BootQA’s effectiveness is similar to SA. Finally, BootQA has higher efficiency in terms of time when solving large TCM problems than the other three optimization processes.
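
BootQA's exact formulation is in the paper; to make the QUBO idea concrete, the sketch below encodes a tiny TCM instance as an exact-cover-style energy and minimizes it with plain simulated annealing standing in for the annealer. The coverage matrix and penalty weights are invented.

    import math
    import random

    random.seed(1)
    # covers[r] = set of tests covering requirement r (illustrative data).
    covers = [{0, 2}, {1}, {2, 3}, {0, 3}, {1, 4}]
    n_tests = 5
    A, B = 10.0, 1.0  # A penalizes uncovered requirements, B penalizes suite size

    def energy(x):
        # Exact-cover-style QUBO energy: (1 - number of selected covering tests)^2
        # penalizes both leaving a requirement uncovered and covering it twice.
        e = B * sum(x)
        for req in covers:
            e += A * (1 - sum(x[t] for t in req)) ** 2
        return e

    x = [random.randint(0, 1) for _ in range(n_tests)]
    temp = 5.0
    for _ in range(2000):  # classical stand-in for the quantum annealer
        t = random.randrange(n_tests)
        y = x[:]
        y[t] ^= 1  # flip one test in or out of the suite
        delta = energy(y) - energy(x)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            x = y
        temp = max(0.01, temp * 0.995)

    print(x, energy(x))  # selected tests and final energy

On real hardware the limited qubit count is what motivates BootQA's bootstrap sampling: the problem is split into subproblems small enough to embed on the annealer.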

 Tags: "Testing and Quality", "Quantum"  
 
Partha Chakraborty, Mahmoud Alfadel, Mei Nagappan, "RLocator: Reinforcement Learning for Bug Localization"

Abstract: Software developers spend a significant portion of time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do not directly optimize the evaluation measures. We argue that directly optimizing evaluation measures can positively contribute to the performance of bug localization approaches. Therefore, in this paper, we utilize Reinforcement Learning (RL) techniques to directly optimize the ranking metrics. We propose RLocator, a Reinforcement Learning-based bug localization approach. We formulate RLocator using a Markov Decision Process (MDP) to optimize the evaluation measures directly. We present the technique and experimentally evaluate it based on a benchmark dataset of 8,316 bug reports from six highly popular Apache projects. The results of our evaluation reveal that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average Precision (MAP) of 0.59, and a Top 1 score of 0.46. We compare RLocator with three state-of-the-art bug localization tools, FLIM, BugLocator, and BL-GAN. Our evaluation reveals that RLocator outperforms these approaches by a substantial margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top K metric. These findings highlight that directly optimizing evaluation measures considerably contributes to performance improvement of the bug localization problem.

 Tags: "Testing and Quality", "AI for SE"  
 
Nima Shiri harzevili, Mohammad Mahdi Mohajer, Moshi Wei, Hung Viet Pham, Song Wang, "History-Driven Fuzzing for Deep Learning Libraries"

Abstract: Recently, many Deep Learning (DL) fuzzers have been proposed for API-level testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only support a limited set of corner-case test inputs. Furthermore, many developer APIs crucial for library development remain untested, as they are typically not well documented and lack clear usage guidelines, unlike end-user APIs. This makes them a more challenging target for automated testing. To fill this gap, we propose a novel fuzzer named Orion, which combines guided test input generation and corner-case test input generation based on a set of fuzzing heuristic rules constructed from historical data known to trigger critical issues in the underlying implementation of DL APIs. To extract the fuzzing heuristic rules, we first conduct an empirical study on the root cause analysis of 376 vulnerabilities in two of the most popular DL libraries, PyTorch and TensorFlow. We then construct the fuzzing heuristic rules based on the root causes of the extracted historical vulnerabilities. Using these fuzzing heuristic rules, Orion generates corner-case test inputs for API-level fuzzing. In addition, we extend the seed collection of existing studies to include test inputs for developer APIs. Our evaluation shows that Orion reports 135 vulnerabilities in the latest releases of TensorFlow and PyTorch, 76 of which were confirmed by the library developers. Among the 76 confirmed vulnerabilities, 69 were previously unknown, and 7 have already been fixed. The rest are awaiting further confirmation. For end-user APIs, Orion detected 45.58% and 90% more vulnerabilities in TensorFlow and PyTorch, respectively, compared to the state-of-the-art conventional fuzzer, DeepRel. Compared to the state-of-the-art LLM-based DL fuzzer, AtlasFuz, Orion detected 13.63% more vulnerabilities in TensorFlow and 18.42% more vulnerabilities in PyTorch. Regarding developer APIs, Orion stands out by detecting 117% more vulnerabilities in TensorFlow and 100% more vulnerabilities in PyTorch compared to FreeFuzz, the most relevant fuzzer designed for developer APIs.
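
Orion's heuristic rules are distilled from 376 historical vulnerabilities; the flavor of rule-guided corner-case generation can be sketched with a hand-written rule set for PyTorch. The rules below are illustrative, not Orion's.

    import torch

    # Heuristic corner-case inputs inspired by historically bug-prone patterns:
    # extreme magnitudes, NaN/Inf, empty and zero-sized tensors, negative values.
    CORNER_INPUTS = [
        torch.tensor([]),                           # empty tensor
        torch.tensor([float("nan"), float("inf")]),
        torch.full((3,), 1e38),                     # near float32 overflow
        torch.zeros((0, 5)),                        # zero-sized dimension
        torch.tensor([-1]),                         # negative where sizes expected
    ]

    def fuzz_api(fn, name):
        for x in CORNER_INPUTS:
            try:
                fn(x)
            except RuntimeError:
                pass  # a clean Python-level error is acceptable behavior
            except Exception as e:  # unexpected exception types are worth reporting
                print(f"{name}: {type(e).__name__} on input {x!r}")

    # Example: fuzz a couple of end-user APIs with the rule-based inputs.
    fuzz_api(torch.log, "torch.log")
    fuzz_api(lambda t: torch.reshape(t, (2, -1)), "torch.reshape")

The security-relevant findings in practice are the cases this harness cannot catch in Python, such as crashes in the underlying native implementation.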

 Tags: "Testing and Quality", "AI for SE", "Security"  
 
Xu Yang, Gopi Krishnan Rajbahadur, Dayi Lin, Shaowei Wang, Zhen Ming (Jack) Jiang, "SimClone: Detecting Tabular Data Clones using Value Similarity"

Abstract: Data clones are defined as multiple copies of the same data among datasets. The presence of data clones between datasets can cause issues such as difficulties in managing data assets and data license violations when using datasets with clones to build AI software. However, detecting data clones is not trivial. The majority of prior studies in this area rely on structural information to detect data clones (e.g., font size, column header). However, tabular datasets used to build AI software are typically stored without any structural information. In this paper, we propose a novel method called SimClone for data clone detection in tabular datasets without relying on structural information. The SimClone method utilizes value similarities for data clone detection. We also propose a visualization approach as a part of our SimClone method to help locate the exact position of the cloned data between a dataset pair. Our results show that SimClone outperforms the current state-of-the-art method by at least 20% in terms of both F1-score and AUC. In addition, SimClone’s visualization component helps identify the exact location of the data clone in a dataset with a Precision@10 value of 0.80 in the top 20 true positive predictions.
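
SimClone's similarity measure is defined in the paper; a simple value-based proxy that likewise ignores structural information compares columns by the overlap of their value sets, as in the pandas sketch below (toy data).

    import pandas as pd

    def column_jaccard(a: pd.Series, b: pd.Series) -> float:
        # Value-based similarity: overlap of the value sets of two columns,
        # ignoring headers, row order, and any other structural information.
        sa, sb = set(a.dropna()), set(b.dropna())
        if not sa and not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    def clone_score(df1: pd.DataFrame, df2: pd.DataFrame) -> float:
        # Best-match each column of df1 against the columns of df2.
        scores = [max(column_jaccard(df1[c1], df2[c2]) for c2 in df2.columns)
                  for c1 in df1.columns]
        return sum(scores) / len(scores)

    d1 = pd.DataFrame({"age": [23, 41, 35], "city": ["Ottawa", "Lyon", "Kyoto"]})
    d2 = pd.DataFrame({"x": ["Ottawa", "Lyon", "Kyoto"], "y": [23, 41, 99]})
    print(clone_score(d1, d2))  # high score despite renamed, reordered columns

A visualization layer, as in SimClone, would then highlight which cell ranges drive the high-scoring column pairs.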

 Tags: "Prog Comprehension/Reeng/Maint", "Testing and Quality"  
 
Carolin Brandt, Ali Khatami, Mairieli Wessel, Andy Zaidman, "Shaken, Not Stirred. How Developers Like Their Amplified Tests"

Abstract: Test amplification makes systematic changes to existing, manually written tests to provide tests complementary to an automated test suite. We consider developer-centric test amplification, where the developer explores, judges, and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which kinds of selection and editing steps developers take before including an amplified test in the test suite. In this paper, we conduct an open-source contribution study, amplifying tests of open-source Java projects from GitHub. We report which deficiencies we observed in the amplified tests while manually filtering and editing them to open 39 pull requests with amplified tests. We present a detailed analysis of the maintainers' feedback regarding proposed changes, requested information, and expressed judgment. Our observations provide a basis for practitioners to make an informed decision on whether to adopt developer-centric test amplification. As several of the edits we observed are based on the developer's understanding of the amplified test, we conjecture that developer-centric test amplification should invest in supporting the developer in understanding the amplified tests.

 Tags: "Human/Social"  
 
Stephen John Warnett, Uwe Zdun, "On the Understandability of MLOps System Architectures"

Abstract: Machine Learning Operations (MLOps) is the practice of streamlining and optimising the machine learning (ML) workflow, from development to deployment, using DevOps (software development and IT operations) principles and ML-specific activities. Architectural descriptions of MLOps systems often consist of informal textual descriptions and informal graphical system diagrams that vary considerably in consistency, quality, detail, and content. Such descriptions only sometimes follow standards or schemata and may be hard to understand. We aimed to investigate informal textual descriptions and informal graphical MLOps system architecture representations and compare them with semi-formal MLOps system diagrams for those systems. We report on a controlled experiment with sixty-three participants investigating the understandability of MLOps system architecture descriptions based on informal and semi-formal representations. The results indicate that the understandability (quantified by task correctness) of MLOps system descriptions is significantly greater using supplementary semi-formal MLOps system diagrams, that using semi-formal MLOps system diagrams does not significantly increase task duration (and thus hinder understanding), and that task correctness is only significantly correlated with task duration when semi-formal MLOps system diagrams are provided.

 Tags: "Prog Comprehension/Reeng/Maint"  
 
Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, Jiarui Zhang, Puzhuo Liu, Yu Jiang, Chengnian Sun, "T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax"

Abstract: Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, and (2) the difference among duplicates (i.e., different programs that trigger the same bug) can be minimized or even eliminated. With such reduction and canonicalization functionality, program reduction facilitates debugging for software, especially language toolchains such as compilers, interpreters, and debuggers. While many program reduction techniques have been proposed, most of them (especially the language-agnostic ones) overlook the potential reduction opportunities hidden within tokens; therefore, their capabilities in terms of reduction and canonicalization are significantly restricted. To fill this gap, we propose T-Rec, a fine-grained language-agnostic program reduction technique guided by lexical syntax. Instead of treating tokens as atomic and irreducible components, T-Rec introduces a fine-grained reduction process that leverages the lexical syntax of programming languages to effectively explore the reduction opportunities in tokens. Through comprehensive evaluations with versatile benchmark suites, we demonstrate that T-Rec significantly improves the reduction and canonicalization capability of two existing language-agnostic program reducers (i.e., Perses and Vulcan). T-Rec enables Perses and Vulcan to further eliminate 1,294 and 1,315 duplicates in a benchmark suite that contains 3,796 test cases triggering 46 unique bugs. Additionally, T-Rec can reduce up to 65.52% and 53.73% of the bytes in the results of Perses and Vulcan on our multi-lingual benchmark suite, respectively.
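A minimal sketch of the fine-grained idea, assuming a hypothetical still_triggers_bug oracle in place of re-running the language toolchain: a token is shrunk character by character as long as the bug keeps triggering, rather than being treated as atomic.

```python
# A minimal sketch of token-level reduction: try to drop one character at a
# time from a token while a bug-triggering predicate keeps holding.
def reduce_token(token: str, still_triggers_bug) -> str:
    changed = True
    while changed:
        changed = False
        for i in range(len(token)):
            candidate = token[:i] + token[i + 1:]
            if candidate and still_triggers_bug(candidate):
                token = candidate   # keep the smaller, still-failing token
                changed = True
                break
    return token

# Toy oracle: the bug only depends on the identifier containing an 'x'.
print(reduce_token("extremely_long_variable_name",
                   lambda t: "x" in t))  # -> "x"
```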

 Tags: "Analysis"  
 
Yuanjie Xia, Lizhi Liao, Jinfu Chen, Heng Li, Weiyi Shang, "Reducing the Length of Field-replay Based Load Testing"

Abstract: As software systems continuously grow in size and complexity, performance- and load-related issues have become more common than functional issues. Load testing is usually performed before software releases to ensure that the software system can still provide quality service under a certain load. Therefore, one of the common challenges of load testing is to design realistic workloads that can represent the actual workload in the field. In particular, one of the most widely adopted and intuitive approaches is to directly replay the field workloads in the load testing environment. However, replaying lengthy field workloads (e.g., 48 hours) is rather resource- and time-consuming, and sometimes even infeasible for large-scale software systems that adopt a rapid release cycle. On the other hand, replaying only a short duration of the field workloads may result in unrealistic load testing. In this work, we propose an automated approach to reduce the length of load testing that is driven by replaying the field workloads. The intuition of our approach is that if the measured performance associated with a particular system behaviour is already stable, we can skip subsequent testing of this system behaviour to reduce the length of the field workloads. In particular, our approach first clusters execution logs that are generated during the system runtime to identify similar system behaviours during the field workloads. Then, we use statistical methods to determine whether the measured performance associated with a system behaviour has become stable. We evaluate our approach on three open-source projects (i.e., OpenMRS, TeaStore, and Apache James). The results show that our approach can significantly reduce the length of field workloads while the workloads after reduction remain representative of the original set of workloads. More importantly, the load testing results obtained by replaying the reduced workloads are highly correlated with, and show similar trends to, those of the original workloads. Practitioners can leverage our approach to perform realistic field-replay based load testing while saving the needed resources and time. Our approach sheds light on future research that aims to reduce the cost of load testing for large-scale software systems.
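The stopping criterion can be illustrated with a small sketch; the coefficient-of-variation test below is an assumed stand-in for the statistical methods in the paper, and the behavior names are invented.

```python
# A minimal sketch of the skip-when-stable idea: once the latency observed
# for a clustered system behavior is statistically stable, further replay
# of that behavior can be skipped.
from statistics import mean, stdev

def is_stable(samples, window=10, cv_threshold=0.05):
    """Stable when the coefficient of variation of the last `window`
    response-time samples falls below `cv_threshold`."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    m = mean(recent)
    return m > 0 and stdev(recent) / m < cv_threshold

response_times = {"login": [], "search": []}  # behavior -> observed latencies

def should_replay(behavior: str) -> bool:
    return not is_stable(response_times[behavior])

for t in [101, 99, 100, 102, 98, 100, 101, 99, 100, 100]:
    response_times["login"].append(t)
print(should_replay("login"))  # False: 'login' latency has stabilized
```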

 Tags: "Testing and Quality", "Analysis"  
 
Jinjing Shi, ZiMeng Xiao, Heyuan Shi, Yu Jiang, Xuelong LI, "QuanTest: Entanglement-Guided Testing of Quantum Neural Network Systems"

Abstract: Quantum Neural Network (QNN) combines the Deep Learning (DL) principle with the fundamental theory of quantum mechanics to achieve machine learning tasks with quantum acceleration. Recently, QNN systems have been found to manifest robustness issues similar to classical DL systems. There is an urgent need for ways to test their correctness and security. However, QNN systems differ significantly from traditional quantum software and classical DL systems, posing critical challenges for QNN testing. These challenges include the inapplicability of traditional quantum software testing methods to QNN systems due to differences in programming paradigms and decision logic representations, the dependence of quantum test sample generation on perturbation operators, and the absence of effective information in quantum neurons. In this paper, we propose QuanTest, a quantum entanglement-guided adversarial testing framework to uncover potential erroneous behaviors in QNN systems. We design a quantum entanglement adequacy criterion to quantify the entanglement acquired by the input quantum states from the QNN system, along with two similarity metrics to measure the proximity of generated quantum adversarial examples to the original inputs. Subsequently, QuanTest formulates the problem of generating test inputs that maximize the quantum entanglement adequacy and capture incorrect behaviors of the QNN system as a joint optimization problem and solves it in a gradient-based manner to generate quantum adversarial examples. Experimental results demonstrate that QuanTest possesses the capability to capture erroneous behaviors in QNN systems, generating 67.48%-96.05% more high-quality test samples than random noise under the same perturbation size constraints. The entanglement-guided approach proves effective in adversarial testing, generating more adversarial examples (with a maximum increase of 21.32%).
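To give a flavor of what an entanglement adequacy criterion quantifies, here is a small numpy sketch computing the entanglement entropy of a two-qubit state; the paper's actual adequacy measure over QNN inputs is likely defined differently.

```python
# Entanglement entropy of qubit 0 for a two-qubit state: 0 for product
# states, 1 (maximal) for Bell states. Illustrative of the quantity an
# entanglement adequacy criterion could build on.
import numpy as np

def entanglement_entropy(state: np.ndarray) -> float:
    """Von Neumann entropy of qubit 0 for a normalized length-4 state vector."""
    psi = state.reshape(2, 2)            # index (qubit0, qubit1)
    rho0 = psi @ psi.conj().T            # partial trace over qubit 1
    eigs = np.linalg.eigvalsh(rho0)
    eigs = eigs[eigs > 1e-12]            # drop numerical zeros
    return float(-np.sum(eigs * np.log2(eigs)))

product = np.array([1, 0, 0, 0], dtype=complex)            # |00>
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00>+|11>)/sqrt(2)
print(entanglement_entropy(product))  # ~0.0
print(entanglement_entropy(bell))     # ~1.0
```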

 Tags: "SE for AI", "Quantum"  
 
Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu, "Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects"

Abstract: Deep Learning (DL) models have rapidly advanced, focusing on achieving high performance through testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or functionally correct when there is a need to treat and test them like other software systems. Therefore, we empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: (1) unit-tested DL projects correlate positively with open-source project metrics and have a higher acceptance rate of pull requests; (2) 68% of the sampled DL projects are not unit tested at all; (3) the layer and utilities (utils) components of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to the community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.

 Tags: "SE for AI", "Testing and Quality"  
 
Michael Fu, Van Nguyen, Kla Tantithamthavorn, Dinh Phung, Trung Le, "Vision Transformer Inspired Automated Vulnerability Repair"

Abstract: Recently, automated vulnerability repair (AVR) approaches have been widely adopted to combat increasing software security issues. In particular, transformer-based encoder-decoder models achieve competitive results. Although vulnerable programs may contain only a few vulnerable code areas that need repair, existing AVR approaches lack a mechanism guiding their model to pay more attention to vulnerable code areas during repair generation. In this article, we propose a novel vulnerability repair framework inspired by Vision Transformer based approaches for object detection in the computer vision domain. Similar to the object queries used to locate objects in object detection, we introduce and leverage vulnerability queries (VQs) to locate vulnerable code areas and then suggest their repairs. In particular, we leverage the cross-attention mechanism to achieve the cross-match between VQs and their corresponding vulnerable code areas. To strengthen this cross-match and generate more accurate vulnerability repairs, we propose to learn a novel vulnerability mask (VM) and integrate it into the decoders’ cross-attention, which makes our VQs pay more attention to vulnerable code areas during repair generation. In addition, we incorporate our VM into the encoders’ self-attention to learn embeddings that emphasize the vulnerable areas of a program. Through an extensive evaluation on 5,417 real-world vulnerabilities, our approach outperforms all of the automated vulnerability repair baseline methods by 2.68% to 32.33%. Additionally, our analysis of the cross-attention map of our approach confirms the design rationale of our VM and its effectiveness. Finally, our survey study with 71 software practitioners highlights the significance and usefulness of AI-generated vulnerability repairs in the realm of software security. The training code and pre-trained models are available at https://github.com/awsm-research/VQM.

 Tags: "Testing and Quality", "Analysis"  
 
Jon Ayerdi, Valerio Terragni, Gunel Jahangirova, Aitor Arrieta, Paolo Tonella, "GenMorph: Automatically Generating Metamorphic Relations via Genetic Programming"

Abstract: Metamorphic testing is a popular approach that aims to alleviate the oracle problem in software testing. At the core of this approach are Metamorphic Relations (MRs), specifying properties that hold among multiple test inputs and corresponding outputs. Deriving MRs is mostly a manual activity, since their automated generation is a challenging and largely unexplored problem. This paper presents GenMorph, a technique to automatically generate MRs for Java methods that involve inputs and outputs that are boolean, numerical, or ordered sequences. GenMorph uses an evolutionary algorithm to search for effective test oracles, i.e., oracles that trigger no false alarms and expose software faults in the method under test. The proposed search algorithm is guided by two fitness functions that measure the number of false alarms and the number of missed faults for the generated MRs. Our results show that GenMorph generates effective MRs for 18 out of 23 methods (mutation score > 20%). Furthermore, it can increase Randoop's fault detection capability in 7 out of 23 methods, and Evosuite's in 14 out of 23 methods. When compared with AutoMR, a state-of-the-art MR generator, GenMorph also outperformed it in fault detection capability in 9 out of 10 methods.
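The two fitness signals can be illustrated with a toy sketch; the candidate MR for Python's abs and the mutants below are invented examples, whereas GenMorph evolves such relations for Java methods automatically.

```python
# A minimal sketch of the two fitness signals used to evolve MRs: a candidate
# relation should raise no false alarms on the original method and should be
# violated by (i.e., catch) as many mutants as possible.
def candidate_mr(f, x):
    """MR: f(x) == f(-x) should hold for an absolute-value method."""
    return f(x) == f(-x)

def fitness(mr, original, mutants, inputs):
    false_alarms = sum(not mr(original, x) for x in inputs)   # minimize
    missed = sum(all(mr(m, x) for x in inputs) for m in mutants)  # minimize
    return false_alarms, missed

mutants = [lambda x: x, lambda x: -x, lambda x: abs(x) + 1]
print(fitness(candidate_mr, abs, mutants, inputs=range(-5, 6)))
# (0, 1): no false alarms; only the abs(x)+1 mutant survives this MR
```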

 Tags: "Testing and Quality"  
 
Jorge Melegati, Kieran Conboy, Daniel Graziotin, "Qualitative Surveys in Software Engineering Research: Definition, Critical Review, and Guidelines"

Abstract: Qualitative surveys are emerging as a popular research method in software engineering (SE), particularly as many aspects of the field are increasingly socio-technical and thus concerned with the subtle, social, and often ambiguous issues that are not amenable to a simple quantitative survey. While many argue that qualitative surveys play a vital role amongst the diverse range of methods employed in SE, there are a number of shortcomings that inhibit their use and value. First, there is a lack of clarity as to what defines a qualitative survey and what features differentiate it from other methods. There is an absence of a clear set of principles and guidelines for its execution, and what does exist is very inconsistent and sometimes contradictory. These issues undermine the perceived reliability and rigour of this method: researchers are unsure about how to ensure reliability and rigour when designing qualitative surveys, and reviewers are unsure how these should be evaluated. In this paper, we present a systematic mapping study to identify how qualitative surveys have been employed in SE research to date. The paper proposes a set of principles, based on a multidisciplinary review of qualitative surveys, that captures some of the commonalities of the diffuse approaches found. These principles can be used by researchers when choosing whether to conduct a qualitative survey and then to design their study. The principles can also be used by editors and reviewers to judge the quality and rigour of qualitative surveys. It is hoped that this will result in more widespread use of the method and also more effective and evidence-based reviews of studies that use these methods in the future.

 Tags: "Research Methods"  
 
Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella, "Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study"

Abstract: In a recent study, Reinforcement Learning (RL), used in combination with many-objective search, was shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings as the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims at eliminating some of the possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting feedback to the RL agent; (2) the usage of an RL algorithm (Q-learning) that requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random search. Results also highlight other possible improvements, which open the way to further investigations on how to best leverage RL for online ADS testing.

 Tags: "SE for AI", "Testing and Quality"  
 
Jiannan Wang, Hung Viet Pham, Qi Li, Lin Tan, Yu Guo, Adnan Aziz, Erik Meijer, "D3: Differential Testing of Distributed Deep Learning with Model Generation"

Abstract: Deep Learning (DL) techniques have been widely deployed in many application domains. The growth of DL models’ size and complexity demands distributed training of DL models. Since DL training is complex, software implementing distributed DL training is error-prone. Thus, it is crucial to test distributed deep learning software to improve its reliability and quality. To address this issue, we propose a differential testing technique—D3, which leverages a distributed equivalence rule that we create to test distributed deep learning software. The rationale is that the same model trained with the same model input under different distributed settings should produce equivalent prediction output within certain thresholds. Differing outputs indicate potential bugs in the distributed deep learning software. D3 automatically generates a diverse set of distributed settings, DL models, and model inputs to test distributed deep learning software. Our evaluation on two of the most popular DL libraries, i.e., PyTorch and TensorFlow, shows that D3 detects 21 bugs, including 12 previously unknown bugs.
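A minimal numpy sketch of the distributed equivalence rule, with a 4-way data-parallel split simulated by sharding the batch; the toy linear model and tolerance are assumptions for illustration.

```python
# The same model applied to the same input under two "distributed" settings
# (one device vs. a simulated 4-way data-parallel split) must agree within
# a tolerance; disagreement would flag a potential bug.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))       # toy model: one linear layer
x = rng.standard_normal((32, 16))      # one batch of inputs

single = x @ W.T                                    # setting A: one device
shards = np.array_split(x, 4)                       # setting B: 4 workers
parallel = np.concatenate([s @ W.T for s in shards])

assert np.allclose(single, parallel, atol=1e-6), "distributed outputs diverge"
print("outputs equivalent within tolerance")
```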

 Tags: "SE for AI", "Testing and Quality"  
 
David N. Palacio, Alejandro Velasco, Nathan Cooper, Alvaro Rodriguez, Kevin Moran, Denys Poshyvanyk, "Toward a Theory of Causation for Interpreting Neural Code Models"

Abstract: Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces do-code, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. do-code is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of do-code are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of do-code, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias as compared to other programming language constructs. These insights demonstrate the potential of do-code as a useful method to detect and facilitate the elimination of confounding bias in NCMs.

 Tags: "AI for SE"  
 
Chidera Biringa, Gokhan Kul, "PACE: A Program Analysis Framework for Continuous Performance Prediction"

Abstract: Software development teams establish elaborate continuous integration pipelines containing automated test cases to accelerate the development process of software. Automated tests help to verify the correctness of code modifications, decreasing the response time to changing requirements. However, when software teams do not track the performance impact of pending modifications, they may need to spend considerable time refactoring existing code. This article presents PACE, a program analysis framework that provides continuous feedback on the performance impact of pending code updates. We design performance microbenchmarks by mapping the execution time of functional test cases given a code update. We map microbenchmarks to code stylometry features and feed them to predictors for performance predictions. Our experiments achieved strong results in predicting code performance, outperforming the current state-of-the-art by 75% on neural-represented code stylometry features.

 Tags: "Analysis", "Prog Comprehension/Reeng/Maint"  
 
Costanza Alfieri, Juri Di Rocco, Paola Inverardi, Phuong T. Nguyen, "Exploring User Privacy Awareness on GitHub: An Empirical Study"

Abstract: GitHub provides developers with a practical way to distribute source code and collaboratively work on common projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. However, despite these efforts, the platform still faces various issues related to the privacy of its users. This paper presents an empirical study delving into the GitHub ecosystem. Our focus is on investigating the utilization of privacy settings on the platform and identifying various types of sensitive information disclosed by users. Leveraging a dataset comprising 6,132 developers, we report and analyze their activities by means of comments on pull requests. Our findings indicate an active engagement by users with the available privacy settings on GitHub. Notably, we observe the disclosure of different forms of private information within pull request comments. This observation has prompted our exploration into sensitivity detection using a large language model and BERT, to pave the way for a personalized privacy assistant. Our work provides insights into the utilization of existing privacy protection tools, such as privacy settings, along with their inherent limitations. Essentially, we aim to advance research in this field by providing both the motivation for creating such privacy protection tools and a proposed methodology for personalizing them.

 Tags: "Human/Social", "Security"  
 
Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati, "Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems"

Abstract: Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely-used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82% and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score performance, respectively. We conclude with a discussion on the scope, implications and limitations of our study. We provide our complete replication package in a Github repository (Github paper 2023).

 Tags: "SE for AI", "Testing and Quality"  
 
Junxiao Han, Jiahao Zhang, David Lo, Xin Xia, Shuiguang Deng, Minghui Wu, "Understanding Newcomers' Onboarding Process in Deep Learning Projects"

Abstract: Attracting and retaining newcomers is critical for the sustainable development of Open Source Software (OSS) projects. Considerable efforts have been made to help newcomers identify and overcome barriers in the onboarding process. However, fewer studies focus on newcomers’ activities before their successful onboarding. Given the rising popularity of deep learning (DL) techniques, we wonder what the onboarding process of DL newcomers is, and whether there exist commonalities or differences in the onboarding process for DL and non-DL newcomers. Therefore, we report a study to understand the growth trends of DL and non-DL newcomers, mine DL and non-DL newcomers’ activities before their successful onboarding (i.e., past activities), and explore the relationships between newcomers’ past activities and their first commit patterns and retention rates. By analyzing 20 DL projects with 9,191 contributors and 20 non-DL projects with 9,839 contributors, and conducting email surveys with contributors, we derived the following findings: 1) DL projects have attracted and retained more newcomers than non-DL projects. 2) Compared to non-DL newcomers, DL newcomers encounter more deployment, documentation, and version issues before their successful onboarding. 3) DL newcomers statistically require more time to successfully onboard compared to non-DL newcomers, and DL newcomers with more past activities (e.g., issues, issue comments, and watches) are prone to submit an intensive first commit (i.e., a commit with many source code and documentation files being modified). Based on the findings, we shed light on the onboarding process for DL and non-DL newcomers, highlight future research directions, and provide practical suggestions to newcomers, researchers, and projects.

 Tags: "Human/Social", "Process", "Open Source"  
 
Matteo Biagiola, Paolo Tonella, "Boundary State Generation for Testing and Improvement of Autonomous Driving Systems"

Abstract: Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. In such approaches, environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment instance. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has, on average, up to 3× higher success rate on a separate set of evaluation tracks with respect to the original DNN model.

 Tags: "SE for AI", "Testing and Quality"  
 
Junjie Li, Jinqiu Yang, "Tracking the Evolution of Static Code Warnings: The State-of-the-Art and a Better Approach"

Abstract: Static bug detection tools help developers detect problems in the code, including bad programming practices and potential defects. Recent efforts to integrate static bug detectors in modern software development workflows, such as code review and continuous integration, have been shown to better motivate developers to fix the reported warnings on the fly. A proper mechanism to track the evolution of the reported warnings can better support such integration. Moreover, tracking the static code warnings benefits many downstream software engineering tasks, such as learning fix patterns for automated program repair and learning which warnings are of more interest so that they can be prioritized automatically. In addition, tracking tools enable developers to concentrate on the most recent and actionable static warnings rather than being overwhelmed by the thousands of warnings from the entire project, which in turn enhances the utilization of static analysis tools. Hence, precisely tracking the warnings reported by static bug detectors is critical to further improving their utilization. In this paper, we study the effectiveness of the state-of-the-art (SOTA) solution in tracking static code warnings and propose a better solution based on our analysis of the insufficiency of the SOTA solution. In particular, we examined over 2,000 commits in four large-scale open-source systems (i.e., JClouds, Kafka, Spring-boot, and Guava) and crafted a dataset of 3,451 static code warnings reported by two static bug detectors (i.e., SpotBugs and PMD). We manually uncovered the ground-truth evolution status of the static warnings: persistent, removed (fix), removed (non-fix), and newly-introduced. Upon manual analysis, we identified the main reasons behind the insufficiency of the SOTA solution. Furthermore, we propose StaticTracker to track static warnings over the software development history. Our evaluation shows that StaticTracker significantly improves the tracking precision, i.e., from 64.4% to 90.3% for the evolution statuses combined (removed (fix), removed (non-fix), and newly-introduced).

 Tags: "Analysis", "Prog Comprehension/Reeng/Maint"  
 
Youssef Esseddiq Ouatiti, Mohammed Sayagh, Noureddine Kerzazi, Bram Adams, Ahmed E. Hassan, "The impact of Concept drift and Data leakage on Log Level Prediction Models"

Abstract: Developers insert logging statements to collect information about the execution of their systems. Along with a logging framework (e.g., Log4j), practitioners can decide which log statements to print or suppress by tagging each log line with a log level. Since picking the right log level for a new logging statement is not straightforward, machine learning models for log level prediction (LLP) were proposed by prior studies. While these models show good performance, they are still subject to the context in which they are applied, specifically to the way practitioners decide on log levels in different phases of the development history of their projects (e.g., debugging vs. testing). For example, OpenStack developers interchangeably increased/decreased the verbosity of their logs across the history of the project in response to code changes (e.g., before vs. after fixing a new bug). The manifestation of these changing log verbosity choices across time can lead to concept drift and data leakage issues, which we quantify in this paper for LLP models. In this paper, we empirically quantify the impact of data leakage and concept drift on the performance and interpretability of LLP models in three large open-source systems. Additionally, we compare the performance and interpretability of several time-aware approaches to tackling time-related issues. We observe that both shallow and deep-learning-based models suffer from both time-related issues. We also observe that training a model on just a window of the historical data (i.e., a contextual model) outperforms models that are trained on the whole historical data (i.e., all-knowing models) in the case of our shallow LLP model. Finally, we observe that contextual models exhibit a different (even contradictory) model interpretability, with a (very) weak correlation between the rankings of important features of the pairs of contextual models we compared. Our findings suggest that data leakage and concept drift should be taken into consideration for LLP models. We also invite practitioners to include the size of the historical window as an additional hyperparameter to tune a suitable contextual model instead of leveraging all-knowing models.

 Tags: "AI for SE"  
 
Delano Oliveira, Reydne Santos, Benedito de Oliveira, Martin Monperrus, Fernando Castor, Fernanda Madeiral, "Understanding Code Understandability Improvements in Code Reviews"

Abstract: [Motivation] Code understandability plays a crucial role in software development, as developers spend between 58% and 70% of their time reading source code. Improving code understandability can lead to enhanced productivity and save maintenance costs. [Problem] Experimental studies aim to establish what makes code more or less understandable in a controlled setting, but ignore that what makes code easier to understand in the real world also depends on extraneous elements such as project culture and guidelines, and developers’ background. Not accounting for the influence of these factors may lead to results that are sound but have little external validity. [Objective] This study aims to investigate how developers improve code understandability during software development through code review comments. Its basic assumption is that code reviewers are specialists in code quality within a project. [Method and Results] We manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this aspect in code reviews. We further explored a subset of 385 comments related to code understandability and identified eight categories of code understandability concerns, such as incomplete or inadequate code documentation, bad identifiers, and unnecessary code. Among the suggestions to improve code understandability, 83.9% were accepted and integrated into the codebase. Among these, only two (less than 1%) ended up being reverted later. We also identified types of patches that improve code understandability, ranging from simple changes (e.g., removing unused code) to more context-dependent improvements (e.g., replacing a method call chain with an existing API). Finally, we evaluated the ability of four well-known linters to flag the identified code understandability issues. These linters cover less than 30% of these issues, although some of them could easily be added as new rules. [Implications] Our findings motivate and provide practical insight for the construction of tools to make code more understandable, e.g., understandability improvements are rarely reverted and thus can be used as reliable training data for specialized ML-based tools. This is also supported by our dataset, which can be used to train such models. Finally, our findings can also serve as a basis to develop evidence-based code style guides. [Data Availability] Our data is publicly available at https://codeupcrc.github.io.

 Tags: "Prog Comprehension/Reeng/Maint"  
 
Aniruddhan Murali, Mahmoud Alfadel, Meiyappan Nagappan, Meng Xu, Chengnian Sun, "AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes"

Abstract: Memory leak bugs are a major problem in C/C++ programs. They occur when memory objects are not deallocated. Developers need to manually deallocate these objects to prevent memory leaks. As such, several techniques have been proposed to automatically fix memory leaks. Although the proposed approaches have merit in automatically fixing memory leaks, they present limitations. Static approaches attempt to trace the complete semantics of a memory object across all paths; however, they face scalability challenges when the target program has a large number of paths (path explosion). On the other hand, dynamic approaches can spell out the precise semantics of a memory object only on a single execution path (they do not consider multiple execution paths). In this paper, we complement prior approaches by designing and implementing a novel framework named AddressWatcher. AddressWatcher allows the semantics of a memory object to be tracked on multiple execution paths. AddressWatcher accomplishes this by using a leak database that allows one to store and compare different execution paths of a leak over several test cases. Also, AddressWatcher performs lightweight instrumentation at compile time that is utilized during program execution to watch and track memory leak reads/writes. We conduct an evaluation of AddressWatcher over five popular packages, namely binutils, openssh, tmux, openssl, and git. In 23 out of 50 real-world memory leak bugs, AddressWatcher correctly points to a location at which to free memory to fix the leak. Finally, we submitted 25 pull requests across 12 popular OSS repositories using AddressWatcher suggestions. Among these, 21 were merged, leading to 5 open issues being addressed. In fact, our critical fix prompted a new version release for the calc repository, a program used to find large primes. Furthermore, our contributions through these PRs sparked intense discussions and appreciation in various repositories such as coturn, h2o, and radare2.

 Tags: "Analysis"  
 
Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan, "A Large-Scale Exploratory Study on the Proxy Pattern in Ethereum"

Abstract: The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry (e.g., network applications, microservices, and IoT). As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. A contract that implements the proxy pattern (proxy contract) acts as a layer between the clients and the target contract, enabling greater flexibility (e.g., data validation checks) and upgradeability (e.g., online smart contract replacement with zero downtime) in DApp development. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. In this paper, we present a large-scale exploratory study on the use of the proxy pattern in Ethereum. We analyze a dataset of all Ethereum smart contracts as of Sep. 2022, containing 50M smart contracts and 1.6B transactions, and apply both quantitative and qualitative methods in order to (i) determine the prevalence of proxy contracts, (ii) understand the ways they are deployed and integrated into applications, and (iii) uncover the prevalence of different types of proxy contracts. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and the latter being employed in 39.1% and 60.9% of identified usage contexts, respectively. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enable upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations, with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts’ functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.

 Tags: "Design/Architecture", "Process"  
 
Xinyi Wang, Shaukat Ali, Tao Yue, Paolo Arcaini, "Quantum Approximate Optimization Algorithm for Test Case Optimization"

Abstract: Test case optimization (TCO) reduces the software testing cost while preserving its effectiveness. However, to solve TCO problems for large-scale and complex software systems, substantial computational resources are required. Quantum approximate optimization algorithms (QAOAs) are promising combinatorial optimization algorithms that rely on quantum computational resources, with the potential to offer increased efficiency compared to classical approaches. Several proof-of-concept applications of QAOAs for solving combinatorial problems, such as portfolio optimization, energy optimization in power systems, and job scheduling, have been proposed. Given the lack of investigation into QAOA’s application to TCO problems, and motivated by the computational challenges of TCO problems and the potential of QAOAs, we present IGDec-QAOA to formulate a TCO problem as a QAOA problem and solve it on both ideal and noisy quantum computer simulators, as well as on a real quantum computer. To solve bigger TCO problems that require more qubits than are currently available, we integrate a problem decomposition strategy with the QAOA. We performed an empirical evaluation with five TCO problems and four publicly available industrial datasets from ABB, Google, and Orona to compare various configurations of IGDec-QAOA, assess its decomposition strategy for handling large datasets, and compare its performance with classical algorithms (i.e., Genetic Algorithm (GA) and Random Search). Based on the evaluation results achieved on an ideal simulator, we recommend the best configuration of our approach for TCO problems. Also, we demonstrate that our approach can reach the same effectiveness as GA and outperforms GA in two of the five test case optimization problems we studied. In addition, we observe that, on the noisy simulator, IGDec-QAOA achieves performance similar to that on the ideal simulator. Finally, we also demonstrate the feasibility of IGDec-QAOA on a real quantum computer in the presence of noise.
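To show what formulating a TCO problem for a QAOA might look like, the sketch below encodes a tiny test-selection instance as the kind of binary objective a QAOA circuit would minimize, then solves it by classical enumeration instead of a quantum circuit; the coverage matrix and weight are invented.

```python
# A tiny test case optimization instance as a binary objective: minimize
# uncovered branches plus a cost for each selected test. A QAOA would
# minimize an equivalent Ising/QUBO form; here we brute-force it.
import itertools
import numpy as np

coverage = np.array([[1, 1, 0, 0],     # test 0 covers branches 0,1
                     [0, 1, 1, 0],     # test 1 covers branches 1,2
                     [0, 0, 1, 1]])    # test 2 covers branches 2,3
cost_weight = 0.4                      # trade-off: coverage vs. suite size

def objective(selection: np.ndarray) -> float:
    if selection.any():
        uncovered = int(np.sum(coverage[selection == 1].sum(axis=0) == 0))
    else:
        uncovered = coverage.shape[1]  # nothing selected: all uncovered
    return uncovered + cost_weight * selection.sum()

best = min((np.array(bits) for bits in itertools.product([0, 1], repeat=3)),
           key=objective)
print(best, objective(best))  # [1 0 1] 0.8: full coverage with two tests
```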

 Tags: "Testing and Quality", "Quantum"  
 
Alison Fernandez Blanco, Araceli Queirolo Cordova, Alexandre Bergel, Juan Pablo Sandoval Alcocer, "Asking and Answering Questions During Memory Profiling"

Abstract: The software engineering community has produced numerous tools, techniques, and methodologies for practitioners to analyze and optimize memory usage during software execution. However, little is known about the actual needs of programmers when analyzing memory behavior and how they use tools to address those needs. We conducted an exploratory study (i) to understand what a programmer needs to know when analyzing memory behavior and (ii) how a programmer finds that information with current tools. From our observations, we provide a catalog of 34 questions programmers ask themselves when analyzing memory behavior. We also report a detailed analysis of how some tools are used to answer these questions and the difficulties participants face during the process. Finally, we present four recommendations to guide researchers and developers in designing, evaluating, and improving memory behavior analysis tools.

 Tags: "User experience", "Prog Comprehension/Reeng/Maint"  
 
Zeyang Ma, Shouvick Mondal, Tse-Hsun (Peter) Chen, Haoxiang Zhang, Ahmed E. Hassan, "VulNet: Towards improving vulnerability management in the Maven ecosystem"

Abstract: Developers rely on software ecosystems such as Maven to manage and reuse external libraries (i.e., dependencies). Due to the complexity of the used dependencies, developers may face challenges in choosing which library to use and whether they should upgrade or downgrade a library. One important factor that affects this decision is the number of potential vulnerabilities in a library and its dependencies. Therefore, state-of-the-art platforms such as Maven Repository (MVN) and Open Source Insights (OSI) help developers in making such a decision by presenting vulnerability information associated with every dependency. In this paper, we first conduct an empirical study to understand how the two platforms, MVN and OSI, present and categorize vulnerability information. We found that these two platforms may either overestimate or underestimate the number of associated vulnerabilities in a dependency, and they lack prioritization mechanisms on which dependencies are more likely to cause an issue. Hence, we propose a tool named VulNet to address the limitations we found in MVN and OSI. Through an evaluation of 19,886 versions of the top 200 popular libraries, we find VulNet includes 90.5% and 65.8% of the dependencies that were omitted by MVN and OSI, respectively. VulNet also helps reduce 27% of potentially unreachable or less impactful vulnerabilities listed by OSI in test dependencies. Finally, our user study with 24 participants gave VulNet an average rating of 4.5/5 in presenting and prioritizing vulnerable dependencies, compared to 2.83 (MVN) and 3.14 (OSI).

 Tags: "Process", "Security"  
 
Aurora Papotti, Ranindya Paramitha, Fabio Massacci, "On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools."

Abstract: [Objective] We investigated whether (possibly wrong) security patches suggested by Automated Program Repair (APR) tools for real-world projects are recognized as such by human reviewers. We also investigated whether knowing that a patch was produced by an allegedly specialized tool changes the decision of human reviewers. [Method] We performed an experiment with Master's students in Computer Science. In the first phase, using a balanced design, we proposed to human reviewers a combination of patches generated by APR tools for different vulnerabilities and asked the reviewers to adopt or reject the proposed patches. In the second phase, we told participants that some of the proposed patches were generated by security-specialized tools (even if the tool was actually a ‘normal’ APR tool) and measured whether the human reviewers would change their decision to adopt or reject a patch. [Results] It is easier to identify wrong patches than correct patches, and correct patches are not confused with partially correct patches. Also, patches from allegedly security-specialized APR tools are adopted more often than patches suggested by generic APR tools, but there is not enough evidence to verify whether ‘bogus’ security claims are distinguishable from ‘true security’ claims. Finally, the number of switches to the patches suggested by the security tool is significantly higher after the security information is revealed, irrespective of correctness. [Limitations] The experiment was conducted in an academic setting and focused on a limited sample of popular APR tools and popular vulnerability types.

 Tags: "User experience", "Human/Social", "Security"  
 
Matteo Paltenghi, Rahul Pandita, Austin Henley, Albert Ziegler, "Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration"

Abstract: Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code match the patterns of developers. A poor understanding of the model reasoning process limits the way in which current neural models are leveraged today, so far mostly for their raw predictions. To fill this gap, this work studies how the processed attention signal of three open large language models - CodeGen, InCoder and GPT-J - agrees with how developers look at and explore code when answering the same sensemaking questions about code. Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually labeled sessions from 25 developers engaged in sensemaking tasks. We empirically evaluate five heuristics that do not use attention and ten attention-based post-processing approaches over the attention signal of CodeGen against our ground truth of developers exploring code, including the novel concept of follow-up attention, which exhibits the highest agreement between model and human attention. Our follow-up attention method can predict the next line a developer will look at with 47% accuracy. This outperforms the baseline prediction accuracy of 42.3%, which uses the session history of other developers to recommend the next line. These results demonstrate the potential of leveraging the attention signal of pre-trained models for effective code exploration.

 Tags: "Human/Social", "User experience", "AI for SE"  
 
Anwar Ghammam, Rania Khalsi, Marouane Kessentini, Foyzul Hassan, "Efficient Management of Containers for Software Defined Vehicles"

Abstract: Containerization technology, such as Docker, is gaining in popularity in newly established software-defined vehicle architectures (SDVA). However, executing those containers can quickly become computationally expensive in constrained environments, given the limited CPU, memory, and energy resources in the Electronic Control Units (ECUs) of SDVA. Consequently, the efficient management of these containers is crucial for enabling the on-demand usage of the applications in the vehicle based on the available resources while considering several constraints and priorities, including failure tolerance, security, safety, and comfort. In this paper, we propose a dynamic software container management approach for constrained environments such as embedded devices/ECUs in SDVA within smart cars. To address the conflicting objectives and constraints within the vehicle, we design a novel search-based approach based on multi-objective optimization. This approach facilitates the allocation, movement, or suspension of containers between ECUs in the cluster. Collaborating with our industry partner, Ford Motor Company, we evaluate our approach using different real-world software-defined scenarios. These scenarios involve heterogeneous clusters of ECU devices in vehicles based on real-world software containers and use-case studies from the automotive industry. The experimental results demonstrate that our scheduler outperforms existing scheduling algorithms, including the default Docker scheduler, Spread, commonly used in automotive applications. Our proposed scheduler exhibits superior performance in terms of energy and resource cost efficiency. Specifically, it achieves a 35% reduction in energy consumption in power-saving mode compared to the scheduler employed by Ford Motor Company. Additionally, our scheduler effectively distributes workload among the ECUs in the cluster, minimizing resource usage, and dynamically adjusts to the real-time requirements and constraints of the car environment. This work will serve as a fundamental building block in the automotive industry to efficiently manage software containers in smart vehicles considering real-world constraints and priorities.

 Tags: "Prog Comprehension/Reeng/Maint"  
 
Hao Li, Cor-Paul Bezemer, "Bridging the Language Gap: An Empirical Study of Bindings for Open Source Machine Learning Libraries Across Software Package Ecosystems"

Abstract: Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library which is not available in their programming language or ecosystem of choice, may need to resort to using a so-called binding library (or binding). Bindings provide support across programming languages and package ecosystems for reusing a host library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library’s releases, often experience considerable delays in supporting new releases, and have widespread technical lag. Our findings highlight key factors to consider for developers integrating bindings for ML libraries and open avenues for researchers to further investigate bindings in software package ecosystems.

 Tags: "Prog Comprehension/Reeng/Maint"  
 
Michel Maes Bermejo, Alexander Serebrenik, Micael Gallego Carrillo, Francisco Gortázar Bellas, Gregorio Robles, Jesus M. Gonzalez-Barahona, "Hunting bugs: Towards an automated approach to identifying which change caused a bug through regression testing"

Abstract: [Context] Finding code changes that introduced bugs is important both for practitioners and researchers, but doing it precisely is a manual, effort-intensive process. The perfect test method is a theoretical construct aimed at detecting Bug-Introducing Changes (BIC) through a theoretical perfect test: this perfect test always fails if the bug is present, and passes otherwise. [Objective] To explore a possible automatic operationalization of the perfect test method. [Method] To use regression tests as substitutes for the perfect test. For this, we transplant the regression tests to past snapshots of the code, and use them to identify the BIC, on a well-known collection of bugs from the Defects4J dataset. [Results] From 809 bugs in the dataset, when running our operationalization of the perfect test method, the BIC was identified precisely for 95 of them, and in the remaining 4 cases a list of candidates including the BIC was provided. [Conclusions] We demonstrate that the operationalization of the perfect test method through regression tests is feasible and can be completely automated in practice when tests can be transplanted and run in past snapshots of the code. Given that implementing regression tests when a bug is fixed is considered a good practice, developers who follow it can effortlessly detect bug-introducing changes by using our operationalization of the perfect test method.
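A minimal sketch of the operationalization, assuming hypothetical checkout and run_transplanted_test helpers that wrap git and the project's test commands: walking history oldest to newest, the first snapshot where the transplanted regression test fails is reported as the BIC.

```python
# The transplanted regression test stands in for the perfect test: it fails
# iff the bug is present, so the first failing snapshot is the BIC.
def find_bic(commits, checkout, run_transplanted_test):
    """`commits` is ordered oldest -> newest; returns the BIC or None."""
    for commit in commits:
        checkout(commit)
        if not run_transplanted_test():
            return commit            # first failing snapshot is the BIC
    return None

# Toy demo with an in-memory "repository"; c3 introduced the bug.
state = {}
history = ["c1", "c2", "c3", "c4"]
passes = {"c1": True, "c2": True, "c3": False, "c4": False}
print(find_bic(history,
               checkout=lambda c: state.update(head=c),
               run_transplanted_test=lambda: passes[state["head"]]))  # c3
```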

 Tags: "Testing and Quality"  
 
Matteo Biagiola, Andrea Stocco, Vincenzo Riccio, Paolo Tonella, "Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing"

Abstract: Simulation-based testing represents an important step to ensure the reliability of autonomous driving software. In practice, when companies rely on third-party general-purpose simulators, either for in-house or outsourced testing, the generalizability of testing results to real autonomous vehicles is at stake. In this paper, we enhance simulation-based testing by introducing the notion of digital siblings—a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators built with different technologies, which operate collectively as an ensemble in the testing process. We exemplify our approach on a case study focused on testing the lane-keeping component of an autonomous vehicle. We use two open-source simulators as digital siblings, and we empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases. Our approach requires generating and running test cases for each individual simulator, in the form of sequences of road points. Then, test cases are migrated between simulators, using feature maps to characterize the exercised driving conditions. Finally, the joint predicted failure probability is computed, and a failure is reported only in cases of agreement among the siblings. Our empirical evaluation shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin. We discuss the findings of our case study and detail how our approach can help researchers interested in automated testing of autonomous driving software.
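The agreement rule for the siblings can be sketched in a few lines; the probability threshold and the conjunction rule below are illustrative assumptions, not the paper's exact ensemble predictor.

```python
# Each simulator yields a failure probability for a test case; a failure is
# reported only when the siblings agree, filtering out simulator-specific
# (flaky or engine-dependent) failures.
def joint_failure(p_sibling_a: float, p_sibling_b: float,
                  threshold: float = 0.5) -> bool:
    """Flag a failure only if both siblings predict it."""
    return p_sibling_a >= threshold and p_sibling_b >= threshold

print(joint_failure(0.9, 0.7))  # True: both siblings agree on failure
print(joint_failure(0.9, 0.2))  # False: disagreement, likely sim-specific
```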

 Tags: "SE for AI", "Testing and Quality"  
 
Farshad Kazemi, Maxime Lamothe, Shane McIntosh, "Characterizing the Prevalence, Distribution, and Duration of Stale Reviewer Recommendations"

Abstract: The appropriate assignment of reviewers is a key factor in determining the value that organizations can derive from code review. While inappropriate reviewer recommendations can hinder the benefits of the code review process, identifying these assignments is challenging. Stale reviewers, i.e., those who no longer contribute to the project, are one type of reviewer recommendation that is certainly inappropriate. Understanding and minimizing this type of recommendation can thus enhance the benefits of the code review process. While recent work demonstrates the existence of stale reviewers, to the best of our knowledge, no attempt has yet been made to characterize and mitigate them. In this paper, we study the prevalence and potential effects of stale reviewer recommendations. We then propose and assess a strategy to mitigate stale recommendations in existing code reviewer recommendation (CRR) tools. By applying five CRR approaches (LearnRec, RetentionRec, cHRev, Sofia, and WLRRec) to three thriving open-source systems with 5,806 contributors, we observe that, on average, 12.59% of incorrect recommendations are stale due to developer turnover; however, fewer stale recommendations are made when the recency of contributions is considered by the recommendation objective function. We also investigate which reviewers appear in stale recommendations and observe that the top reviewers account for a considerable proportion of them. For instance, in 15.31% of cases, the top-3 reviewers account for at least half of the stale recommendations. Finally, we study how long stale reviewers linger in recommendations after leaving the project, observing that contributors who left the project 7.7 years ago are still suggested to review change sets. Based on our findings, we propose separating reviewer contribution recency from the other factors used by the CRR objective function, in order to filter out developers who have not contributed during a specified duration. By evaluating this strategy with different intervals, we assess the potential impact of this choice on the recommended reviewers. The proposed filter reduces the staleness of recommendations, i.e., the Staleness Reduction Ratio (SRR) improves by 21.44%–92.39%. Yet since the strategy may increase the workload of active reviewers, careful project-specific exploration of the impact of the cut-off setting is crucial.
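
The proposed mitigation amounts to a recency filter applied before the recommender ranks candidates. A minimal Python sketch follows, assuming a simple data model (dicts with a last_contribution timestamp) and an illustrative one-year cut-off; the paper evaluates several intervals.

    from datetime import datetime, timedelta

    def filter_stale_candidates(candidates, now, cutoff_days=365):
        """Drop reviewers with no contribution inside the cut-off window."""
        cutoff = now - timedelta(days=cutoff_days)
        return [c for c in candidates if c["last_contribution"] >= cutoff]

    active = filter_stale_candidates(
        [{"name": "alice", "last_contribution": datetime(2024, 11, 1)},
         {"name": "bob", "last_contribution": datetime(2017, 3, 9)}],
        now=datetime(2025, 1, 1),
    )
    # Only alice remains; bob's years-old activity marks him as stale.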

 Tags: "Human/Social", "Process"  
 
Ahcheong Lee, Youngseok Choi, Shin Hong, Yunho Kim, Kyutae Cho, Moonzoo Kim, "ZigZagFuzz: Interleaved Fuzzing of Program Options and Files"

Abstract: Command-line options (e.g., -l, -F, -R for ls) given to a command-line program can significantly alter the program's behavior. Thus, fuzzing not only file inputs but also program options can improve test coverage and bug detection. In this paper, we propose ZigZagFuzz, which achieves higher test coverage and detects more bugs than state-of-the-art fuzzers by separately mutating program options and file inputs in an iterative, interleaving manner. ZigZagFuzz applies the following three core ideas. First, to exploit the different characteristics of the program option domain and the file input domain, ZigZagFuzz separates phases of mutating program options from phases of mutating file inputs and applies two distinct mutation strategies to the two domains. Second, to reach deep segments of a target program that are reached through an interleaved sequence of program option checks and file input checks, ZigZagFuzz continuously interleaves phases of mutating program options with phases of mutating file inputs. Finally, to further improve fuzzing performance, ZigZagFuzz periodically shrinks the input corpus by removing similar test inputs based on their function coverage. Experimental results on 20 real-world programs show that ZigZagFuzz improves test coverage and detects 1.9 to 10.6 times more bugs than state-of-the-art fuzzers that mutate program options, such as AFL++-argv, AFL++-all, Eclipser, CarpetFuzz, ConfigFuzz, and POWER. We have reported the new bugs detected by ZigZagFuzz, and the original developers confirmed our bug reports.
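
The interleaving loop can be pictured as follows. This is a high-level Python sketch under stated assumptions (run_target, mutate_options, mutate_file, and coverage_of are hypothetical stand-ins for the fuzzer's components), not ZigZagFuzz's implementation.

    import random

    def shrink_by_function_coverage(corpus, coverage_of):
        """Keep one representative per distinct function-coverage set."""
        seen, kept = set(), []
        for entry in corpus:
            cov = frozenset(coverage_of(entry))
            if cov not in seen:
                seen.add(cov)
                kept.append(entry)
        return kept

    def zigzag_fuzz(seed_options, seed_file, run_target, mutate_options,
                    mutate_file, coverage_of, phases=100, runs_per_phase=500):
        corpus = [(seed_options, seed_file)]
        for phase in range(phases):
            option_phase = phase % 2 == 0          # interleave the two domains
            for _ in range(runs_per_phase):
                options, data = random.choice(corpus)
                if option_phase:
                    options = mutate_options(options)  # e.g. add/drop/flip flags
                else:
                    data = mutate_file(data)           # byte-level file mutation
                if run_target(options, data).found_new_coverage:
                    corpus.append((options, data))
            corpus = shrink_by_function_coverage(corpus, coverage_of)
        return corpus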

 Tags: "Testing and Quality", "Analysis"  
 
Xin Tan, Xinyue Lv, Jing Jiang, Li Zhang, "Understanding Real-time Collaborative Programming: a Study of Visual Studio Live Share"

Abstract: Real-time collaborative programming (RCP) entails developers working simultaneously, regardless of their geographic locations. RCP differs from traditional asynchronous online programming methods, such as those based on Git or SVN, where developers work independently and update the codebase at separate times. Although various real-time code collaboration tools (e.g., Visual Studio Live Share, Code with Me, and Replit) have continued to emerge in recent years, no existing study explicitly focuses on developing a deep understanding of the processes or experiences associated with RCP. To this end, we combine interviews and an e-mail survey with users of Visual Studio Live Share, aiming to understand (i) the scenarios, (ii) the requirements, and (iii) the challenges that arise when developers participate in RCP. We find that developers participate in RCP in 18 different scenarios belonging to six categories, e.g., pair programming, group debugging, and code review. However, existing users’ attitudes toward the usefulness of current RCP tools in these scenarios were significantly more negative than the expectations of potential users. As for the requirements, the most critical category is live editing, followed by the need to share terminals so that hosts and guests can run commands and see the results, as well as focusing and following, which involves “following” the host’s edit location and “focusing” the guests’ attention on the host with a notification. Under these categories, we identify 17 requirements, but most of them are not well supported by current tools. In terms of challenges, we identify 19 challenges belonging to seven categories. The most severe category of challenges is lagging, followed by permissions and conflicts. These findings indicate that current RCP tools, and even the collaborative environment as a whole, urgently need substantial improvement. Based on these findings, we discuss recommendations for different stakeholders, including practitioners, tool designers, and researchers.

 Tags: "Human/Social", "Process"  
 
Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Shuvendu Lahiri, "LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation"

Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, because NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user's intent. In this paper, we propose TiCoder, a novel interactive workflow for guided intent clarification (i.e., partial formalization) through tests, to support the generation of more accurate code suggestions. Through a mixed-methods user study with 15 programmers, we present an empirical evaluation of the workflow's effectiveness in improving code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI-generated code and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. We observe an average absolute improvement of 45.97% in pass@1 code generation accuracy across both datasets and all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.
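
The interaction loop can be sketched as follows, assuming hypothetical stand-ins for the model calls (llm_generate_candidates, llm_generate_test), a sandboxed test runner (passes), and the user prompt (ask_user). This illustrates test-driven pruning of candidate code, not TiCoder's exact algorithm.

    def ticoder_loop(nl_intent, llm_generate_candidates, llm_generate_test,
                     passes, ask_user, max_rounds=5):
        candidates = llm_generate_candidates(nl_intent)
        approved_tests = []
        for _ in range(max_rounds):
            if len(candidates) <= 1:
                break
            test = llm_generate_test(nl_intent, candidates)
            if ask_user(test):  # "does this test reflect your intent?"
                approved_tests.append(test)
                candidates = [c for c in candidates if passes(c, test)]
            else:
                candidates = [c for c in candidates if not passes(c, test)]
        return candidates, approved_tests  # pruned suggestions plus unit tests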

 Tags: "AI for SE", "Prog Comprehension/Reeng/Maint"  
 
Deepika Tiwari, Martin Monperrus, Benoit Baudry, "Mimicking Production Behavior With Generated Mocks"

Abstract: Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: designing realistic interactions between a unit and its environment, and understanding the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, open-source Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Based on this captured data, RICK generates test cases that include realistic initial states and test inputs, as well as mocks and stubs. All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. The mock-based oracles are also effective at detecting regressions within the target methods, complementing each other in their fault-finding ability. We interview 5 developers from industry, who confirm the relevance of using production observations to design mocks and stubs. Our experimental findings clearly demonstrate the feasibility and added value of generating mocks from production interactions.
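
RICK targets Java; the following is a loose Python analogue of the capture-and-replay idea, assuming positional arguments only: wrap a mockable dependency to record its arguments and return values in production, then replay the recording as a stub in a generated test.

    from unittest.mock import MagicMock

    RECORDINGS = []  # (method name, arguments, returned value)

    def record_calls(fn):
        def wrapper(*args):
            result = fn(*args)
            RECORDINGS.append((fn.__name__, args, result))  # production capture
            return result
        return wrapper

    def build_stub(name):
        """Offline: a stub that mimics the recorded production behavior."""
        replay = {args: result for n, args, result in RECORDINGS if n == name}
        return MagicMock(side_effect=lambda *args: replay[args])

    @record_calls
    def exchange_rate(currency):              # the mockable dependency
        return 1.09 if currency == "EUR" else 1.0

    exchange_rate("EUR")                      # observed production traffic
    assert build_stub("exchange_rate")("EUR") == 1.09  # used by a generated test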

 Tags: "Testing and Quality", "Prog Comprehension/Reeng/Maint"  
 
Xueqi Yang, Mariusz Jakubowski, Li Kang, Haojie Yu, Tim Menzies, "SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning"

Abstract: As software projects rapidly evolve, software artifacts become more complex and the defects hidden within them become harder to identify. The emerging Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due to their self-attention mechanism, which scales quadratically with sequence length. This paper introduces SparseCoder, an innovative approach that incorporates sparse attention and a learned token pruning (LTP) method (adapted from natural language processing) to address this limitation. Compared to previous state-of-the-art models (CodeBERT, RoBERTa, and CodeT5), our experiments demonstrate that SparseCoder can handle significantly longer input sequences (at least twice as long) within the limits of our hardware resources and data statistics. Additionally, SparseCoder is four times faster than other methods in runtime, achieving a 50% reduction in floating-point operations (FLOPs) with a negligible performance drop of less than 1% compared to Transformers using sparse attention (Sparse Atten). Plotting the FLOPs of model inference against token length reveals that SparseCoder scales linearly, whereas other methods, including the current state-of-the-art model CodeT5, scale quadratically. Moreover, SparseCoder enhances interpretability by visualizing non-trivial tokens layer-wise.
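
Threshold-based token pruning, the LTP idea the paper adapts, can be sketched in a few lines of NumPy: tokens whose mean received attention falls below a threshold are dropped before the next layer, shortening the sequence. In real LTP the threshold is learned per layer during training; here it is a fixed constant, and all names are illustrative.

    import numpy as np

    def prune_tokens(hidden, attention, threshold):
        """hidden: (seq, dim); attention: (heads, seq, seq) softmax weights."""
        importance = attention.mean(axis=(0, 1))  # mean attention each token receives
        keep = importance >= threshold
        keep[0] = True                            # always keep the [CLS]-style token
        return hidden[keep], keep

    seq, dim, heads = 8, 4, 2
    hidden = np.random.rand(seq, dim)
    attn = np.random.dirichlet(np.ones(seq), size=(heads, seq))  # rows sum to 1
    pruned, mask = prune_tokens(hidden, attn, threshold=1.0 / seq)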

 Tags: "AI for SE", "Analysis"  
 
Shandler Mason, Sandeep Kaur Kuttal, "Diversity's Double-Edged Sword: Analyzing Race's Effect on Remote Pair Programming Interactions"

Abstract: Remote pair programming is widely used in software development, but no research has examined how race affects these interactions between developers. We embarked on this study due to the historical underrepresentation of Black developers in the tech industry, with White developers comprising the majority. Our study involved 24 experienced developers, forming 12 gender-balanced same- and mixed-race pairs. Pairs collaborated on a programming task using the think-aloud method, followed by individual retrospective interviews. Our findings revealed elevated productivity scores for mixed-race pairs, with no differences in code quality between same- and mixed-race pairs. Mixed-race pairs excelled in task distribution, shared decision-making, and role exchange but encountered communication challenges, discomfort, and anxiety, shedding light on the complexity of diversity dynamics. Our study emphasizes race’s impact on remote pair programming and underscores the need for diverse tools and methods to address racial disparities in collaboration.

 Tags: "Process", "Human/Social"  
 
Yuxia Zhang, Zhiqing Qiu, Klaas-Jan Stol, Wenhui Zhu, Jiaxin Zhu, Yingchen Tian, Hui Liu, "Automatic Commit Message Generation: A Critical Review and Directions for Future Work"

Abstract: Commit messages are critical for code comprehension and software maintenance. Writing a high-quality message requires skill and effort. To support developers and reduce their effort on this task, several approaches have been proposed to automatically generate commit messages. Despite the promising performance reported, we have identified three significant and prevalent threats in these automated approaches: 1) the datasets used to train and evaluate them contain a considerable amount of ‘noise’; 2) current approaches only consider commits with diffs of limited size; and 3) current approaches can only generate the subject of a commit message, not the message body. The first limitation may let the models ‘learn’ inappropriate messages during training and also leads to inflated performance results in evaluation. The other two threats considerably weaken the practical usability of these approaches. Further, with the rapid emergence of large language models (LLMs) that show superior performance in many software engineering tasks, it is worth asking: can LLMs address the challenges of long diffs and whole-message generation? This article first reports the results of an empirical study assessing the impact of these three threats on the performance of state-of-the-art automatic commit message generators. We collected commit data from the Top 1,000 most-starred Java projects on GitHub and systematically removed noisy commits with bot-submitted and meaningless messages. We then compared the performance of four approaches representative of the state of the art before and after the removal of noisy messages, and with different lengths of commit diffs. We also conducted a qualitative survey with developers to investigate their perspectives on generating only message subjects. Finally, we evaluate the performance of two representative LLMs, namely UniXcoder and ChatGPT, in generating more practical commit messages. The results demonstrate that generating commit messages is of great practical value, that considerable work is needed to mature the current state of the art, and that LLMs are an avenue worth exploring to address the current limitations. Our analyses provide insights for future work to achieve better performance in practice.
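
The kind of noise filtering the study performs can be illustrated with a small sketch; the bot heuristics and message patterns below are assumptions for illustration, not the study's exact cleaning rules.

    import re

    BOT_SUFFIXES = ("[bot]", "-bot")
    MEANINGLESS = re.compile(r"^(update|fix|wip|minor changes?)$", re.IGNORECASE)

    def is_noisy(author: str, message: str) -> bool:
        """Flag bot-submitted or boilerplate commit messages."""
        if author.endswith(BOT_SUFFIXES):
            return True
        return bool(MEANINGLESS.match(message.strip()))

    assert is_noisy("dependabot[bot]", "Bump lodash from 4.17.20 to 4.17.21")
    assert is_noisy("alice", "wip")
    assert not is_noisy("alice", "Fix NPE in parser when the input is empty")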

 Tags: "Prog Comprehension/Reeng/Maint"  
 
Zimin Chen, Sen Fang, Martin Monperrus, "SUPERSONIC: Learning to Generate Source Code Optimizations in C/C++"

Abstract: Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option: automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C++ program pairs (x_t, x_{t+1}), where x_{t+1} is an optimized version of x_t, and outputs a diff. Supersonic's performance is benchmarked against OpenAI's GPT-3.5-Turbo and GPT-4 on competitive programming tasks. The experiments show that Supersonic not only outperforms both models on the code optimization task but also minimizes the extent of the change, with a model more than 600x smaller than GPT-3.5-Turbo and 3700x smaller than GPT-4.
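
Framing the training target as a diff over a pair (x_t, x_{t+1}) can be illustrated as follows; difflib's unified format stands in for whatever diff representation the model actually emits, so treat this as an assumption-laden sketch.

    import difflib

    slow = "for (int i = 0; i < n; i++) { s += v[i]; }\n"   # x_t
    fast = "s = std::accumulate(v.begin(), v.end(), 0);\n"  # x_{t+1}

    # The seq2seq model's output target is the diff, not the full program.
    target_diff = "".join(difflib.unified_diff(
        slow.splitlines(keepends=True), fast.splitlines(keepends=True),
        fromfile="x_t.cpp", tofile="x_t+1.cpp"))
    print(target_diff)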

 Tags: "Analysis"  
 
Fernando Uyaguari, Silvia T. Acuña, John W. Castro, Davide Fucci, Oscar Dieste, Sira Vegas, "Relevant information in TDD experiment reporting"

Abstract: Experiments are a commonly used research method in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and, in particular, the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; (iii) determine whether experiment reports describe the measurement process components that have an impact on the results. We used a two-part sequential mixed-methods research design. The first part adopts a quantitative approach, applying a statistical analysis of the impact of the operationalization components on the experimental results. The second part follows with a qualitative approach, applying a systematic mapping study (SMS). The test suites, intervention types, and measurers influence the measurements and the results of the statistical analysis of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to ensure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task, and the intervention type in order to be able to reproduce the measurements and statistical analyses, as well as to replicate experiments and build dependable families of experiments.

 Tags: "Process"