ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Overview

Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established.

AI-SQE brings together leading researchers and practitioners to examine the latest developments, fundamental challenges, and future directions in AI-driven software evaluation. The workshop aspires to become the leading forum for exploring how LLMs and agentic systems can serve as dependable evaluators of software quality, performance, and correctness.

AI-SQE invites contributions addressing the theoretical foundations, empirical research, engineering methodologies, and tool development related to AI-based software quality evaluation. As the concept of “LLM-as-a-Judge” (LaaJ) gains traction within the AI community, this workshop offers a vital platform for comprehensive investigation into judgment models, benchmarking strategies, trust calibration, and the automation of evaluation processes.


Motivation and Objectives

AI-SQE addresses a timely shift in software engineering: the rise of AI, especially LLMs and agents, as evaluators of software quality. As traditional human-centric and tool-based methods are being challenged, this workshop explores AI’s emerging role in tasks like code review, testing, and quality assurance. Sitting at the intersection of AI, software engineering, and HCI, the workshop tackles critical issues such as trust, interpretability, and reproducibility, fostering progress in both academic research and industry practices.

Goals and Outcomes

The workshop aims to advance AI-driven software evaluation by developing reliable metrics, benchmarks, and tools for automated assessment. It will explore challenges like AI hallucination and evaluator alignment with human judgment through empirical studies. AI-SQE also seeks to build a collaborative research community focused on reproducibility and practical impact, ultimately shaping how AI is integrated into modern software quality workflows.

Target Audience

The AI-SQE workshop is intended for a diverse group of professionals and researchers who operate at the crossroads of software engineering and artificial intelligence. This includes those involved in software quality assurance, program analysis, automated testing, and the development of AI-based tools. It also welcomes participants exploring human-AI collaboration, empirical methods in software engineering, and tool benchmarking. The workshop aims to create a space where interdisciplinary experts can engage with one another to advance the integration of AI in ensuring software quality.

Mix of Industry and Research Participation

To bridge academic innovation and practical application, AI-SQE aims for a balanced mix of participants from both industry and academia. Key strategies include inviting speakers from major tech companies and AI tool providers, encouraging real-world case study submissions from industry, and disseminating calls for participation through both academic networks and professional communities. This mix will ensure that the workshop remains both theoretically rich and grounded in current practice, supporting dynamic, real-world-relevant discussions.

Plenary

Sun 12 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

08:00 - 17:30
Sunday Registration / Social, Networking and Special Rooms at Main Entrance

Registration for ICSE 2026.

08:00
9h30m
Registration
ICSE 2026 Registration
Social, Networking and Special Rooms

09:00 - 10:30
Session 1 / FinanSE at Bora Bora I
09:00
5m
Day opening
FinanSE Opening
FinanSE
Sallam Abualhaija University of Luxembourg, Domenico Bianculli University of Luxembourg, Eileen Kapel Delft University of Technology
09:05
20m
Paper
Separation of Concerns for Privacy-Preserving LLM Adoption: A Banking Architecture Framework
FinanSE
Yu Kong University College London (UCL), Silvia Bartolucci University College London, Fabio Caccioli University College London (UCL), Giuseppe Destefanis University College London
09:25
20m
Paper
LLM-Assisted Retro-Documentation for Legacy COBOL Applications in Banking (Virtual Attendance)
FinanSE
Mahdi LATRECHE, Elouan MARSOT BNP Paribas, Emma Asma DAUMAS BNP Paribas, Azzedine Idir AIT SAID Télécom Paris, Damien DROISY BNP Paribas, Mariam Barry BNP Paribas
09:45
20m
Paper
Operationalising DAO Sustainability KPIs: A Multi-Chain Dashboard for Governance Analytics
FinanSE
Silvio Meneguzzo University of Turin, Claudio Schifanella University of Turin, Valentina Gatteschi Politecnico di Torino, Giuseppe Destefanis University College London
10:05
15m
Short-paper
Model Extraction and Explanation of Review Decisions: A Case Study on Cloud Migration Planning (Virtual Attendance)
FinanSE
Vali Tawosi J.P. Morgan AI Research, Salwa Alamir J.P. Morgan AI Research, Xiaomo Liu J.P. Morgan AI Research
10:20
10m
Other
Discussion
FinanSE

10:30 - 11:00
Sunday Morning Break / Catering at Catering and Exhibition Hall (Europa I to IV)

This break will provide an opportunity for networking and relaxation between sessions.

10:30
30m
Coffee break
Break
Catering

11:00 - 12:30
Session 2 / FinanSE at Bora Bora I
11:00
5m
Other
Session Opening
FinanSE

11:05
5m
Talk
Reinforcement Learning in Simulated Environments for Adaptive Troubleshooting of Large-Scale Banking Systems
FinanSE
Azzedine Idir AIT SAID Télécom Paris, Mariam Barry BNP Paribas, Albert Bifet University of Waikato, Institut Polytechnique de Paris
11:10
15m
Other
Panel Discussion
FinanSE

11:25
5m
Day closing
FinanSE Closing
FinanSE

12:30 - 14:00
Sunday Lunch / Catering at Catering and Exhibition Hall (Europa I to IV)

Lunch time with a variety of meal options available for attendees, including vegetarian choices. This session will provide an opportunity for attendees to enjoy a meal while networking with colleagues and discussing the day’s events.

12:30
90m
Lunch
Lunch
Catering

15:30 - 16:00
Sunday Afternoon Break / Catering at Catering and Exhibition Hall (Europa I to IV)

Afternoon Break with a variety of beverages and snacks available for attendees. This break will provide an opportunity for networking and relaxation between sessions.

15:30
30m
Coffee break
Break
Catering

Accepted Papers

Title
Beyond Public Benchmarks: An LLM-as-Judge Framework for Enterprise Code Evaluation with Margin-Driven Optimization (Virtual Attendance)
AI-SQE
Evaluating perturbation robustness of generative systems that use COBOL code inputs
AI-SQE
How Good is ChatGPT in Assessing Architecture Diagrams? An Exploratory Study with Four Software Engineering Tools
AI-SQE
Multicalibration for LLM-based Code Generation
AI-SQE
Multilingual Code Evaluation with LLM-as-a-Judge: AI-Assisted Feedback for Human-Centric Understanding (Virtual Attendance)
AI-SQE
On the Quality of AI-Generated Source Code Comments: A Comprehensive Evaluation
AI-SQE
PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code
AI-SQE
Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models
AI-SQE
Towards a Human-in-the-Loop Framework for Reliable Patch Evaluation Using an LLM-as-a-Judge
AI-SQE

Call for Papers

Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established. This workshop brings together researchers and practitioners to examine cutting-edge developments, core challenges, and future directions in AI-based software evaluation. It aims to be the leading forum for understanding how LLMs and agentic systems can serve as reliable judges of software quality, correctness, and performance. We invite contributions on theory, empirical studies, engineering practices, and tools. As the “LLM-as-a-Judge” (LaaJ) paradigm gains momentum, AI-SQE offers a timely venue for advancing models of judgment, evaluation metrics, benchmarking frameworks, and scalable automation.


Topics of Interest

We welcome submissions across a broad spectrum of topics related to AI-driven software evaluation, including but not limited to:

  • LLMs as Evaluators in Software Engineering
    Explorations of how LLMs can assess code quality, correctness, security, maintainability, and performance across software artifacts.

  • LLM-as-Judge (LaaJ): Foundations & Latest Techniques
    Theoretical frameworks and practical implementations for treating LLMs as evaluators, including prompt engineering, voting schemes, and confidence calibration.

  • Agent as a Judge: Agentic Approaches for Evaluation
    Use of autonomous or multi-agent systems to conduct evaluation tasks collaboratively, iteratively, or in multi-turn contexts.

  • Evaluating Code Agents and Code Agentic Systems
    Benchmarks and evaluation methods for assessing the performance and reliability of AI agents that write, test, refactor, or debug software in autonomous or semi-autonomous settings.

  • Scalable Evaluation Pipelines for Software Systems
    Design and implementation of automated pipelines for large-scale software quality evaluation using LLMs and hybrid human-AI approaches, including CI/CD integration and workflow automation.

  • Metrics and Benchmarks for Software Evaluation
    Development of new metrics and standardized benchmark tasks tailored to AI-based evaluation of software quality, correctness, readability, and maintainability.

  • Trust, Reliability & Explainability in LLM-based Judgment
    Techniques to assess and improve the trustworthiness, reproducibility, explainability, interpretability, and robustness of LLMs when used for software quality judgments, including methods for generating clear rationales and transparent decision processes.

  • Task-Specific Fine-Tuning for LLM-as-a-Judge
    Techniques for fine-tuning or adapting LLMs for specialized evaluation tasks, including human-in-the-loop and RLHF approaches.

  • Generative AI for Software Quality Improvement
    Approaches where generative models not only detect quality issues but also propose fixes, refactoring, test cases, or architectural improvements—closing the loop from evaluation to enhancement.

  • Real-World Applications and Case Studies
    Practical deployments of AI-based evaluators in industry, open-source projects, or education.


Paper Submission and Review

All submitted papers must describe original work that has not been published and is not under review elsewhere. Papers must be written in English. All submissions will undergo single-blind peer review by at least three members of the program committee. Submissions will be evaluated based on workshop relevance, novelty and technical quality, clarity of presentation, and potential to stimulate discussion or lead to future research. Extended abstracts will be reviewed for relevance and soundness but will not be held to the same technical depth standards as full research papers, and will be judged by at least three reviewers on the basis of their clarity, relevance, originality, and contribution.

Types of Contributions and Page Limits:
AI-SQE will accept the following types of submissions:

  • Research Papers: Length: Up to 8 pages (excluding references). Description: Full-length papers presenting novel research results, comprehensive empirical studies, or significant engineering contributions related to AI-based software evaluation.

  • Extended Abstracts: Length: Up to 5 pages including references. Description: Concise submissions presenting new ideas, summaries of recent work, early insights, or position statements. Extended abstracts will be clearly marked as such in the proceedings.

The page limits include the abstract, all figures, tables, and references. Extended abstracts will be published free of article processing charges (APCs) as per ACM policy, provided they are explicitly labeled as “extended abstracts” in the submission and proceedings.

All authors must use the official ACM Primary Article Template and submit their papers in PDF format through the HotCRP system. LaTeX authors must use \documentclass[sigconf,review]{acmart} in the preamble of the main file, which typesets the paper in a double-column format with line numbers for easy reference by the reviewers.
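For LaTeX authors, the preamble requirement above can be illustrated with a minimal skeleton. This is only a sketch, assuming a standard acmart installation; the title, author, affiliation, and email are placeholders, and the official ACM template remains the authoritative starting point.

```latex
% Minimal acmart skeleton (illustrative sketch; all names below are placeholders)
\documentclass[sigconf,review]{acmart}  % double-column format with line numbers

\begin{document}

\title{Placeholder Title}

\author{First Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}
\email{first.author@example.org}

% In acmart, the abstract must appear before \maketitle
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Body text.

\end{document}
```

Because the review is single-blind, the sketch keeps author information visible and does not use the class's anonymous option.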

Accepted papers will be published in the ACM Digital Library. Publication of an accepted paper requires that at least one author register for AI-SQE 2026 and present the paper orally during the workshop.

Questions about the ACM template should be emailed to Onn Shehory (Proceedings Chair): Onn.Shehory@biu.ac.il

Questions? Use the AI-SQE contact form.