ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Overview

Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established.

AI-SQE brings together leading researchers and practitioners to examine the latest developments, fundamental challenges, and future directions in AI-driven software evaluation. The workshop aspires to become the leading forum for exploring how LLMs and agentic systems can serve as dependable evaluators of software quality, performance, and correctness.

AI-SQE invites contributions addressing the theoretical foundations, empirical research, engineering methodologies, and tool development related to AI-based software quality evaluation. As the concept of “LLM-as-a-Judge” (LaaJ) gains traction within the AI community, this workshop offers a vital platform for comprehensive investigation into judgment models, benchmarking strategies, trust calibration, and the automation of evaluation processes.


Motivation and Objectives

AI-SQE addresses a timely shift in software engineering: the rise of AI, especially LLMs and agents, as evaluators of software quality. As traditional human-centric and tool-based methods are being challenged, this workshop explores AI’s emerging role in tasks like code review, testing, and quality assurance. Sitting at the intersection of AI, software engineering, and HCI, the workshop tackles critical issues such as trust, interpretability, and reproducibility, fostering progress in both academic research and industry practices.

Goals and Outcomes

The workshop aims to advance AI-driven software evaluation by developing reliable metrics, benchmarks, and tools for automated assessment. It will explore challenges like AI hallucination and evaluator alignment with human judgment through empirical studies. AI-SQE also seeks to build a collaborative research community focused on reproducibility and practical impact, ultimately shaping how AI is integrated into modern software quality workflows.

Target Audience

The AI-SQE workshop is intended for a diverse group of professionals and researchers who operate at the crossroads of software engineering and artificial intelligence. This includes those involved in software quality assurance, program analysis, automated testing, and the development of AI-based tools. It also welcomes participants exploring human-AI collaboration, empirical methods in software engineering, and tool benchmarking. The workshop aims to create a space where interdisciplinary experts can engage with one another to advance the integration of AI in ensuring software quality.

Mix of Industry and Research Participation

To bridge academic innovation and practical application, AI-SQE aims for a balanced mix of participants from both industry and academia. Key strategies include inviting speakers from major tech companies and AI tool providers, encouraging real-world case study submissions from industry, and disseminating calls for participation through both academic networks and professional communities. This mix will ensure that the workshop remains both theoretically rich and grounded in current practice, supporting dynamic, real-world-relevant discussions.

Call for Papers

Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established.

This workshop brings together researchers and practitioners to examine cutting-edge developments, core challenges, and future directions in AI-based software evaluation. It aims to be the leading forum for understanding how LLMs and agentic systems can serve as reliable judges of software quality, correctness, and performance. We invite contributions on theory, empirical studies, engineering practices, and tools. As the “LLM-as-a-Judge” (LaaJ) paradigm gains momentum, AI-SQE offers a timely venue for advancing models of judgment, evaluation metrics, benchmarking frameworks, and scalable automation.


Topics of Interest

We welcome submissions across a broad spectrum of topics related to AI-driven software evaluation, including but not limited to:

  • LLMs as Evaluators in Software Engineering
    Explorations of how LLMs can assess code quality, correctness, security, maintainability, and performance across software artifacts.

  • LLM-as-Judge (LaaJ): Foundations & Latest Techniques
    Theoretical frameworks and practical implementations for treating LLMs as evaluators, including prompt engineering, voting schemes, and confidence calibration.

  • Agent as a Judge: Agentic Approaches for Evaluation
    Use of autonomous or multi-agent systems to conduct evaluation tasks collaboratively, iteratively, or in multi-turn contexts.

  • Evaluating Code Agents and Code Agentic Systems
    Benchmarks and evaluation methods for assessing the performance and reliability of AI agents that write, test, refactor, or debug software in autonomous or semi-autonomous settings.

  • Scalable Evaluation Pipelines for Software Systems
    Design and implementation of automated pipelines for large-scale software quality evaluation using LLMs and hybrid human-AI approaches, including CI/CD integration and workflow automation.

  • Metrics and Benchmarks for Software Evaluation
    Development of new metrics and standardized benchmark tasks tailored to AI-based evaluation of software quality, correctness, readability, and maintainability.

  • Trust, Reliability & Explainability in LLM-based Judgment
    Techniques to assess and improve the trustworthiness, reproducibility, explainability, interpretability, and robustness of LLMs when used for software quality judgments, including methods for generating clear rationales and transparent decision processes.

  • Task-Specific Fine-Tuning for LLM-as-a-Judge
    Techniques for fine-tuning or adapting LLMs for specialized evaluation tasks, including human-in-the-loop and RLHF approaches.

  • Generative AI for Software Quality Improvement
    Approaches where generative models not only detect quality issues but also propose fixes, refactoring, test cases, or architectural improvements—closing the loop from evaluation to enhancement.

  • Real-World Applications and Case Studies
    Practical deployments of AI-based evaluators in industry, open-source projects, or education.


Paper Submission and Review

All submitted papers must describe original work that has not been published and is not under review elsewhere. Papers must be written in English. All submissions will undergo single-blind peer review by at least three members of the program committee and will be evaluated on workshop relevance, novelty and technical quality, clarity of presentation, and potential to stimulate discussion or lead to future research. Extended abstracts will be reviewed for relevance and soundness but will not be held to the same standard of technical depth as full research papers.

Types of Contributions and Page Limits:
AI-SQE will accept the following types of submissions:

  • Research Papers (up to 8 pages, excluding references): Full-length papers presenting novel research results, comprehensive empirical studies, or significant engineering contributions related to AI-based software evaluation.

  • Extended Abstracts (up to 5 pages, including references): Concise submissions presenting new ideas, summaries of recent work, early insights, or position statements. Extended abstracts will be clearly marked as such in the proceedings.

The page limits include the abstract and all figures and tables; references are excluded from the limit for research papers and included for extended abstracts, as noted above. Extended abstracts will be published free of article processing charges (APCs) in accordance with ACM policy, provided they are explicitly labeled as “extended abstracts” in the submission and proceedings.

All authors must use the official ACM Primary Article Template and submit their papers in PDF format through the HotCRP system. LaTeX authors must use \documentclass[sigconf,review]{acmart} in the preamble of the main file, which typesets the paper in a double-column format with line numbers for easy reference by the reviewers.
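For LaTeX users, a minimal skeleton along the following lines should produce the required double-column, line-numbered layout; the title, author, and affiliation fields are placeholders, not workshop-provided content:

    \documentclass[sigconf,review]{acmart}
    % "sigconf" selects the double-column ACM conference layout;
    % "review" adds line numbers for easy reference by the reviewers.
    \begin{document}
    \title{Paper Title (placeholder)}
    \author{Author Name}
    \affiliation{\institution{Institution (placeholder)} \country{Country}}
    \begin{abstract}
    Abstract text goes here.
    \end{abstract}
    \maketitle
    Body of the paper.
    \bibliographystyle{ACM-Reference-Format}
    % \bibliography{references}  % add a .bib file as needed
    \end{document}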

Accepted papers will be published in the ACM Digital Library. Publication of an accepted paper requires the registration of at least one author for AI-SQE 2026, as well as an oral presentation of the paper during the workshop.

All questions about the ACM template should be emailed to Onn Shehory (Proceedings Chair) at Onn.Shehory@biu.ac.il.

Questions? Use the AI-SQE contact form.