AI-SQE: The 1st International Workshop on AI for Software Quality Evaluation: Judgment, Metrics, Benchmarks, and Beyond
Overview
Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established.
AI-SQE brings together leading researchers and practitioners to examine the latest developments, fundamental challenges, and future directions in AI-driven software evaluation. The workshop aspires to become the leading forum for exploring how LLMs and agentic systems can serve as dependable evaluators of software quality, performance, and correctness.
AI-SQE invites contributions addressing the theoretical foundations, empirical research, engineering methodologies, and tool development related to AI-based software quality evaluation. As the concept of “LLM-as-a-Judge” (LaaJ) gains traction within the AI community, this workshop offers a vital platform for comprehensive investigation into judgment models, benchmarking strategies, trust calibration, and the automation of evaluation processes.
Motivation and Objectives
AI-SQE addresses a timely shift in software engineering: the rise of AI, especially LLMs and agents, as evaluators of software quality. As traditional human-centric and tool-based methods are being challenged, this workshop explores AI’s emerging role in tasks like code review, testing, and quality assurance. Sitting at the intersection of AI, software engineering, and HCI, the workshop tackles critical issues such as trust, interpretability, and reproducibility, fostering progress in both academic research and industry practices.
Goals and Outcomes
The workshop aims to advance AI-driven software evaluation by developing reliable metrics, benchmarks, and tools for automated assessment. It will explore challenges like AI hallucination and evaluator alignment with human judgment through empirical studies. AI-SQE also seeks to build a collaborative research community focused on reproducibility and practical impact, ultimately shaping how AI is integrated into modern software quality workflows.
Target Audience
The AI-SQE workshop is intended for a diverse group of professionals and researchers who operate at the crossroads of software engineering and artificial intelligence. This includes those involved in software quality assurance, program analysis, automated testing, and the development of AI-based tools. It also welcomes participants exploring human-AI collaboration, empirical methods in software engineering, and tool benchmarking. The workshop aims to create a space where interdisciplinary experts can engage with one another to advance the integration of AI in ensuring software quality.
Mix of Industry and Research Participation
To bridge academic innovation and practical application, AI-SQE aims for a balanced mix of participants from both industry and academia. Key strategies include inviting speakers from major tech companies and AI tool providers, encouraging real-world case study submissions from industry, and disseminating calls for participation through both academic networks and professional communities. This mix will ensure that the workshop remains both theoretically rich and grounded in current practice, supporting dynamic, real-world-relevant discussions.
Call for Papers
Recent advancements in large language models (LLMs) and autonomous agents have introduced a major shift: the use of AI not only for generating software artifacts but also for evaluating them. This shift - from generation to judgment - has the potential to significantly reshape how software quality is assessed, how evaluation pipelines are designed, and how rigorous benchmarks are established. This workshop brings together researchers and practitioners to examine cutting-edge developments, core challenges, and future directions in AI-based software evaluation. It aims to be the leading forum for understanding how LLMs and agentic systems can serve as reliable judges of software quality, correctness, and performance. We invite contributions on theory, empirical studies, engineering practices, and tools. As the “LLM-as-a-Judge” (LaaJ) paradigm gains momentum, AI-SQE offers a timely venue for advancing models of judgment, evaluation metrics, benchmarking frameworks, and scalable automation.
Topics of Interest
We welcome submissions across a broad spectrum of topics related to AI-driven software evaluation, including but not limited to:
- LLMs as Evaluators in Software Engineering
  Explorations of how LLMs can assess code quality, correctness, security, maintainability, and performance across software artifacts.
- LLM-as-Judge (LaaJ): Foundations & Latest Techniques
  Theoretical frameworks and practical implementations for treating LLMs as evaluators, including prompt engineering, voting schemes, and confidence calibration.
- Agent as a Judge: Agentic Approaches for Evaluation
  Use of autonomous or multi-agent systems to conduct evaluation tasks collaboratively, iteratively, or in multi-turn contexts.
- Evaluating Code Agents and Code Agentic Systems
  Benchmarks and evaluation methods for assessing the performance and reliability of AI agents that write, test, refactor, or debug software in autonomous or semi-autonomous settings.
- Scalable Evaluation Pipelines for Software Systems
  Design and implementation of automated pipelines for large-scale software quality evaluation using LLMs and hybrid human-AI approaches, including CI/CD integration and workflow automation.
- Metrics and Benchmarks for Software Evaluation
  Development of new metrics and standardized benchmark tasks tailored to AI-based evaluation of software quality, correctness, readability, and maintainability.
- Trust, Reliability & Explainability in LLM-based Judgment
  Techniques to assess and improve the trustworthiness, reproducibility, explainability, interpretability, and robustness of LLMs when used for software quality judgments, including methods for generating clear rationales and transparent decision processes.
- Task-Specific Fine-Tuning for LLM-as-a-Judge
  Techniques for fine-tuning or adapting LLMs for specialized evaluation tasks, including human-in-the-loop and RLHF approaches.
- Generative AI for Software Quality Improvement
  Approaches where generative models not only detect quality issues but also propose fixes, refactoring, test cases, or architectural improvements, closing the loop from evaluation to enhancement.
- Real-World Applications and Case Studies
  Practical deployments of AI-based evaluators in industry, open-source projects, or education.
Paper Submission and Review
All submitted papers must describe original work that is neither published nor under review elsewhere, and must be written in English. All submissions will undergo single-blind peer review by at least three members of the program committee and will be evaluated on workshop relevance, novelty and technical quality, clarity of presentation, and potential to stimulate discussion or lead to future research. Extended abstracts will be reviewed for relevance and soundness, but will not be held to the same technical depth standards as full research papers.
Types of Contributions and Page Limits:
AI-SQE will accept the following types of submissions:
- Research Papers (up to 8 pages, excluding references)
  Full-length papers presenting novel research results, comprehensive empirical studies, or significant engineering contributions related to AI-based software evaluation.
- Extended Abstracts (up to 5 pages, including references)
  Concise submissions presenting new ideas, summaries of recent work, early insights, or position statements. Extended abstracts will be clearly marked as such in the proceedings.
Page limits include the abstract and all figures and tables; references are counted or excluded as specified for each submission type above. Extended abstracts will be published free of article processing charges (APCs), as per ACM policy, provided they are explicitly labeled as “extended abstracts” in both the submission and the proceedings.
All authors must use the official ACM Primary Article Template and submit their papers in PDF format through the HotCRP system. LaTeX authors must use \documentclass[sigconf,review]{acmart} in the preamble of the main file, which typesets the paper in a double-column format with line numbers for easy reference by the reviewers.
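For LaTeX users, a minimal skeleton along these lines should satisfy the formatting requirement; the title, author, affiliation, and email values below are placeholders, and authors should consult the official ACM Primary Article Template for the full set of required metadata commands (e.g., CCS concepts and keywords):

\documentclass[sigconf,review]{acmart} % double-column review layout with line numbers

\title{Your AI-SQE Submission Title}
\author{First Author}
\affiliation{%
  \institution{Example University}
  \city{Example City}
  \country{Example Country}}
\email{first.author@example.org}

\begin{document}

% In acmart, the abstract environment must appear before \maketitle.
\begin{abstract}
A short summary of the submission.
\end{abstract}

\maketitle

\section{Introduction}
Body text of the submission goes here.

\end{document}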
Accepted papers will be published in the ACM Digital Library. Publication requires that at least one author registers for AI-SQE 2026 and presents the paper orally during the workshop.
All questions about the ACM template should be emailed to Onn Shehory (Proceedings Chair) at Onn.Shehory@biu.ac.il.