PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization (EASE 2025 - AI Models / Data)

Who

Mohamed Salah Bouafif, Mohammad Hamdaqa, Edward Zulkoski

Track

EASE 2025 AI Models / Data

Time Zone

The program is currently displayed in (GMT+03:00) Athens.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 19 Jun 2025 13:55 - 14:10 at Senate Hall - LLMs for SE (Test) Chair(s): Ayse Tosun

Abstract

Mutation testing is a widely recognized technique for assessing and enhancing the effectiveness of software test suites by introducing deliberate code mutations. However, its application often results in overly large test suites, as developers generate numerous tests to kill specific mutants, increasing computational overhead. This paper introduces PRIMG (Prioritization and Refinement Integrated Mutation-driven Generation), a novel framework for incremental and adaptive test case generation for Solidity smart contracts. PRIMG integrates two core components: a mutation prioritization module, which employs a machine learning model trained on mutant subsumption graphs to predict the usefulness of surviving mutants, and a test case generation module, which utilizes Large Language Models (LLMs) to generate and iteratively refine test cases to achieve syntactic and behavioral correctness.
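The subsumption relation that the prioritization module learns from can be illustrated with a small sketch. This is not PRIMG's machine learning model. It only shows the standard dynamic definition (mutant A subsumes mutant B if some test kills A and every test that kills A also kills B) applied to a hypothetical kill matrix. The mutant names, test names, and helper functions below are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical kill matrix: mutant id -> set of tests that kill it.
KillMatrix = Dict[str, Set[str]]


def subsumes(kills: KillMatrix, a: str, b: str) -> bool:
    """True if mutant a dynamically subsumes mutant b:
    a is killed by at least one test, and every test killing a also kills b."""
    return bool(kills[a]) and kills[a] <= kills[b]


def dominator_mutants(kills: KillMatrix) -> List[str]:
    """Killed mutants that no other mutant strictly subsumes.
    Killing these tends to kill the mutants they subsume as well."""
    killed = [m for m, tests in kills.items() if tests]
    dominators = []
    for m in killed:
        strictly_subsumed = any(
            subsumes(kills, other, m) and not subsumes(kills, m, other)
            for other in killed
            if other != m
        )
        if not strictly_subsumed:
            dominators.append(m)
    return sorted(dominators)


def mutation_score(kills: KillMatrix) -> float:
    """Fraction of mutants killed by at least one test."""
    if not kills:
        return 0.0
    return sum(1 for tests in kills.values() if tests) / len(kills)


kills: KillMatrix = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},  # every test that kills m1 also kills m2
    "m3": {"t2", "t3"},
    "m4": set(),         # surviving mutant
}

print(dominator_mutants(kills))  # ['m1', 'm3']
print(mutation_score(kills))     # 0.75
```

In this toy matrix, a test written to kill m1 also kills m2, so m2 adds little value as a target. Mutants with identical kill sets subsume each other and are both reported as dominators here; a fuller treatment would merge them into one node of the subsumption graph.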

We evaluated PRIMG on real-world Solidity projects from Code4rena to assess its effectiveness in improving mutation scores and generating high-quality test cases. The experimental results demonstrate that PRIMG significantly reduces test suite size while maintaining high mutation coverage. The prioritization module consistently outperformed random mutant selection, enabling the generation of high-impact tests with reduced computational effort. Furthermore, the refinement process enhanced the correctness and utility of LLM-generated tests, addressing their inherent limitations in handling edge cases and complex program logic.
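A generate-and-refine loop of the kind the abstract describes might look roughly like the sketch below. It is an assumption-laden illustration, not the authors' implementation: `generate` stands in for an LLM call, `check` stands in for compiling and running the candidate test, and both are hypothetical callables supplied by the caller. The stubs at the bottom exist only to make the example runnable.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, not taken from the paper:
#   generate(feedback) -> candidate test source (feedback is None on the first round)
#   check(candidate)   -> (passed, feedback message for the next round)
Generate = Callable[[Optional[str]], str]
Check = Callable[[str], Tuple[bool, str]]


def refine_test(
    generate: Generate, check: Check, max_rounds: int = 3
) -> Tuple[Optional[str], int]:
    """Ask for a test, validate it, and feed errors back until it passes
    or the round budget is exhausted. Returns (test or None, rounds used)."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        passed, feedback = check(candidate)
        if passed:
            return candidate, round_no
    return None, max_rounds


# Stubs that simulate one failed attempt followed by a corrected one.
def fake_generate(feedback: Optional[str]) -> str:
    return "good test" if feedback else "bad test"


def fake_check(candidate: str) -> Tuple[bool, str]:
    if candidate == "good test":
        return True, ""
    return False, "compile error: undeclared identifier"


result = refine_test(fake_generate, fake_check)
print(result)  # ('good test', 2)
```

Bounding the number of rounds keeps the cost of each mutant predictable; a test that never passes is simply dropped rather than retried indefinitely.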

Link to Preprint

http://arxiv.org/abs/2505.05584

Mohamed Salah Bouafif

Polytechnique Montréal

Canada

Mohammad Hamdaqa

Polytechnique Montréal

Canada

Edward Zulkoski

Quantstamp

Canada

Session Program

Thu 19 Jun
Displayed time zone: Athens

13:30 - 15:00	LLMs for SE (Test) at Senate Hall
Tracks: Research Papers / Short Papers, Emerging Results / Industry Papers / AI Models / Data
Chair(s): Ayse Tosun, Istanbul Technical University

13:30 15m Talk	Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation
Track: AI Models / Data
Authors: Jagrit Acharya, University of Calgary; Gouri Ginde (Deshpande), University of Calgary

13:45 10m Talk	LELANTE: LEveraging LLM for Automated ANdroid TEsting
Track: Short Papers, Emerging Results
Authors: Shamit Fatin, Bangladesh University of Engineering and Technology (BUET); Mehbubul Hasan Al-Quvi, Bangladesh University of Engineering and Technology (BUET); Haz Sameen Shahgir, University of California, Riverside; Sukarna Barua, Bangladesh University of Engineering and Technology (BUET); Anindya Iqbal, Bangladesh University of Engineering and Technology Dhaka, Bangladesh; Sadia Sharmin, Bangladesh University of Engineering and Technology (BUET); Md. Mostofa Akbar, Bangladesh University of Engineering and Technology (BUET); Kallol Kumar Pal, Samsung R&D Institute Bangladesh (SRBD); A. Asif Al Rashid, Samsung R&D Institute Bangladesh (SRBD)
Pre-print available

13:55 15m Talk	PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization
Track: AI Models / Data
Authors: Mohamed Salah Bouafif, Polytechnique Montréal; Mohammad Hamdaqa, Polytechnique Montréal; Edward Zulkoski, Quantstamp
Pre-print available

14:10 15m Talk	Quality Assessment of Python Tests Generated by Large Language Models
Track: Research Papers
Authors: Victor Alves, Federal University of Ceará (UFC); Carla Bezerra, Federal University of Ceará (UFC); Ivan Machado, Federal University of Bahia - UFBA; Larissa Rocha, University of the State of Bahia (UNEB); Tássio Virgínio, Federal University of Bahia (UFBA); Publio Silva, Federal University of Ceará (UFC)

14:25 10m Talk	Test Code Generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs
Track: Industry Papers
Authors: Sai Krishna Bala, Ericsson; Balvinder Singh, Ericsson; Sujoy Roychowdhury, Ericsson; Giriprasad Sridhara, Ericsson; Sourav Mazumdar, Ericsson; Magnus Sandelin, Ericsson; Dimitris Rentas, Ericsson; Maciej Nalepa, Ericsson; Karol Sawicki, Ericsson; Jakub Gajda, Ericsson

14:35 15m Talk	Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry
Track: Research Papers
Authors: Maider Azanza, University of the Basque Country (UPV/EHU); Beatriz Pérez Lamancha, LKS Next; Eneko Pizarro, University of the Basque Country (UPV/EHU)
Pre-print available

14:50 10m Talk	Leveraging LLMs for Automated Translation of Legacy Code: A Case Study on PL/SQL to Java Transformation
Track: Industry Papers
Authors: Lola Solovyeva, University of Twente; Eduardo Carneiro Oliveira, Utrecht University; Shiyu Fan, Eindhoven University of Technology; Alper Tuncay, Leiden University; Shamil Gareev, Eötvös Loránd University; Andrea Capiluppi, University of Groningen