Using Large Language Models to Generate JUnit Tests: An Empirical Study (EASE 2024 - Research Papers)

Who

Mohammed Latif Siddiq, Joanna C. S. Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, Vinicius Carvalho Lopes

Track

EASE 2024 Research Papers

When

Wed 19 Jun 2024, 14:00 - 14:13, at Room Vietri. Session: Testing. Chair(s): Samira Silva

Abstract

A code generation model generates code by taking a prompt from a code comment, existing code, or a combination of both. Although code generation models (e.g., GitHub Copilot) are increasingly being adopted in practice, it is unclear whether they can successfully be used for unit test generation without fine-tuning for a strongly typed language like Java. To fill this gap, we investigated how well three models (Codex, GPT-3.5-Turbo, and StarCoder) can generate unit tests. We used two benchmarks (HumanEval and EvoSuite SF110) to investigate the effect of context generation on the unit test generation process. We evaluated the models based on compilation rates, test correctness, test coverage, and test smells. We found that the Codex model achieved above 80% coverage for the HumanEval dataset, but no model had more than 2% coverage for the EvoSuite SF110 benchmark. The generated tests also suffered from test smells, such as Duplicated Asserts and Empty Tests.
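
To make the two test smells named in the abstract concrete, here is a minimal Java sketch of what they look like in a test method body and how a naive text-based check might flag them. It is an illustration only and not the detection tooling used in the study: the class name `TestSmellSketch`, both helper methods, and the `Calculator` class referenced in the sample test bodies are hypothetical, and the string-based matching is an assumption chosen to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of two test smells; not the study's tooling.
public class TestSmellSketch {

    /** "Empty Test": the test method body has no executable statements. */
    public static boolean isEmptyTest(String body) {
        // Remove block comments and line comments, then see if anything is left.
        // Limitation: a "//" inside a string literal would also be stripped.
        String stripped = body
                .replaceAll("(?s)/\\*.*?\\*/", "")
                .replaceAll("//[^\\n]*", "")
                .trim();
        return stripped.isEmpty();
    }

    /** "Duplicate Assert": the same assertion appears more than once in one body. */
    public static List<String> duplicatedAsserts(String body) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        // Naive statement split; assumes no ';' inside string literals.
        for (String statement : body.split(";")) {
            String normalized = statement.replaceAll("\\s+", " ").trim();
            if (!normalized.startsWith("assert")) {
                continue;
            }
            // Set.add returns false when the assertion was already seen.
            if (!seen.add(normalized) && !duplicates.contains(normalized)) {
                duplicates.add(normalized);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Body of a generated test that only contains a comment.
        String emptyBody = "  // TODO: add assertions\n";

        // Body of a generated test that repeats the same assertion.
        // Calculator is a hypothetical class under test.
        String duplicatedBody =
                "assertEquals(3, Calculator.add(1, 2));\n"
              + "assertEquals(3, Calculator.add(1, 2));\n"
              + "assertTrue(Calculator.add(0, 0) == 0);\n";

        System.out.println(isEmptyTest(emptyBody));
        System.out.println(duplicatedAsserts(duplicatedBody));
    }
}
```

A production-grade detector would typically work on a parsed syntax tree rather than raw strings; the text matching here only serves to show the shape of each smell.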

Link to Preprint

https://s2e-lab.github.io/preprints/ease24-preprint.pdf

Mohammed Latif Siddiq, University of Notre Dame, United States
Joanna C. S. Santos, University of Notre Dame, United States
Ridwanul Hasan Tanvir, Pennsylvania State University, United States
Noshin Ulfat, IQVIA Inc., Bangladesh
Fahmid Al Rifat, United International University, Bangladesh
Vinicius Carvalho Lopes, University of Notre Dame, United States

Session Program

Wed 19 Jun
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20	Testing (Research Papers / Short Papers, Vision and Emerging Results) at Room Vietri. Chair(s): Samira Silva, Gran Sasso Science Institute (GSSI)

14:00 13m Talk		Using Large Language Models to Generate JUnit Tests: An Empirical Study [Research Papers]
Mohammed Latif Siddiq (University of Notre Dame), Joanna C. S. Santos (University of Notre Dame), Ridwanul Hasan Tanvir (Pennsylvania State University), Noshin Ulfat (IQVIA Inc.), Fahmid Al Rifat (United International University), Vinicius Carvalho Lopes (University of Notre Dame)

14:13 13m Talk		Mutation Testing for Task-Oriented Chatbots [Research Papers]
Pablo Gómez-Abajo (Universidad Autónoma de Madrid), Sara Perez-Soler (Universidad Autónoma de Madrid), Pablo C Canizares (Autonomous University of Madrid, Spain), Esther Guerra (Universidad Autónoma de Madrid), Juan de Lara (Autonomous University of Madrid)

14:26 13m Talk		A Catalog of Transformations to Remove Test Smells From Natural Language Tests [Research Papers, Distinguished Paper Award]
Manoel Aranda III (Federal University of Alagoas), Naelson Oliveira (Federal University of Alagoas), Elvys Soares (Federal Institute of Alagoas, IFAL), Márcio Ribeiro (Federal University of Alagoas, Brazil), Davi Romão (Federal University of Alagoas), Ullyanne Patriota (Federal University of Alagoas), Rohit Gheyi (Federal University of Campina Grande), Emerson Paulo Soares de Souza (Federal University of Pernambuco), Ivan Machado (Federal University of Bahia)

14:40 13m Talk		An Empirical Study on Code Coverage of Performance Testing [Research Papers]
Muhammad Imran (Università degli Studi dell'Aquila), Vittorio Cortellessa (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Riccardo Rubei (University of L'Aquila), Luca Traini (University of L'Aquila)

14:53 13m Talk		AI-Generated Test Scripts for Web E2E Testing with ChatGPT and Copilot: A preliminary study [Short Papers, Vision and Emerging Results]
Maurizio Leotta (DIBRIS, University of Genova, Italy), Hafiz Zeeshan Yousaf (Università di Genova), Filippo Ricca (Università di Genova), Boni Garcia (Universidad Carlos III de Madrid)

15:06 13m Talk		Towards Predicting Fragility in End-to-End Web Tests [Short Papers, Vision and Emerging Results]
Sergio Di Meglio (Università degli Studi di Napoli Federico II), Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II)