On Mitigating Code LLM Hallucinations with API Documentation (ICSE 2025 - Software Engineering in Practice (SEIP)) - ICSE 2025

Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

Who

Nihal Jain, Robert Kwiatkowski, Baishakhi Ray, Murali Krishna Ramanathan, Varun Kumar

Track

ICSE 2025 SE In Practice (SEIP)

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Thu 1 May 2025 12:15 - 12:30 at 211 - AI for Design and Architecture Chair(s): Sarah Nadi

Abstract

In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. CloudAPIBench also provides annotations for frequencies of API occurrences in the public domain, allowing us to study API hallucinations at various frequency levels. Our findings reveal that Code LLMs struggle with low-frequency APIs: for example, GPT-4o achieves only 38.58% valid low-frequency API invocations. We demonstrate that Documentation Augmented Generation (DAG) significantly improves performance for low-frequency APIs (an increase to 47.94% with DAG) but negatively impacts high-frequency APIs when using sub-optimal retrievers (a 39.02% absolute drop). To mitigate this, we propose to trigger DAG selectively: we check against an API index or leverage Code LLMs' confidence scores to retrieve only when needed. We demonstrate that our proposed methods improve the balance between low- and high-frequency API performance, resulting in more reliable API invocations (an 8.20% absolute improvement on CloudAPIBench for GPT-4o).
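The selective triggering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the index contents, the `Draft` type, the `cloudlib.*` API names, the use of the minimum token probability as a confidence score, and the threshold value are all assumptions made for illustration. The sketch assumes a first-pass invocation is drafted without retrieval, and documentation is retrieved only if the drafted API name is missing from an index of known APIs or the model's confidence falls below a threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

# Hypothetical index of known-valid API names. A real index would be built
# from the documentation of the target libraries.
API_INDEX: Set[str] = {
    "cloudlib.storage.list_objects",
    "cloudlib.storage.put_object",
    "cloudlib.db.get_item",
}


@dataclass
class Draft:
    """A first-pass API invocation drafted by the model without retrieval."""
    api_name: str
    token_probs: Sequence[float]  # per-token probabilities of the API name


def confidence(draft: Draft) -> float:
    """Assumed confidence score: the minimum token probability of the API name."""
    return min(draft.token_probs) if draft.token_probs else 0.0


def should_retrieve(
    draft: Draft,
    index: Optional[Set[str]] = None,
    threshold: Optional[float] = None,
) -> bool:
    """Decide whether to trigger documentation retrieval for this draft.

    Index check: retrieve when the drafted API name is not a known API.
    Confidence check: retrieve when the confidence score is below the threshold.
    Either check can be disabled by passing None.
    """
    if index is not None and draft.api_name not in index:
        return True
    if threshold is not None and confidence(draft) < threshold:
        return True
    return False


if __name__ == "__main__":
    known = Draft("cloudlib.storage.list_objects", [0.97, 0.95, 0.99])
    unknown = Draft("cloudlib.storage.list_all_objects", [0.90, 0.85, 0.92])
    unsure = Draft("cloudlib.db.get_item", [0.98, 0.41, 0.93])

    for d in (known, unknown, unsure):
        print(d.api_name, should_retrieve(d, index=API_INDEX, threshold=0.8))
```

In this sketch a frequently seen, confidently generated API skips retrieval, which is the case where the abstract reports that a sub-optimal retriever can hurt, while an unknown or low-confidence API triggers it.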

Nihal Jain

Amazon Web Services

Robert Kwiatkowski

Baishakhi Ray

Columbia University

United States

Murali Krishna Ramanathan

AWS AI Labs

United States

Varun Kumar

AWS AI Labs

Session Program

Thu 1 May
Displayed time zone: Eastern Time (US & Canada)

	11:00 - 12:30	AI for Design and Architecture (Demonstrations / SE In Practice (SEIP) / Research Track) at 211. Chair(s): Sarah Nadi, New York University Abu Dhabi

	11:00 15m Talk	An LLM-Based Agent-Oriented Approach for Automated Code Design Issue Localization. Research Track. Fraol Batole (Tulane University), David OBrien (Iowa State University), Tien N. Nguyen (University of Texas at Dallas), Robert Dyer (University of Nebraska-Lincoln), Hridesh Rajan (Tulane University)
	11:15 15m Talk	Distilled Lifelong Self-Adaptation for Configurable Systems. Research Track. Yulong Ye (University of Birmingham), Tao Chen (University of Birmingham), Miqing Li (University of Birmingham). Pre-print available.
	11:30 15m Talk	The Software Librarian: Python Package Insights for Copilot. Demonstrations. Jasmine Latendresse (Concordia University), Nawres Day (ISSAT Sousse), SayedHassan Khatoonabadi (Concordia University, Montreal), Emad Shihab (Concordia University, Montreal)
	11:45 15m Talk	aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing. SE In Practice (SEIP). Siyuan Jiang, Jia Li (Peking University), He Zong (aiXcoder), Huanyu Liu (Peking University), Hao Zhu (Peking University), Shukai Hu (aiXcoder), Erlu Li (aiXcoder), Jiazheng Ding (aiXcoder), Ge Li (Peking University). Pre-print available.
	12:00 15m Talk	Leveraging MLOps: Developing a Sequential Classification System for RFQ Documents in Electrical Engineering. SE In Practice (SEIP). Claudio Martens (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Hammam Abdelwahab (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Katharina Beckh (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Birgit Kirsch (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Vishwani Gupta (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Dennis Wegener (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)), Steffen Hoh (Schneider Electric)
	12:15 15m Talk	On Mitigating Code LLM Hallucinations with API Documentation. SE In Practice (SEIP). Nihal Jain (Amazon Web Services), Robert Kwiatkowski, Baishakhi Ray (Columbia University), Murali Krishna Ramanathan (AWS AI Labs), Varun Kumar (AWS AI Labs)