An Empirical Comparison of Code Generation Approaches for Ansible (InteNSE 2024)

Who

Benjamin Darnell, Hetarth Chopra, Aaron Councilman, David Grove, Vikram S. Adve

Track

InteNSE 2024 InteNSE Workshop

Time Zone

Times are shown in the conference time zone, (GMT+01:00) Lisbon. The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Apr 2024 09:00 - 09:20 at Daciano da Costa - Early Morning Session Chair(s): Reyhaneh Jabbarvand, Saeid Tizpaz-Niari

Abstract

The rapid proliferation of LLM-based programming assistants has enabled fast and accurate automatic code generation for general-purpose programming languages. Domain-specific languages (DSLs) such as Ansible, a DSL for IT automation, are critical to many fields but have seen little comparable support, owing to limited public-domain code for training models and a lack of interest from tool developers. To address this issue, we collect a novel dataset of permissively licensed Ansible code and use it to create Warp, an LLM for code fine-tuned to produce Ansible tasks from a natural-language prompt. We evaluate state-of-the-art tools for LLM-based code generation, comparing multiple common strategies, including fine-tuning base models on Ansible code and retrieval-augmented generation using documentation, to understand the challenges with existing methodology and to identify future research directions that would enable better code generation for DSLs.

Benjamin Darnell

University of California, Santa Barbara

United States

Hetarth Chopra

University of Illinois at Urbana-Champaign

United States

Aaron Councilman

University of Illinois at Urbana-Champaign

David Grove

IBM Research

United States

Vikram S. Adve

University of Illinois at Urbana-Champaign

United States

Session Program

Mon 15 Apr
Displayed time zone: Lisbon

09:00 - 10:30	Early Morning Session (InteNSE) at Daciano da Costa. Chair(s): Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign), Saeid Tizpaz-Niari (University of Texas at El Paso)

09:00 (20m) Paper: An Empirical Comparison of Code Generation Approaches for Ansible (InteNSE). Benjamin Darnell (University of California, Santa Barbara), Hetarth Chopra (University of Illinois at Urbana-Champaign), Aaron Councilman (University of Illinois at Urbana-Champaign), David Grove (IBM Research), Vikram S. Adve (University of Illinois at Urbana-Champaign)
09:20 (70m) Keynote: Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View (InteNSE). Denys Poshyvanyk (William & Mary)