Recent progress in large language models (LLMs) has led to impressive code generation capabilities. However, existing evaluations of LLMs primarily focus on generating isolated, small-scale code units (e.g., single functions or statements) under default or unspecified software environments. As a result, it remains unclear whether LLMs can reliably generate executable code tailored to specific user environments. To fill this knowledge gap, we conduct the first systematic study of Environment-Aware Code Generation (EACG), which requires generating code that is both functionally correct and directly executable under arbitrary software configurations. We introduce VersiBCB, a large-scale benchmark constructed from real-world Python projects, featuring diverse environment specifications and realistic, multi-package scenarios. Building on this benchmark, we develop three representative adaptation strategies for LLMs: retrieval-augmented generation (data-based), mixture-of-experts (parameter-based), and memory-augmented generation (cache-based). We empirically evaluate these methods across tasks including code completion, function repair, and API migration. Our results reveal that existing LLMs struggle with environment-specific code generation, but our adaptation strategies yield improvements in environment compatibility and executability. These findings highlight critical challenges and opportunities for deploying LLMs in practical, heterogeneous software engineering workflows, and underscore the need for further research into environment-aware AI-assisted programming.
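As a rough illustration of what "executable under a specific software configuration" can mean in practice, the sketch below compares a target environment specification against the package versions actually available. It is not VersiBCB's format or the authors' method: the spec layout (package name mapped to an exact version string), the function names, and the version numbers are all assumptions made for this example. Only `importlib.metadata` from the Python standard library is used.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_mismatches(spec, lookup=installed_version):
    """Compare a target environment spec against what `lookup` reports.

    `spec` maps package name -> exact version string (a hypothetical
    format chosen for this sketch). Returns {package: (wanted, found)}
    for every package whose version differs or is missing.
    """
    mismatches = {}
    for package, wanted in spec.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Illustrative data only: these versions are made up for the example.
target = {"numpy": "1.21.0", "pandas": "1.3.5"}
fake_env = {"numpy": "1.21.0", "pandas": "2.0.0"}

# Pass a dict lookup instead of the real interpreter state so the
# example behaves the same on any machine.
result = environment_mismatches(target, lookup=fake_env.get)
print(result)
```

Calling `environment_mismatches(target)` without the `lookup` argument would query the running interpreter instead. Exact string equality is a deliberate simplification; real specifications usually allow version ranges.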
| slides (Environment-Aware Code Generation (1).pdf) | 1016KiB |
Fri 17 Apr. Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | AI for Software Engineering 22, Research Track, at Europa II. Chair(s): Luca Di Grazia University of St. Gallen | ||
11:00 15m Talk | Environment-Aware Code Generation: How far are We? Research Track Tongtong Wu Monash University, Rongyi Chen Southeast University, Wenjie Du Southeast University, Suyu Ma CSIRO's Data61, Guilin Qi Southeast University, Zhenchang Xing CSIRO's Data61, Shahram Khadivi eBay Inc., Ramesh Periyathambi eBay Inc., Gholamreza Haffari Monash University File Attached | ||
11:15 15m Talk | LLM-based API Argument Completion with Knowledge-Augmented Prompts Research Track Waseem Akram Beijing Institute of Technology, Yanjie Jiang Tianjin University, Haris Ali Khan Beijing Institute of Technology, Furqan Jalil Beijing Institute of Technology, Hui Liu Beijing Institute of Technology | ||
11:30 15m Talk | Distance-Guided Search in Program Synthesis with Imperfect LLM Solutions Research Track | ||
11:45 15m Talk | Automatic Dockerfile Generation with Large Language Models Research Track Jun Lyu Nanjing University, He Zhang Nanjing University, Yusong Yuan Nanjing University, Lanxin Yang Nanjing University, Yue Li Nanjing University, Manuel Rigger National University of Singapore | ||
12:00 15m Talk | A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code Research Track Alejandro Velasco William & Mary, Daniel Rodriguez-Cardenas William & Mary, Dipin Khati William & Mary, David N. Palacio Microsoft, Lutfar Rahman Alif University of Dhaka, Denys Poshyvanyk William & Mary DOI Pre-print | ||
12:15 15m Talk | A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case Research Track João Correia PUC-Rio, Daniel Coutinho Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marco Castelluccio Mozilla, Caio Barbosa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Igor Steinmacher RESHAPE LAB, Northern Arizona University, USA, Marco Gerosa Northern Arizona University, Alessandro Garcia Pontifical Catholic University of Rio de Janeiro, Rafael de Mello UFRJ, Brazil, Anita Sarma Oregon State University Pre-print | ||