SANER 2026
Tue 17 - Fri 20 March 2026 Limassol, Cyprus

Foundation Model (FM)-powered coding assistants, such as GitHub Copilot, ChatGPT, Claude, and Gemini, have disrupted the software development landscape, becoming essential interfaces between developers and code generation. While the efficacy of these Large Language Models (LLMs) relies heavily on the quality of natural language prompts, existing research has predominantly focused on prompt engineering techniques rather than the linguistic proficiency of the user.

This paper addresses natural language proficiency, a critical yet underexplored factor. A mismatch between the complexity of a prompt and the resulting code can create significant friction in the software engineering lifecycle. For instance, a developer with high linguistic proficiency (CEFR C1) but novice programming skills (A1) may trigger an LLM to generate highly idiomatic, complex code that they cannot maintain or debug. Conversely, simple prompts might yield inefficient solutions for expert users. Furthermore, as diverse teams adopt these tools, ensuring that AI-generated code aligns with a team's specific proficiency is crucial for preventing subtle bugs and maintaining development velocity.

To investigate this, we conducted an empirical study using the HumanEval dataset (164 hand-written Python problem-solving tasks) and the HumanEvalPlus test suite. We used three state-of-the-art models: GPT-4o, Gemini 2.5 Pro, and Claude Sonnet 4. Our methodology involved two alignment standards. First, we rated the English proficiency of problem descriptions using the Common European Framework of Reference for Languages (CEFR), which ranges from A1 (Beginner) to C2 (Proficient). Second, we rated code proficiency by adapting the pycefr standard, which rates Python elements by conceptual difficulty (e.g., print() is A1, while zip() and map() are C2). We then structured our investigation around two research questions. RQ1: What is the baseline natural language proficiency of software engineering problem descriptions generated by LLMs? We investigated the default linguistic level LLMs employ when describing technical tasks, establishing a baseline for comprehension requirements. RQ2: Does the natural language proficiency of the prompt influence the proficiency and correctness of the generated code? We systematically varied prompt proficiency across CEFR levels and observed the causal links between the user's language input and the model's code output.
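A pycefr-style rating as described above can be sketched as a walk over a Python AST that maps constructs to difficulty levels. The mapping below is a small illustrative assumption, not the actual pycefr catalogue, and the function name code_proficiency is hypothetical:

```python
import ast

# Hypothetical, heavily simplified mapping inspired by the pycefr
# standard: each Python construct gets a CEFR-style level. The real
# pycefr catalogue is far more extensive; these entries are examples.
CALL_LEVELS = {"print": "A1", "len": "A1", "zip": "C2", "map": "C2"}
NODE_LEVELS = {ast.For: "A2", ast.ListComp: "B1", ast.Lambda: "B2"}
ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def code_proficiency(source: str) -> str:
    """Return the highest CEFR-style level found in the snippet."""
    highest = "A1"
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            level = CALL_LEVELS.get(node.func.id)
        else:
            level = NODE_LEVELS.get(type(node))
        if level and ORDER.index(level) > ORDER.index(highest):
            highest = level
    return highest

print(code_proficiency("print(len([1, 2]))"))           # A1
print(code_proficiency("list(zip(xs, map(str, ys)))"))  # C2
```

Taking the maximum level over all constructs mirrors the intuition that a snippet is only as approachable as its most advanced element.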

Our analysis yields several key insights into the relationship between natural language and code generation. LLMs default to an Intermediate (B2) or higher natural language level when describing software engineering problems, suggesting that a baseline level of English proficiency is effectively a prerequisite for developers to fully comprehend standard AI-generated explanations. On the other hand, the impact of prompt language on code proficiency varied between models, while the impact on correctness was consistent: higher-proficiency prompts (C1/C2) yielded code with higher correctness rates across all models, and simplifying the natural language of the prompt to lower CEFR levels resulted in a measurable decrease in code correctness. This study demonstrates that natural language proficiency is not only a user characteristic but also a functional control lever for code generation. The results highlight potential issues: for example, simplifying language to accommodate non-native speakers or novices may degrade the reliability of AI-generated solutions. These findings suggest that future FM-powered tools must explicitly account for user language proficiency to tailor solutions that are both correct and comprehensible.
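The correctness measurements reported above follow the usual HumanEval/HumanEvalPlus pattern of executing a generated candidate against a task's test cases. The sketch below illustrates that pattern with an invented task and tests; the function name add and the helper passes_all are placeholders, not dataset content:

```python
# Minimal sketch of functional-correctness checking in the style of
# HumanEval/HumanEvalPlus: execute the generated candidate, then run
# it against the task's test cases. Task and tests are invented here.
candidate_code = """
def add(a, b):
    return a + b
"""

test_cases = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]

def passes_all(code: str, tests) -> bool:
    namespace = {}
    try:
        exec(code, namespace)  # define the candidate function
        fn = namespace["add"]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False           # runtime errors count as failures

print(passes_all(candidate_code, test_cases))  # True
```

A per-task pass/fail of this kind, aggregated over the benchmark, yields the correctness rates compared across prompt proficiency levels.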

Fri 20 Mar

Displayed time zone: Athens

11:00 - 12:30
Session 6A - Tools and Techniques for Effective Software Development
Industrial Track / Journal First Track / Tool Demo Track / Research Track at Panorama
Chair(s): NIKIEMA Beninwende Serge Lionel University of Luxembourg
11:00
15m
Talk
How Natural Language Proficiency Shapes GenAI Code for Software Engineering Tasks
Journal First Track
Ruksit Rojpaisarnkit Nara Institute of Science and Technology, Youmei Fan Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology, Raula Gaikovina Kula The University of Osaka
11:15
15m
Talk
Data Catalog Tools: A Systematic Multivocal Literature Review
Journal First Track
Marco Tonnarelli JADS - TU/e, Indika Kumara Tilburg University, Stefan Driessen JADS, Tilburg University, Damian Andrew Tamburri University of Sannio - JADS/NXP Semiconductors, Willem-Jan van den Heuvel JADS, Tilburg University, Patrick Oor NXP Semiconductors
11:30
15m
Talk
On the Practical Adoption of a Static Performance Anti-Pattern Detector: An Industrial Case Study
Industrial Track
Lizhi Liao University of Guelph, Weiyi Shang University of Waterloo, Catalin Sporea ERA Environmental Management Solutions, Andrei Toma ERA Environmental Management Solutions, Sarah Sajedi ERA Environmental Management Solutions
11:45
15m
Talk
Multi-CoLoR: Context-Aware Localization and Reasoning across Multi-Language Codebases
Industrial Track
Indira Vats University of Toronto; Advanced Micro Devices (AMD), Sanjukta De Advanced Micro Devices, Subhayan Roy, Saurabh Bodhe, Lejin Varghese, Max Kiehn, Yonas Bedasso Advanced Micro Devices, Marsha Chechik University of Toronto
Pre-print
12:00
15m
Talk
Diagram-Aware Automatic Review of Software Design Documents Using Multimodal Large Language Models
Industrial Track
Takasaburo Fukuda Fujitsu Limited, Susumu Tokumoto Fujitsu Limited
12:15
7m
Talk
Source Code-Driven GDPR Documentation: Supporting RoPA with Assessor View
Tool Demo Track
Mugdha Khedkar Heinz Nixdorf Institute, Paderborn University, Michael Schlichtig Heinz Nixdorf Institut, Paderborn University, Eric Bodden Heinz Nixdorf Institute at Paderborn University & Fraunhofer IEM
Pre-print Media Attached
12:22
7m
Talk
RefineID: A Developer-Centric IDE Assistant for Better Identifiers
Tool Demo Track
Eya Jeljli, Tushar Sharma Dalhousie University