ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

Large Language Models (LLMs) have demonstrated exceptional performance on a variety of complex code generation tasks. However, their broader adoption is limited by high computational demands, particularly for memory and processing power. To mitigate these requirements, model pruning techniques are used to create more compact models with significantly fewer parameters. However, current approaches do not focus on the efficient extraction of programming-language-specific sub-models. In this work, we explore the idea of efficiently deriving coding-specific sub-models through unstructured pruning (specifically, Wanda). We investigate the impact of different domain-specific calibration datasets on pruning outcomes across three distinct domains and extend our analysis to extracting four language-specific sub-models: Python, Java, C++, and JavaScript. We are the first to efficiently extract programming-language-specific sub-models using appropriate calibration datasets while maintaining acceptable accuracy relative to the full models. We are also the first to provide analytical evidence that domain-specific tasks activate distinct regions within LLMs, supporting the creation of specialized sub-models through unstructured pruning. We believe that this work has significant potential to enhance LLM accessibility for coding by reducing computational requirements enough to enable local execution on consumer-grade hardware and by supporting the faster inference times critical for real-time development feedback.
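
To make the role of the calibration dataset concrete, here is a minimal sketch of Wanda-style unstructured pruning for a single linear layer. It assumes the commonly described Wanda score (weight magnitude multiplied by the L2 norm of the matching input feature over the calibration samples, compared within each output row). The function name `wanda_prune` and the toy matrices are hypothetical and are not taken from the talk or the authors' code.

```python
import math


def wanda_prune(weights, calib_inputs, sparsity):
    """Zero out the lowest-scoring weights in each output row.

    weights:      list of rows, shape (out_features, in_features)
    calib_inputs: list of activation vectors, shape (n_samples, in_features)
    sparsity:     fraction of weights to remove per row, e.g. 0.5
    """
    n_in = len(weights[0])
    # L2 norm of every input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weights:
        # Wanda-style importance: |weight| * norm of the matching input feature.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune least important weights in this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy layer with equal-magnitude weights, so only the activations decide.
W = [[1.0, -1.0, 1.0, -1.0]]
# Hypothetical calibration activations from two different domains.
calib_a = [[3.0, 0.1, 2.0, 0.2]]  # domain A excites features 0 and 2
calib_b = [[0.1, 3.0, 0.2, 2.0]]  # domain B excites features 1 and 3

sub_model_a = wanda_prune(W, calib_a, 0.5)
sub_model_b = wanda_prune(W, calib_b, 0.5)
```

In this toy setting the same layer yields two different sparse sub-models depending on which calibration set is used, which is the intuition behind choosing domain-specific or language-specific calibration data.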

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Paper Session 4 / Virtual Talk / Award Session & Closing
LLM4Code at 214
Chair(s): Lingming Zhang University of Illinois at Urbana-Champaign
16:00
10m
Talk
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
LLM4Code
Mahmoud Jahanshahi University of Tennessee, Audris Mockus University of Tennessee
Pre-print
16:10
10m
Talk
Understanding Code Properties: Is Code All You Need?
LLM4Code
Srivishnu Pyda University of Maryland, Daniel Nichols University of Maryland, Abhinav Bhatele University of Maryland
16:20
10m
Talk
Analysis of Student-LLM Interaction in a Software Engineering Project
LLM4Code
Agrawal Naman National University of Singapore, Ridwan Salihin Shariffdeen National University of Singapore, Wang Guanlin National University of Singapore, Sanka Rasnayaka National University of Singapore, Ganesh Neelakanta Iyer National University of Singapore
16:30
10m
Talk
Training LLMs for Generating IEC 61131-3 Structured Text with Online Feedback
LLM4Code
Aaron Haag Siemens AG, Bertram Fuchs Siemens AG, Altay Kacan Siemens AG, Oliver Lohse Siemens AG
16:40
10m
Talk
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning (Virtual Talk)
LLM4Code
Laura Puccioni Spotify, Alireza Farshin NVIDIA, Mariano Scazzariello RISE Research Institutes of Sweden, Changjie Wang KTH Royal Institute of Technology, Marco Chiesa KTH Royal Institute of Technology, Dejan Kostic KTH Royal Institute of Technology
Media Attached
16:40
10m
Talk
Is More or Less Automation Better? An Investigation into the LLM4TDD Process (Virtual Talk)
LLM4Code
Sanyogita Piya The University of Texas at Arlington, Anahita Samadi The University of Texas at Arlington, Allison Sullivan University of Texas at Arlington
16:40
10m
Talk
Knowledge Graph Based Repository-Level Code Generation (Virtual Talk)
LLM4Code
Mihir Athale Northeastern University, Vishal Vaddina Quantiphi Inc.
Media Attached
16:40
10m
Talk
Leveraging LLMs for Legacy Code Modernization: Evaluation of LLM-Generated Documentation (Virtual Talk)
LLM4Code
Colin Diggs MITRE Corporation, Michael Doyle MITRE Corporation, Amit Madan MITRE Corporation, Emily Escamilla MITRE Corporation, Siggy Scott MITRE Corporation, Jacob Zimmer MITRE Corporation, Naveed Nekoo MITRE Corporation, Paul Ursino MITRE Corporation, Michael Bartholf MITRE Corporation, Zachary Robin MITRE Corporation, Anand Patel MITRE Corporation, Chris Glasz MITRE Corporation, William Macke MITRE Corporation, Paul Kirk MITRE Corporation, Jasper Phillips MITRE Corporation, Arun Sridharan MITRE Corporation, Doug Wendt MITRE Corporation, Scott Rosen MITRE Corporation, Nitin Naik MITRE Corporation, Justin F. Brunelle MITRE Corporation, Samruddhi Thaker MITRE Corporation
Media Attached
16:40
10m
Talk
From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks (Virtual Talk)
LLM4Code
Mohammed Murtuza Shahzad Syed Northern Illinois University, Joseph Wilson Northern Illinois University, Ibrahim Al Azher Northern Illinois University, Hamed Alhoori Dept. of Computer Science at the Northern Illinois University, Mona Rahimi Dept. of Computer Science at the Northern Illinois University
Media Attached
16:40
10m
Talk
Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs (Virtual Talk)
LLM4Code
Nilesh Dhulshette TCS Research, Sapan Shah TCS Research, Vinay Kulkarni Tata Consultancy Services Research
Media Attached
16:40
10m
Talk
Code Summarization Beyond Function Level (Virtual Talk)
LLM4Code
Vladimir Makharev Innopolis University, AIRI, Vladimir Ivanov Innopolis University
Media Attached
16:40
10m
Talk
YABLoCo: Yet Another Benchmark for Long Context Code Generation (Virtual Talk)
LLM4Code
Aidar Valeev Innopolis University, Vladimir Ivanov Innopolis University, Roman Garaev Innopolis University, Vadim Lomshakov JetBrains, Irina Pionkovskaya Huawei Noah's Ark Lab, Israel Adewuyi Innopolis University
16:40
10m
Talk
CoCoNUT: Structural Code Understanding does not fall out of a tree (Virtual Talk)
LLM4Code
Claas Beger Cornell University, Saikat Dutta Cornell University
Pre-print Media Attached
16:40
10m
Talk
Do Code LLMs Understand Design Patterns? (Virtual Talk)
LLM4Code
Zhenyu Pan Northwestern University, Xuefeng Song Northwestern University, Yunkun Wang Zhejiang University, Rongyu Cao Tongyi Lab, Alibaba, China, Binhua Li Tongyi Lab, Alibaba, China, Yongbin Li Tongyi Lab, Alibaba, China, Han Liu Northwestern University
Media Attached
16:40
10m
Talk
From Scientific Texts to Verifiable Code: Automating the Process with Transformers (Virtual Talk)
LLM4Code
Changjie Wang KTH Royal Institute of Technology, Mariano Scazzariello RISE Research Institutes of Sweden, Marco Chiesa KTH Royal Institute of Technology
Media Attached
16:50
10m
Day closing
Award Session & Closing
LLM4Code