EASE 2024
Tue 18 - Fri 21 June 2024, Salerno, Italy

Software Engineering (SE) researchers are extensively applying Large Language Models (LLMs) to address challenges in software engineering tasks such as code clone detection, code summarization, and program comprehension, among others. Despite promising results, LLMs must be fine-tuned and customized with task-specific datasets for optimal performance. However, much SE data is proprietary, and the lack of LLMs trained on non-open-source data remains an open problem. While prior work has applied Federated Learning (FL) to SE, the integration of FL with LLMs for software engineering is unexplored. Hence, in this paper, we propose FedLLM for the task of code summarization. We set up a federated learning architecture and fine-tune an LLM (Llama2 with 6.7 billion parameters) for code summarization using Parameter-Efficient Fine-Tuning (PEFT). This is achieved on a single NVIDIA A100 GPU with 40 GB of memory. Results show that the FL-trained LLM is as effective as a centrally trained one. We envision that leveraging non-open-source data via FedLLM for software engineering could be an interesting research direction.
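The abstract does not include implementation details, but the following minimal sketch illustrates how one federated round of such a setup might look: each client fine-tunes LoRA adapters locally (PEFT via the Hugging Face peft library) and the server aggregates only the adapter weights with FedAvg. The model name, LoRA hyperparameters, client datasets, and the train_locally helper are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of one FedLLM round, assuming LoRA-based PEFT on Llama2
# and FedAvg aggregation of the (small) adapter weights only.
from copy import deepcopy

import torch
from transformers import AutoModelForCausalLM
from peft import (LoraConfig, get_peft_model,
                  get_peft_model_state_dict, set_peft_model_state_dict)


def fedavg(adapter_states, weights):
    """Weighted average of per-client LoRA adapter state dicts (FedAvg)."""
    total = sum(weights)
    avg = deepcopy(adapter_states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for s, w in zip(adapter_states, weights)) / total
    return avg


def train_locally(model, dataset):
    # Placeholder for one epoch of standard causal-LM fine-tuning on the
    # client's (code, summary) pairs, e.g. with the transformers Trainer.
    pass


# Assumed base model and LoRA configuration (not taken from the abstract).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                            torch_dtype=torch.float16)
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)

clients = [["client-0 data"], ["client-1 data"], ["client-2 data"]]  # placeholders
global_adapter = get_peft_model_state_dict(model)

for round_idx in range(3):  # number of federated rounds is illustrative
    client_states, client_sizes = [], []
    for client_data in clients:
        # Each client starts from the current global adapter, trains locally,
        # and sends back only its adapter weights (never the raw code data).
        set_peft_model_state_dict(model, global_adapter)
        train_locally(model, client_data)
        client_states.append(deepcopy(get_peft_model_state_dict(model)))
        client_sizes.append(len(client_data))
    global_adapter = fedavg(client_states, client_sizes)
```

Because only LoRA adapter tensors are exchanged and averaged, the communication and aggregation cost stays small relative to the 6.7B-parameter base model, which is what makes federated fine-tuning feasible on a single 40 GB A100.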