EASE 2024
Tue 18 - Fri 21 June 2024 Salerno, Italy
Thu 20 Jun 2024 12:00 - 12:15 at Room Vietri - Mining Software Repositories Chair(s): Giuseppe Destefanis

Software repositories have a plethora of information about software development, encompassing details such as code contributions, bug reports and code reviews. This rich source of data can be harnessed not only to enhance software quality and development velocity but also to gain insights into team collaboration and inform strategic decision-making throughout the software development lifecycle. Previous studies show that many stakeholders cannot benefit from the project information due to the technical knowledge and expertise required to extract the project data.

To lower the barrier to entry by automating the process of extracting and analyzing repository data, we explored the potential of using an LLM to develop a chatbot for answering questions related to software repositories. We evaluated the chatbot on 150 software repository-related questions. We found that the chatbot correctly answered one question. This result prompted us to shift our focus to investigating the challenges in adopting LLMs for the out-of-the-box development of software repository chatbots. We identified five main challenges related to retrieving data, structuring the data, and generating the answer to the user’s query. Among these challenges, the most frequent (83.3%) is the inaccurate retrieval of data to answer questions. In this paper, we share with the SE community our experience and the challenges of developing an LLM-based chatbot to answer software repository-related questions. We also provide recommendations on mitigating these challenges. Our findings will serve as a foundation to drive future research aimed at enhancing LLMs for adoption in extracting useful information from software repositories, fostering advancements in natural language understanding, data retrieval, and response generation within the context of software repository-related questions and analytics.

Thu 20 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

11:00 - 12:30
Mining Software Repositories - Research Papers / Journal-first at Room Vietri
Chair(s): Giuseppe Destefanis Brunel University London
11:00
15m
Talk
On the Accuracy of GitHub's Dependency Graph
Research Papers
Daniele Bifolco University of Sannio, Sabato Nocera Department of Computer Science, University of Salerno, Simone Romano University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Rita Francese University of Salerno, Giuseppe Scanniello University of Salerno
11:15
15m
Talk
Towards Semi-Automated Merge Conflict Resolution: Is It Easier Than We Expected?
Distinguished Paper Award
Research Papers
Alexander Boll University of Bern, Yael van Dok University of Bern, Manuel Ohrndorf University of Bern, Alexander Schultheiß Paderborn University, Timo Kehrer University of Bern
11:30
15m
Talk
Leveraging Statistical Machine Translation for Code Search
Research Papers
Hung Phan, Ali Jannesari Iowa State University
11:45
15m
Talk
LEGION: Harnessing Pre-trained Language Models for GitHub Topic Recommendations with Distribution-Balance Loss
Research Papers
Yen-Trang Dang Hanoi University of Science and Technology, Le-Cong Thanh The University of Melbourne, Phuc-Thanh Nguyen Hanoi University of Science and Technology, Anh M. T. Bui Hanoi University of Science and Technology, Phuong T. Nguyen University of L’Aquila, Xuan-Bach D. Le University of Melbourne, Quyet Thang Huynh Hanoi University of Science and Technology
Pre-print
12:00
15m
Talk
LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities
Research Papers
Samuel Abedu Concordia University, Ahmad Abdellatif University of Calgary, Emad Shihab Concordia University
Pre-print
12:15
15m
Talk
An exploratory study of software artifacts on GitHub from the lens of documentation
Journal-first
Akhila Sri Manasa Venigalla IIT Tirupati, Sridhar Chimalakonda Indian Institute of Technology, Tirupati