LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities (EASE 2024 - Research Papers)

Who

Samuel Abedu, Ahmad Abdellatif, Emad Shihab

Track

EASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 20 Jun 2024 12:00 - 12:15 at Room Vietri - Mining Software Repositories Chair(s): Giuseppe Destefanis

Abstract

Software repositories have a plethora of information about software development, encompassing details such as code contributions, bug reports, and code reviews. This rich source of data can be harnessed not only to enhance software quality and development velocity but also to gain insights into team collaboration and inform strategic decision-making throughout the software development lifecycle. Previous studies show that many stakeholders cannot benefit from this project information because of the technical knowledge and expertise required to extract the project data.

To lower the barrier to entry by automating the process of extracting and analyzing repository data, we explored the potential of using an LLM to develop a chatbot for answering questions related to software repositories. We evaluated the chatbot on 150 software repository-related questions. We found that the chatbot correctly answered one question. This result prompted us to shift our focus to investigate the challenges in adopting LLMs for the out-of-the-box development of software repository chatbots. We identified five main challenges related to retrieving data, structuring the data, and generating the answer to the user’s query. Among these challenges, the most frequent (83.3%) is the inaccurate retrieval of data to answer questions. In this paper, we share our experience and challenges in developing an LLM-based chatbot to answer software repository-related questions within the SE community. We also provide recommendations on mitigating these challenges. Our findings will serve as a foundation to drive future research aimed at enhancing LLMs for adoption in extracting useful information from software repositories, fostering advancements in natural language understanding, data retrieval, and response generation within the context of software repository-related questions and analytics.

Link to Preprint

https://das.encs.concordia.ca/pdf/abedu2024llm.pdf

Samuel Abedu

Concordia University

Canada

Ahmad Abdellatif

University of Calgary

Canada

Emad Shihab

Concordia University

Canada

Session Program

Thu 20 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

11:00 - 12:30	Mining Software Repositories (Research Papers / Journal-first) at Room Vietri Chair(s): Giuseppe Destefanis Brunel University London

11:00 15m Talk		On the Accuracy of GitHub's Dependency Graph Research Papers Daniele Bifolco University of Sannio, Sabato Nocera University of Salerno, Simone Romano University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Rita Francese University of Salerno, Giuseppe Scanniello University of Salerno
11:15 15m Talk		Towards Semi-Automated Merge Conflict Resolution: Is It Easier Than We Expected? (Distinguished Paper Award) Research Papers Alexander Boll University of Bern, Yael van Dok University of Bern, Manuel Ohrndorf University of Bern, Alexander Schultheiß Paderborn University, Timo Kehrer University of Bern
11:30 15m Talk		Leveraging Statistical Machine Translation for Code Search Research Papers Hung Phan, Ali Jannesari Iowa State University
11:45 15m Talk		LEGION: Harnessing Pre-trained Language Models for GitHub Topic Recommendations with Distribution-Balance Loss Research Papers Yen-Trang Dang Hanoi University of Science and Technology, Le-Cong Thanh The University of Melbourne, Phuc-Thanh Nguyen Hanoi University of Science and Technology, Anh M. T. Bui Hanoi University of Science and Technology, Phuong T. Nguyen University of L’Aquila, Xuan-Bach D. Le University of Melbourne, Quyet Thang Huynh Hanoi University of Science and Technology Pre-print
12:00 15m Talk		LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities Research Papers Samuel Abedu Concordia University, Ahmad Abdellatif University of Calgary, Emad Shihab Concordia University Pre-print
12:15 15m Talk		An exploratory study of software artifacts on GitHub from the lens of documentation Journal-first Akhila Sri Manasa Venigalla IIT Tirupati, Sridhar Chimalakonda Indian Institute of Technology, Tirupati