EASE 2024
Tue 18 - Fri 21 June 2024 Salerno, Italy

Software repositories hold a wealth of information about software development, including code contributions, bug reports, code reviews, and project documentation. This data can be used not only to improve software quality and development velocity but also to gain insights into team collaboration, identify potential bottlenecks, and inform strategic decision-making throughout the software development lifecycle. Previous studies show, however, that many stakeholders cannot benefit from this information because extracting project data requires technical knowledge and expertise.

To lower the barrier to entry by automating the extraction and analysis of repository data, we explored the potential of using a large language model (LLM) to develop a chatbot for answering questions related to software repositories. We evaluated the chatbot on a set of 150 software repository-related questions. We found that the chatbot correctly answered one question about the repository. This result prompted us to shift our focus to investigating the challenges in adopting LLMs for the out-of-the-box development of software repository chatbots. We identified five main challenges related to retrieving data, structuring the data, and generating the answer to the user’s query. Among these challenges, inaccurate retrieval of the data needed to answer a question is the most frequent, occurring in 83.3% of the queries. In this paper, we share with the SE community our experience and the challenges we faced in developing an LLM-based chatbot to answer software repository-related questions. We also provide recommendations for mitigating these challenges. Our findings will serve as a foundation for future research aimed at enhancing LLMs for extracting useful information from software repositories, fostering advances in natural language understanding, data retrieval, and response generation for software repository-related questions and analytics.