LLM-Based Chatbots for Mining Software Repositories: Challenges and Opportunities
Software repositories hold a wealth of information about software development, including code contributions, bug reports, and code reviews. This data can be harnessed not only to enhance software quality and development velocity but also to gain insights into team collaboration and to inform strategic decision-making throughout the software development lifecycle. Previous studies show that many stakeholders cannot benefit from this information because extracting project data requires technical knowledge and expertise.
To lower the barrier to entry by automating the extraction and analysis of repository data, we explored the potential of using an LLM to develop a chatbot for answering questions related to software repositories. We evaluated the chatbot on 150 software repository-related questions and found that it correctly answered only one of them. This result prompted us to shift our focus to investigating the challenges in adopting LLMs for the out-of-the-box development of software repository chatbots. We identified five main challenges related to retrieving data, structuring the data, and generating the answer to the user’s query. Among these challenges, the most frequent (83.3%) is the inaccurate retrieval of data to answer questions. In this paper, we share with the SE community our experience and the challenges we faced in developing an LLM-based chatbot to answer software repository-related questions. We also provide recommendations for mitigating these challenges. Our findings can serve as a foundation for future research aimed at enhancing LLMs for extracting useful information from software repositories, and at advancing natural language understanding, data retrieval, and response generation in the context of software repository-related questions and analytics.