Contribution History as a Key Feature in OSS Task Recommendation: an LLM-Based Empirical Study (ESEIW 2025 - ESEM - Emerging Results and Vision Track)

Who

Md Abdul Hannan, Mohammad Habibullah Rakib, Khondaker Masfiq Reza, Fabio Marcos De Abreu Santos

Track

ESEIW 2025 ESEM - Emerging Results and Vision Track

Time Zone

The program is currently displayed in (GMT-10:00) Hawaii.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 2 Oct 2025, 13:50 - 14:05, at Kaiulani I. Session: LLMs for Classification, Detection, and Recommendations. Chair(s): Fabio Calefato

Abstract

Open-source software (OSS) projects often struggle to efficiently assign issues to contributors whose skills align with task requirements. Without targeted recommendations, contributors may overlook suitable issues, leading to delayed resolutions and reduced engagement. To mitigate this barrier, previous studies proposed tagging issues with categories of libraries as a proxy for the skills required to solve them. In a different direction, researchers proposed identifying the skills of developers in open-source projects to support managers in allocating issues. Despite these advances, overconfident contributors might still pick an issue beyond their abilities. Similarly, managers may be subject to confirmation bias and allocate an issue incompatible with the contributor’s skill set. In addition, studies reported that maintainers have little time to support new contributors and want contributors to decide autonomously what to work on. To address these problems, we present a fully automated, AI-powered issue recommendation system that integrates past contribution history with skill-based matching. We mine public GitHub repositories to extract contributor skills from commit histories and issue resolutions, and infer issue requirements using both traditional techniques and Large Language Models (LLMs). We evaluate three matching strategies: TF-IDF, sentence-BERT (s-BERT), and LLM-based approaches. The simple TF-IDF model outperforms the more complex methods, achieving a top-15 accuracy of 70%. We also explore the use of a canonical skill superset for standardizing skill representations. Our findings show that historical contribution data is a significant feature for OSS issue assignment and that lightweight lexical methods remain highly effective in specific tasks. Integrating contribution history with other features might therefore improve performance further.
This work contributes a scalable framework for personalized issue recommendation that supports diverse OSS environments and enhances contributor-task alignment.
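
To make the TF-IDF matching and top-k accuracy mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a contributor's history is available as one text string and each open issue as another; the function names (tokenize, tfidf_vectors, rank_issues, top_k_accuracy), the smoothed IDF formula, and the sample data are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch), always positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_issues(history, issues):
    """Rank issue indices by similarity to one contributor's history text."""
    vecs = tfidf_vectors([history] + issues)
    profile, issue_vecs = vecs[0], vecs[1:]
    scores = [cosine(profile, v) for v in issue_vecs]
    return sorted(range(len(issues)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(rankings, truths, k):
    """Fraction of cases whose true issue index appears in the top k."""
    hits = sum(1 for r, t in zip(rankings, truths) if t in r[:k])
    return hits / len(truths)


# Hypothetical sample data, invented for illustration only.
history = "fix numpy array dtype bug in dataframe parser; add csv parser tests"
issues = [
    "Dataframe parser crashes on mixed dtype csv input",
    "Update website logo and footer colors",
    "Add dark mode toggle to settings page",
]
ranking = rank_issues(history, issues)
print("recommended order:", ranking)
print("top-1 accuracy:", top_k_accuracy([ranking], [0], k=1))
```

In this sketch the contributor profile and the issues are vectorized together so that they share one vocabulary; an evaluation like the one described in the abstract would presumably compare each ranking against the issue the contributor actually resolved, with k set to 15.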

Md Abdul Hannan

Colorado State University

United States

Mohammad Habibullah Rakib

Colorado State University

United States

Khondaker Masfiq Reza

Colorado State University

United States

Fabio Marcos De Abreu Santos

Colorado State University, USA

United States

Session Program

Thu 2 Oct
Displayed time zone: Hawaii

13:50 - 14:50	LLMs for Classification, Detection, and Recommendations. Tracks: ESEM - Industry, Government, and Community Track / ESEM - Technical Track / ESEM - Emerging Results and Vision Track / ESEM - Journal First Track. At Kaiulani I. Chair(s): Fabio Calefato (University of Bari)

13:50 15m Talk		Contribution History as a Key Feature in OSS Task Recommendation: an LLM-Based Empirical Study. ESEM - Emerging Results and Vision Track. Md Abdul Hannan (Colorado State University), Mohammad Habibullah Rakib (Colorado State University), Khondaker Masfiq Reza (Colorado State University), Fabio Marcos De Abreu Santos (Colorado State University, USA)
14:05 15m Talk		Exploring LLMs for Stakeholder-Specific Insight Generation from Software Contracts. ESEM - Industry, Government, and Community Track. Jyoti Shukla (TCS Research), Aditya Kahol (TCS Research), Mohit Chaudhary (TCS Research), Preethu Rose Anish (TCS Research)
14:20 15m Talk		Benchmarking large language models for automated labeling: The case of issue report classification. ESEM - Journal First Track. Giuseppe Colavito (University of Bari, Italy), Filippo Lanubile (University of Bari), Nicole Novielli (University of Bari)
14:35 15m Talk		Secret Breach Detection in Source Code with Large Language Models. ESEM - Technical Track. Md Nafiu Rahman (Bangladesh University of Engineering and Technology), Sadif Ahmed (Bangladesh University of Engineering and Technology), Zahin Wahab (The University of British Columbia), S. M. Sohan (Google Inc), Rifat Shahriyar (Bangladesh University of Engineering and Technology, Dhaka, Bangladesh)