Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations
Third-party libraries (TPLs) have become an integral part of modern software development, enhancing developer productivity and accelerating time-to-market. However, identifying suitable candidates from a rapidly growing and continuously evolving collection of TPLs remains challenging. TPL recommender systems offer a promising solution to this problem. They typically rely on collaborative filtering (CF), which exploits a two-dimensional project-library matrix (the user-item matrix in the general recommendation setting) when making recommendations. CF-based approaches often encounter two challenges: (i) popularity bias, a tendency to recommend popular items more frequently, which makes them even more dominant, and (ii) the cold-start problem, a difficulty in generating recommendations for new users or items due to limited user-item interactions. In this paper, we propose a reinforcement learning (RL)-based approach to address both popularity bias and the cold-start problem in TPL recommendation. Our method comprises three key components. First, we use a graph convolutional network (GCN)-based embedding model to learn user preferences and user-item interactions, which lets us capture complex relationships within interaction subgraphs and represent new user and item embeddings effectively. Second, we introduce an aggregation operator that generates a representative embedding from user and item embeddings, which is then used to model cold-start users. Finally, we adopt a model-based RL framework for TPL recommendation, in which popularity bias is mitigated through a carefully designed reward function and a rarity-based replay buffer partitioning strategy. The results demonstrate that the proposed approach outperforms state-of-the-art models in cold-start scenarios while effectively mitigating the impact of popularity bias.
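The abstract names a popularity-aware reward and a rarity-based replay buffer partition but gives no formulas for either. The following is a minimal sketch of how such components could look, not the authors' implementation. Every name and value in it (`item_popularity`, `reward`, `RarityPartitionedBuffer`, `alpha`, `threshold`, `rare_fraction`) is a hypothetical choice made for illustration.

```python
import random
from collections import Counter


def item_popularity(interactions):
    """Fraction of projects that use each library (hypothetical helper)."""
    n_projects = len(interactions)
    counts = Counter(lib for libs in interactions.values() for lib in set(libs))
    return {lib: count / n_projects for lib, count in counts.items()}


def reward(hit, popularity, alpha=1.0):
    """Assumed reward shape: a correct recommendation earns more when the library is rare."""
    if not hit:
        return 0.0
    return 1.0 + alpha * (1.0 - popularity)


class RarityPartitionedBuffer:
    """Replay buffer split into 'rare' and 'popular' partitions (illustrative only)."""

    def __init__(self, popularity, threshold=0.5, seed=0):
        self.popularity = popularity
        self.threshold = threshold  # assumed cut-off between rare and popular items
        self.rare, self.popular = [], []
        self.rng = random.Random(seed)

    def add(self, state, action, r, next_state):
        # Route the transition by the popularity of the recommended library.
        is_rare = self.popularity.get(action, 0.0) < self.threshold
        (self.rare if is_rare else self.popular).append((state, action, r, next_state))

    def sample(self, batch_size, rare_fraction=0.5):
        # Draw a fixed share from the rare partition so rare items are not drowned out.
        n_rare = min(len(self.rare), round(batch_size * rare_fraction))
        n_popular = min(len(self.popular), batch_size - n_rare)
        return self.rng.sample(self.rare, n_rare) + self.rng.sample(self.popular, n_popular)
```

In this sketch, a library used by every project yields the base reward of 1.0 on a hit, while a library used by a quarter of the projects yields 1.75 with the default `alpha`. Sampling a fixed share of each batch from the rare partition is one plausible way to keep long-tail libraries visible during training; the paper may partition and sample differently.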
Fri 20 Jun (displayed time zone: Athens)
13:30 - 15:00 | API | Research Papers / Short Papers, Emerging Results / AI Models / Data | Senate Hall | Chair(s): Vesna Nowack (Imperial College London)
13:30 | 15m | Talk | Version-level Third-Party Library Detection in Android Applications via Class Structural Similarity | Research Papers | Bolin Zhou (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jingzheng Wu (Institute of Software, Chinese Academy of Sciences), Xiang Ling (Institute of Software, Chinese Academy of Sciences), Tianyue Luo (Institute of Software, Chinese Academy of Sciences), Jingkun Zhang (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
13:45 | 10m | Short-paper | Analyzing the Usage of Donation Platforms for PyPI Libraries | Short Papers, Emerging Results
13:55 | 10m | Talk | Bake Two Cakes with One Oven: RL for Defusing Popularity Bias and Cold-start in Third-Party Library Recommendations | Short Papers, Emerging Results | Hoang Minh Vuong (Hanoi University of Science and Technology), Anh M. T. Bui (Hanoi University of Science and Technology), Phuong T. Nguyen (University of L'Aquila), Davide Di Ruscio (University of L'Aquila)
14:05 | 10m | Talk | Identifying Critical Dependencies in Large-Scale Continuous Software Engineering | Short Papers, Emerging Results
14:15 | 15m | Talk | Large Language Models for API Classification: An Explorative Study | AI Models / Data | Gabriel Morais (UQAR), Edwin Lemelin (Université du Québec à Rimouski (UQAR) - Université Laval), Mehdi Adda (Université du Québec à Rimouski (UQAR)), Dominik Bork (TU Wien, Vienna, Austria)
14:30 | 15m | Talk | Understanding API Usage and Testing: An Empirical Study of C Libraries | Research Papers