We Know What You're Looking For: Recommendation for Large-Scale Open Source Software
In recent years, as software engineering technology and the software industry have advanced rapidly, Open Source Software (OSS) has become a mainstream model for software development and innovation. Organizations and developers increasingly adopt and customize existing OSS to simplify and accelerate development. During OSS adoption, recommending suitable software based on user needs is crucial for improving development efficiency and meeting diverse requirements. However, the sheer number and diversity of OSS projects make the recommendation task highly challenging. Despite progress in previous research, several issues remain: neglect of key software attributes, the complexity of extracting multilingual features, and the challenges of cold start and data sparsity.
This paper presents AthenaRec, a large-scale OSS recommendation system comprising three core modules: Delphi, Argus, and Hestia. AthenaRec aims to recommend relevant and suitable software from a vast pool of OSS based on user needs. Specifically, Delphi first analyzes user queries to identify the user's intent; Argus then employs a heterogeneous ensemble recall approach to retrieve a large set of candidate software relevant to that intent; finally, Hestia applies a two-stage deep ranking strategy. Hestia performs coarse ranking by integrating multilingual modeling with contrastive learning, followed by fine ranking with a large language model that uses retrieval-augmented generation to incorporate external evidence. To evaluate the effectiveness of AthenaRec, we use a query dataset from real application scenarios. Experimental results demonstrate that, on the test set of 7,500 queries, AthenaRec achieves superior recommendation performance, with Hits@20, MAP@20, NDCG@20, and MRR scores of 98.27%, 95.60%, 95.05%, and 92.92%, respectively. On average, AthenaRec outperforms other top methods by 10.9% across all evaluation metrics. Additionally, we develop a Visual Studio Code (VSCode) plugin based on AthenaRec, which is available at https://marketplace.visualstudio.com/items?itemName=open-source.opensource-recommend&ssr=false#overview. We hope this research provides a reference for software developers and advances the efficiency and accuracy of OSS recommendation.
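For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Hits@k, MAP@k, NDCG@k, and MRR are commonly computed for a single query. It assumes binary relevance and a ranked list without duplicates; the function names and the example package names are hypothetical, and the exact definitions used in the AthenaRec evaluation may differ. The reported scores would correspond to averaging such per-query values over the test queries.

```python
import math
from typing import Sequence, Set


def hits_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average of precision values at each rank (<= k) holding a relevant item.

    Assumption: normalized by min(|relevant|, k), one common convention.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical single-query example (package names are illustrative only).
ranked = ["numpy", "pandas", "scipy", "flask"]
relevant = {"pandas", "flask"}

print(hits_at_k(ranked, relevant, k=3))
print(average_precision_at_k(ranked, relevant, k=3))
print(ndcg_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
```

MAP@k and MRR are then the means of `average_precision_at_k` and `reciprocal_rank` over all queries; Hits@k and NDCG@k are averaged the same way.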