Semantically-enhanced Topic Recommendation Systems for Software Projects
Software-related platforms such as GitHub and Stack Overflow have enabled their users to collaboratively label software entities with a form of metadata called topics. Tagging software repositories with relevant topics facilitates various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility, which in turn improves tasks such as browsing, searching, navigating, and organizing repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. There have therefore been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. In this work, we propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first operates only on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second assumes that no topics are available for a repository and therefore predicts relevant topics from both the textual information of a software project (such as its README file) and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. Through their contributions, we constructed SED-KGraph with 2234 carefully evaluated relationships among 863 community-curated topics.
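To make the first recommender's inputs concrete, here is a minimal sketch of recommending topics from a repository's existing topics and a graph of topic relationships. It is not the KGRec algorithm, whose details the abstract does not give. The triples, the relation names, and the neighbor-counting score are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; NOT taken from SED-KGraph.
TRIPLES = [
    ("flask", "is-a", "web-framework"),
    ("django", "is-a", "web-framework"),
    ("flask", "written-in", "python"),
    ("django", "written-in", "python"),
    ("web-framework", "used-for", "web-development"),
]


def build_neighbors(triples):
    """Index the graph as an undirected map: topic -> set of related topics."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def recommend(assigned, neighbors, k=3):
    """Rank unassigned topics by how many assigned topics they link to directly.

    This scoring rule is an assumption made for the sketch, not the paper's method.
    """
    assigned = set(assigned)
    scores = defaultdict(int)
    for topic in assigned:
        for candidate in neighbors.get(topic, ()):
            if candidate not in assigned:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


neighbors = build_neighbors(TRIPLES)
print(recommend(["flask", "django"], neighbors))  # ['python', 'web-framework']
```

A repository tagged only with `flask` and `django` is offered `python` and `web-framework` because both are linked to each of its existing topics. A real system would presumably also weigh relation types and textual evidence.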
Regarding the recommenders’ performance, the experimental results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of the Average Success Rate and Mean Average Precision metrics, respectively. We share SED-KGraph as a rich form of knowledge for the community to reuse and build upon. We also release the source code of our two recommender models, KGRec and KGRec+.
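For readers unfamiliar with the two metrics, the sketch below uses their common information-retrieval definitions: success at k is 1 when at least one of the top-k recommended topics is relevant, and average precision averages the precision at each rank where a relevant topic appears. The paper's exact formulations may differ (for instance in how success rates are averaged over cutoffs), and the function names and sample data are hypothetical.

```python
def success_at_k(recommended, relevant, k):
    """Return 1.0 if any of the top-k recommendations is relevant, else 0.0."""
    return 1.0 if any(topic in relevant for topic in recommended[:k]) else 0.0


def average_precision(recommended, relevant):
    """Sum precision@i over the ranks i holding a relevant topic,
    then divide by the number of relevant topics."""
    hits = 0
    total = 0.0
    for rank, topic in enumerate(recommended, start=1):
        if topic in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evaluate(runs, k):
    """Average both metrics over (recommended_list, relevant_set) pairs."""
    n = len(runs)
    success_rate = sum(success_at_k(rec, rel, k) for rec, rel in runs) / n
    mean_ap = sum(average_precision(rec, rel) for rec, rel in runs) / n
    return success_rate, mean_ap


# Made-up evaluation data: one repository with two hits, one with none.
runs = [
    (["python", "web-framework", "cli"], {"python", "cli"}),
    (["rust", "wasm"], {"go"}),
]
success_rate, mean_ap = evaluate(runs, k=1)
print(success_rate, mean_ap)
```

With this data the first repository scores a success at k=1 and an average precision of (1/1 + 2/3) / 2, while the second scores zero on both, so the success rate is 0.5 and the mean average precision is 5/12.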
Thu 18 May | Displayed time zone: Hobart
13:45 - 15:15 | Recommender systems | DEMO - Demonstrations / Technical Track / SEIP - Software Engineering in Practice / Journal-First Papers | Level G - Plenary Room 1 | Chair(s): Kevin Moran (George Mason University)
13:45 | 15m | Talk | Autonomy Is An Acquired Taste: Exploring Developer Preferences for GitHub Bots | Technical Track | Amir Ghorbani (University of Victoria), Nathan Cassee (Eindhoven University of Technology), Derek Robinson (University of Victoria), Adam Alami (Aalborg University), Neil Ernst (University of Victoria), Alexander Serebrenik (Eindhoven University of Technology), Andrzej Wąsowski (IT University of Copenhagen, Denmark) | Pre-print
14:00 | 15m | Talk | Flexible and Optimal Dependency Management via Max-SMT | Technical Track | Donald Pinckney (Northeastern University), Federico Cassano (Northeastern University), Arjun Guha (Northeastern University and Roblox Research), Jonathan Bell (Northeastern University), Massimiliano Culpo (np-complete, S.r.l.), Todd Gamblin (Lawrence Livermore National Laboratory) | Pre-print
14:15 | 15m | Talk | Towards More Effective AI-assisted Programming: A Systematic Design Exploration to Improve Visual Studio IntelliCode's User Experience | SEIP - Software Engineering in Practice | Priyan Vaithilingam (Harvard University), Elena Glassman (Harvard University), Peter Groenwegen, Sumit Gulwani (Microsoft), Austin Z. Henley (Microsoft), Rohan Malpani, David Pugh, Arjun Radhakrishna (Microsoft), Gustavo Soares (Microsoft), Joey Wang, Aaron Yim
14:30 | 7m | Talk | DeepLog: Deep-Learning-Based Log Recommendation | DEMO - Demonstrations | Yang Zhang (Hebei University of Science and Technology), Xiaosong Chang (Hebei University of Science and Technology), Lining Fang (Hebei University of Science and Technology), Yifan Lu (Hebei University of Science and Technology)
14:37 | 7m | Talk | ShellFusion: An Answer Generator for Shell Programming Tasks via Knowledge Fusion | DEMO - Demonstrations | Zhongqi Chen (School of Software Engineering, Sun Yat-sen University), Neng Zhang (School of Software Engineering, Sun Yat-sen University), Pengyue Si (School of Software Engineering, Sun Yat-sen University), ChenQinde (School of Software Engineering, Sun Yat-sen University), Chao Liu (Chongqing University), Zibin Zheng (School of Software Engineering, Sun Yat-sen University)
14:45 | 7m | Talk | Revisiting, Benchmarking and Exploring API Recommendation: How Far are We? | Journal-First Papers | Yun Peng (Chinese University of Hong Kong), Shuqing Li (The Chinese University of Hong Kong), Wenwei Gu (The Chinese University of Hong Kong), Yichen LI (The Chinese University of Hong Kong), Wenxuan Wang (The Chinese University of Hong Kong), Cuiyun Gao (Harbin Institute of Technology), Michael Lyu (The Chinese University of Hong Kong)
14:52 | 7m | Talk | Semantically-enhanced Topic Recommendation Systems for Software Projects | Journal-First Papers | Maliheh Izadi (Delft University of Technology), Mahtab Nejati (University of Waterloo), Abbas Heydarnoori (Bowling Green State University)
15:00 | 7m | Talk | Code Librarian: A Software Package Recommendation System | SEIP - Software Engineering in Practice | Lili Tao (JP Morgan Chase & Co), Alexandru-Petre Cazan (JP Morgan Chase & Co), Senad Ibraimoski (JP Morgan Chase & Co), Sean Moran (JP Morgan Chase & Co)