MetaSim: A search engine for finding Similar GitHub Repositories (Tool Demo Paper)
How can we find other repositories on GitHub that are functionally similar to a given repository? GitHub offers keyword-based search, but no tool performs query by example to search for and compare functionally similar repositories. To address this challenge, we present MetaSim: a search engine that finds similar GitHub repositories based on repository metadata features. MetaSim employs a customized technique to represent repository metadata in an embedding space for efficient indexing and searching. We construct a curated dataset of 267.6K public GitHub repositories to support our search engine. We evaluate our tool through a manual assessment of 202 query-by-example repositories and their corresponding matches. Experimental results demonstrate that the Readme alone achieves a high similarity precision (a metric defined in the paper) of 90.1%, while the combined use of Description, Topics, and Readme yields the best overall performance, with a similarity precision of 97.8%. To foster both research and practical applications, we open-source our research artifacts through the MetaSim platform at https://metasim-app.github.io. The demonstration video of MetaSim is available at https://youtu.be/HnFnN3JclQw.
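The abstract does not describe MetaSim's customized embedding technique, so the following Python sketch illustrates only the general query-by-example idea: each repository is indexed by its combined Description, Topics, and Readme text, and candidates are ranked by cosine similarity to the query repository. The `Repo` and `MetadataIndex` names, the bag-of-words vectors standing in for learned embeddings, and the brute-force scan are hypothetical assumptions, not MetaSim's implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Hypothetical record holding the three metadata fields named in the abstract.
    name: str
    description: str = ""
    topics: list = field(default_factory=list)
    readme: str = ""


def metadata_text(repo: Repo) -> str:
    # Combine Description, Topics, and Readme into one text.
    # Equal weighting of the fields is an assumption of this sketch.
    return " ".join([repo.description, " ".join(repo.topics), repo.readme])


def embed(text: str) -> dict:
    # Stand-in "embedding": an L2-normalized bag-of-words vector.
    # MetaSim's actual embedding technique is not reproduced here.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a: dict, b: dict) -> float:
    # Both vectors are unit length, so their dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class MetadataIndex:
    def __init__(self, repos):
        # Precompute one vector per repository so a query only does comparisons.
        self.vectors = {r.name: embed(metadata_text(r)) for r in repos}

    def search(self, query: Repo, k: int = 5):
        # Query by example: rank every other repository by similarity to the query.
        q = embed(metadata_text(query))
        scored = [(name, cosine(q, v))
                  for name, v in self.vectors.items() if name != query.name]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

For example, indexing a few repositories and querying with one of them returns the others ranked by metadata similarity, with the query itself excluded. A system at the scale of 267.6K repositories would likely replace the linear scan with an approximate nearest-neighbor index and the bag-of-words vectors with learned embeddings.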
Thu 10 Oct | Displayed time zone: Arizona
15:30 - 17:00 | Session 11: Mining Software Repositories | Tool Demo Track / Research Track / Registered Reports Track / New Ideas and Emerging Results Track | at Fremont | Chair(s): Gregorio Robles (Universidad Rey Juan Carlos)
15:30 | 15m | “What Happened to my Models?” History-Aware Co-Existence and Co-Evolution of Metamodels and Models | Research Track Paper | Marcel Homolka (Institute for Software Systems Engineering, Johannes Kepler University, Linz), Luciano Marchezan (Johannes Kepler University Linz), Wesley Assunção (North Carolina State University), Alexander Egyed (Johannes Kepler University Linz)
15:45 | 10m | MetaSim: A search engine for finding Similar GitHub Repositories | Tool Demo Paper | Md Rayhanul Masud (University of California, Riverside), Md Omar Faruk Rokon (Sponsored Search, Walmart Global Tech), Qian Zhang (University of California at Riverside), Michalis Faloutsos (UCR)
15:55 | 10m | SEART Data Hub: Streamlining Large-Scale Source Code Mining and Pre-Processing | Tool Demo Paper | Ozren Dabic (Software Institute, Università della Svizzera italiana (USI), Switzerland), Rosalia Tufano (Università della Svizzera Italiana), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana)
16:05 | 10m | Diving into Software Evolution: Virtual Reality vs. On-Screen | Registered Reports Paper | David Moreno-Lumbreras (Universidad Rey Juan Carlos), Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos), Gregorio Robles (Universidad Rey Juan Carlos)
16:15 | 10m | Review-Pulse: A Dashboard for Managing User Feedback for Android Applications | Tool Demo Paper | Omar Adbealziz (University of Saskatchewan), Zadia Codabux (University of Saskatchewan), Kevin Schneider (University of Saskatchewan)
16:25 | 10m | Monitoring Temporal Dynamics of Issues in Crowdsourced User Reviews and their Impact on Mobile App Updates | NIER Paper (New Ideas and Emerging Results Track) | Vitor Mesaque Alves de Lima (Federal University of Mato Grosso do Sul), Jacson Rodrigues Barbosa (Institute of Informatics (INF) / Federal University of Goiás (UFG)), Ricardo Marcondes Marcacini (University of São Paulo)
16:35 | 10m | Using Animations to Understand Commits | NIER Paper (New Ideas and Emerging Results Track)
16:45 | 10m | Maven Unzipped: Packaging Impacts the Ecosystem | Research Track Paper | Mehdi Keshani (Delft University of Technology), Gideon Bot (Delft University of Technology), Priyam Rungta, Maliheh Izadi (Delft University of Technology), Arie van Deursen (Delft University of Technology), Sebastian Proksch (Delft University of Technology)