OSSFP: Precise and Scalable C/C++ Third-Party Library Detection using Fingerprinting Functions
Third-party libraries (TPLs) are frequently used in software to boost efficiency by avoiding repeated development. However, the widespread use of TPLs also brings security threats, since TPLs may introduce bugs and vulnerabilities. Software composition analysis (SCA) tools have therefore been proposed to detect and manage TPL usage. Unfortunately, because their bloated feature datasets contain many common and trivial functions, existing tools fail to identify TPLs precisely and rapidly in real-world C/C++ projects. To this end, we propose OSSFP, a novel SCA framework for effective and efficient TPL detection in large-scale real-world projects that generates unique fingerprints for open source software. By removing common and trivial functions and keeping only the core functions to build the fingerprint index for each TPL project, OSSFP significantly reduces the database size and accelerates the detection process. It also improves TPL detection accuracy, since noise is excluded from the fingerprints. We applied OSSFP to a large dataset containing 23,427 C/C++ repositories, which included 585,683 versions and 90 billion lines of code. The results showed that it achieved 90.84% recall and 90.34% precision, outperforming the state-of-the-art tool by 35.31% and 3.71%, respectively. OSSFP took only 0.12 seconds on average to identify all TPLs in a project, which was 22 times faster than that tool. OSSFP has proven to be highly scalable on large-scale datasets.
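The idea of keeping only core functions in the fingerprint index can be illustrated with a minimal Python sketch. It is not OSSFP's actual implementation: the abstract does not say how functions are hashed or how "common" and "trivial" are defined, so the whitespace normalization, the SHA-256 hash, the `MIN_TOKENS` threshold, and the rule "a function is common if more than one library contains it" are all assumptions made here for illustration, as are the names `build_index` and `detect`.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: functions with fewer whitespace-separated tokens
# than this are treated as trivial. The abstract gives no such number.
MIN_TOKENS = 8


def normalize(body: str) -> str:
    """Collapse all whitespace so formatting changes do not alter the hash."""
    return " ".join(body.split())


def function_hash(body: str) -> str:
    """Hash a normalized function body (SHA-256 is an assumption)."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


def build_index(libraries):
    """Map function hash -> library name, keeping only fingerprint candidates.

    `libraries` maps a library name to a list of function bodies (strings).
    Trivial functions are skipped; functions shared by several libraries
    are treated as common and dropped.
    """
    owners = defaultdict(set)
    for name, functions in libraries.items():
        for body in functions:
            if len(normalize(body).split()) < MIN_TOKENS:
                continue  # trivial function: too short to be distinctive
            owners[function_hash(body)].add(name)
    # Keep only hashes that belong to exactly one library.
    return {h: next(iter(libs)) for h, libs in owners.items() if len(libs) == 1}


def detect(index, project_functions, min_hits=1):
    """Return the libraries whose fingerprints occur in the project."""
    hits = defaultdict(int)
    for h in {function_hash(body) for body in project_functions}:
        if h in index:
            hits[index[h]] += 1
    return {lib for lib, count in hits.items() if count >= min_hits}
```

Because each surviving hash points to a single library, detection reduces to one dictionary lookup per project function. That is one plausible reading of why a smaller, noise-free index would speed up detection and reduce false matches.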
Wed 17 May (displayed time zone: Hobart)
11:00 - 12:30 | APIs and libraries | Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice, at Meeting Room 105 | Chair(s): Sarah Nadi (University of Alberta)
11:00 | 15m Talk | UpCy: Safely Updating Outdated Dependencies | Technical Track | Andreas Dann (Paderborn University), Ben Hermann (TU Dortmund), Eric Bodden (Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM)
11:15 | 15m Talk | APICAD: Augmenting API Misuse Detection Through Specifications From Code And Documents | Technical Track
11:30 | 15m Talk | Compatibility Issue Detection for Android Apps Based on Path-Sensitive Semantic Analysis | Technical Track | Sen Yang (Army Engineering University of PLA), Sen Chen (Tianjin University), Lingling Fan (Nankai University), Sihan Xu (Nankai University, China), Zhanwei Hui (Academy of Military Science), Song Huang (Army Engineering University of PLA)
11:45 | 15m Talk | OSSFP: Precise and Scalable C/C++ Third-Party Library Detection using Fingerprinting Functions | Technical Track | Wu Jiahui (Nanyang Technological University), Zhengzi Xu (Nanyang Technological University), Wei Tang (Tsinghua University), Lyuye Zhang (Nanyang Technological University), Yueming Wu (Nanyang Technological University), Chengyue Liu (Scantist), Kairan Sun (Singapore University of Technology and Design), Lida Zhao (Nanyang Technological University), Yang Liu (Nanyang Technological University)
12:00 | 15m Talk | Scaling Web API Integrations | SEIP - Software Engineering in Practice
12:15 | 7m Talk | Giving Back: Contributions Congruent to Library Dependency Changes in a Software Ecosystem | Journal-First Papers | Supatsara Wattanakriengkrai (Nara Institute of Science and Technology), Dong Wang (Kyushu University, Japan), Raula Gaikovina Kula (Nara Institute of Science and Technology), Christoph Treude (University of Melbourne), Patanamon Thongtanunam (University of Melbourne), Takashi Ishio (Future University Hakodate), Kenichi Matsumoto (Nara Institute of Science and Technology)
12:22 | 7m Talk | Breaking Bad? Semantic Versioning and Impact of Breaking Changes in Maven Central | Journal-First Papers | Lina Ochoa (Eindhoven University of Technology), Thomas Degueule (CNRS, LaBRI), Jean-Rémy Falleri (Bordeaux INP), Jurgen Vinju (CWI; Eindhoven University of Technology)