TIVER: Identifying Adaptive Versions of C/C++ Third-Party Open-Source Components Using a Code Clustering Technique
Reusing third-party open-source software (OSS) provides many benefits but can expose the entire system to risks through propagated vulnerabilities. Tracking the versions of reused OSS components can help prevent such threats, but existing approaches typically map a single version to each reused OSS codebase. This coarse-grained method cannot represent multiple versions of code that coexist within the same codebase, which leads to ineffective OSS management. Moreover, accurately identifying component versions is difficult because of noise code, such as algorithmic code shared across different OSS projects, and because of duplicate components that arise from redundant reuse of the same OSS.
In this paper, we introduce the concept of the adaptive version, a single representation that captures the version diversity of reused OSS. We present TIVER, an approach for identifying adaptive versions of OSS components. TIVER employs two key techniques: (1) fine-grained, function-level versioning to uncover the detailed versions present in a component, and (2) OSS code clustering to distinguish duplicate components and remove noise. Together, these techniques enable precise identification of OSS reuse locations and adaptive versions, mitigating threats related to OSS reuse. An evaluation on popular C/C++ software on GitHub revealed that only 33% of OSS components contained a single version, while the remaining 67% contained more than three versions on average. Despite this diversity, TIVER identified adaptive versions of OSS components with 88.46% precision and 91.63% recall in distinguishing duplicate components, and with 86% precision and 86.84% recall in eliminating noise, whereas existing approaches achieved only 42% recall in distinguishing duplicates and did not address noise. Further experiments showed that TIVER can enhance vulnerability management and can be applied to Software Bills of Materials (SBOM) to improve supply chain security.
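To make the idea of function-level versioning and clustering more concrete, here is a minimal sketch. It is not TIVER's algorithm, because the abstract does not describe its matching, normalization, or clustering details. All names are hypothetical: `build_signature_db`, `cluster_matches`, `adaptive_version`, and the `min_matches` threshold. The sketch assumes three simplifications. First, a function is identified by a hash of its whitespace-stripped body. Second, each match records the set of OSS releases in which that exact body appears. Third, grouping matched functions by their containing directory stands in for code clustering: separate copies of the same OSS form separate clusters (duplicate components), and clusters that are too small are discarded as likely noise.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def fingerprint(body):
    # Hypothetical normalization: drop all whitespace so pure formatting edits do
    # not change the hash. A real tool would tokenize C/C++ properly.
    return hashlib.sha256("".join(body.split()).encode()).hexdigest()


def build_signature_db(releases):
    """releases: {version: {function_name: body}} -> {hash: set of versions}."""
    db = defaultdict(set)
    for version, funcs in releases.items():
        for body in funcs.values():
            db[fingerprint(body)].add(version)
    return db


def cluster_matches(target, db, min_matches=2):
    """target: {(file_path, function_name): body} from the reusing codebase.

    Groups matched functions by containing directory, so each copy of the OSS
    becomes its own cluster. Clusters with fewer than min_matches matched
    functions are dropped as likely noise (e.g. a common algorithm).
    Returns {directory: {function_name: frozenset of candidate versions}}.
    """
    clusters = defaultdict(dict)
    for (path, name), body in target.items():
        versions = db.get(fingerprint(body))  # .get does not insert a default
        if versions:
            clusters[str(PurePosixPath(path).parent)][name] = frozenset(versions)
    return {d: fns for d, fns in clusters.items() if len(fns) >= min_matches}


def adaptive_version(functions):
    """Maps each distinct candidate-version set to the functions that carry it.

    More than one key means no single release explains the whole component.
    """
    groups = defaultdict(list)
    for name, versions in functions.items():
        groups[versions].append(name)
    return {tuple(sorted(v)): sorted(names) for v, names in groups.items()}


if __name__ == "__main__":
    releases = {
        "1.0": {"init": "int init(){return 0;}", "parse": "int parse(){return 1;}"},
        "1.1": {"init": "int init(){return 5;}", "parse": "int parse(){return 2;}"},
        "2.0": {"init": "int init(){return 1;}", "parse": "int parse(){return 2;}"},
    }
    target = {
        ("third_party/libfoo/foo.c", "init"): "int init() { return 0; }",
        ("third_party/libfoo/foo.c", "parse"): "int parse(){return 2;}",
        ("vendor/foo_copy/foo.c", "init"): "int init(){return 1;}",
        ("vendor/foo_copy/foo.c", "parse"): "int parse(){return 2;}",
        ("src/util.c", "parse"): "int parse(){return 1;}",  # lone match: noise
    }
    clusters = cluster_matches(target, build_signature_db(releases))
    for directory, funcs in clusters.items():
        print(directory, adaptive_version(funcs))
    # third_party/libfoo {('1.0',): ['init'], ('1.1', '2.0'): ['parse']}
    # vendor/foo_copy {('2.0',): ['init'], ('1.1', '2.0'): ['parse']}
```

In this toy example, `third_party/libfoo` mixes code from release 1.0 with code that only appears from 1.1 onward. A single-version label would misdescribe it. The second copy under `vendor/` is reported as a separate component, and the isolated match in `src/util.c` is filtered out. Directory-based grouping is only a stand-in for clustering here, and a real reused component may span several directories.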