ICSE 2025
Sat 26 April - Sun 4 May 2025, Ottawa, Ontario, Canada

Conflicts arising from the presence of multiple licenses in open-source software (OSS) projects can lead to compliance issues, legal risks, operational challenges, and even financial consequences for developers and organizations. While substantial efforts have been made to automate license extraction and detect potential conflicts, current techniques rely primarily on static rule matching for license identification and extraction, or on probabilistic and shallow neural models for license term prediction. These techniques often struggle to adapt to evolving licensing patterns. The advent of large language models (LLMs) presents new opportunities for comprehending the complex information within license files; however, their application to license term extraction and conflict analysis remains underexplored. In this paper, we present an automated framework for license identification and conflict analysis that leverages the capabilities of LLMs. Additionally, we introduce a benchmark dataset specifically designed for the extraction of license terms. The framework consists of three key modules: (a) an automated license extraction module that identifies and extracts declared, inline, and referenced licenses within local project repositories; (b) an LLM-based OSS license labeling component that uses few-shot chain-of-thought prompting combined with structured output generation; and (c) an LLM-based conflict analysis module that applies a hybrid of advanced prompting techniques. Our benchmark dataset contains over 5,000 labeled instances of license texts, including approximately 2,000 well-known license texts sourced from open-source license repositories and GitHub projects. In addition to releasing the dataset, we provide a set of fine-tuned language models tailored to license term identification and conflict analysis. We compare our framework with existing automated license identification and conflict detection techniques, conduct an in-depth analysis of the benchmark dataset and the incorporated prompting strategies, and discuss their implications and potential directions for future research.
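To make the labeling module concrete, the following is a minimal sketch of few-shot chain-of-thought prompting with structured (JSON) output for license term labeling, one of the three modules described above. It is illustrative only and is not the paper's implementation: the term names, the example texts, and the `call_llm` placeholder (which returns a canned response so the sketch runs end to end) are all assumptions, and a real deployment would replace `call_llm` with an actual model API.

```python
# Hypothetical sketch: few-shot chain-of-thought prompting with structured
# JSON output for labeling license terms. Names and examples are illustrative.
import json

# Illustrative subset of license terms to be labeled.
LICENSE_TERMS = ["commercial-use", "modification", "distribution",
                 "patent-grant", "disclose-source", "same-license"]

# One worked example (reasoning + structured answer) used as the few-shot demonstration.
FEW_SHOT_EXAMPLE = """\
License text: "Permission is hereby granted, free of charge, to any person
obtaining a copy of this software ... to use, copy, modify, merge, publish,
distribute ... subject to the following conditions ..."
Reasoning: The text grants broad rights to use, modify, and distribute with
no copyleft obligation, so commercial use, modification, and distribution are
permitted; there is no patent grant or source-disclosure requirement.
Answer: {"commercial-use": "can", "modification": "can", "distribution": "can",
"patent-grant": "none", "disclose-source": "none", "same-license": "none"}
"""

def build_prompt(license_text: str) -> str:
    """Assemble a few-shot chain-of-thought prompt that requests a JSON answer."""
    return (
        "You label the rights and obligations stated in open-source license texts.\n"
        f"Label each of these terms: {', '.join(LICENSE_TERMS)}.\n"
        "Think step by step, then give the final answer as a single JSON object.\n\n"
        f"{FEW_SHOT_EXAMPLE}\n"
        f'License text: "{license_text}"\n'
        "Reasoning:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned response for demonstration."""
    return ('The text permits redistribution but requires derivative works to '
            'carry the same license and to disclose source.\n'
            'Answer: {"commercial-use": "can", "modification": "can", '
            '"distribution": "can", "patent-grant": "none", '
            '"disclose-source": "must", "same-license": "must"}')

def label_license(license_text: str) -> dict:
    """Prompt the model and parse the structured JSON answer from its response."""
    response = call_llm(build_prompt(license_text))
    answer = response.rsplit("Answer:", 1)[-1].strip()
    return json.loads(answer)

if __name__ == "__main__":
    labels = label_license("You may copy and distribute the Program ... provided "
                           "that you also license it under these same terms.")
    print(labels)
```

In this style of pipeline, the reasoning text is kept only to elicit better predictions, while the JSON object after "Answer:" is the machine-readable output consumed by the downstream conflict analysis step.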