Exploring LLMs for Stakeholder-Specific Insight Generation from Software Contracts
Context: Software contracts are legally binding agreements that outline the terms and conditions governing the development, licensing, use, or distribution of software and related services. A clear understanding of these terms and conditions ensures compliance between parties, aligns expectations, and helps developers navigate scope, timelines, and responsibilities, all of which are crucial for maintaining the quality of the software being developed. However, the intricate nature and length of contractual clauses often impede comprehension, thereby reducing their readability. While clause summarization and/or simplification may seem like a viable solution, a single contractual clause often outlines action items for multiple stakeholders across various departments within an organization. As a result, a generic summary may not adequately capture the specific responsibilities pertinent to each stakeholder mentioned in a contractual clause. Given the proven effectiveness of large language models (LLMs) in processing and analyzing complex text, this study explores their potential for generating stakeholder-specific insights from contractual clauses.
Aim: Building upon this context, we empirically evaluate various state-of-the-art LLMs to determine their effectiveness in generating stakeholder-specific insights from complex contractual clauses. We also examine the efficacy of supplying contextual information to generate these insights, as opposed to relying on generic summary generation. To the best of our knowledge, no prior research in the software engineering paradigm has explored context-aware summarization that produces stakeholder-specific insights from complex text such as software contracts.
Method: We investigate zero-shot prompting, few-shot prompting, and fine-tuning across various state-of-the-art LLMs, comparing the best results obtained from each approach across several models, including T5, Llama 3.1 and 3.2, PEGASUS, BART, Mistral, Gemma 3, and Qwen 2.5. Enterprise-hosted models such as OpenAI’s ChatGPT and Anthropic’s Claude fall outside the scope of this study, as our focus is on deployable models that can be integrated with proprietary datasets while complying with the data privacy policies of the parties and organizations involved. Accordingly, we restrict our evaluation to open-source models that offer performance comparable to commercial alternatives. We conducted our experiments on a proprietary contractual dataset comprising 4,000 clauses. We validated the generated results using quantitative metrics such as ROUGE, METEOR, and BLEU scores, as well as human-evaluation metrics such as fluency, coherence, informativeness, and relevance, to ensure the quality of the generated insights.
Results and Conclusions: Based on both quantitative and qualitative metric scores, we identified fine-tuning as the most reliable and effective technique for generating stakeholder-specific insights, achieving improvements in the range of 150–200% over the other techniques. Among the evaluated models, the fine-tuned Llama 3.2 model emerged as the best performer, outperforming the other models by (a) scoring over 0.9 on the quantitative scale, (b) consistently being rated ‘High’ on the quality index for all four qualitative metrics, and (c) being among the fastest to generate insights (less than one second on average).
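For illustration, the minimal sketch below shows how quantitative scores of the kind reported above (ROUGE, METEOR, and BLEU) can be computed for a generated insight against a human-written reference using the Hugging Face evaluate library; the prediction and reference texts are hypothetical placeholders and are not drawn from the proprietary dataset used in this study.

    # Minimal sketch: scoring a generated stakeholder-specific insight against a
    # human-written reference with ROUGE, METEOR, and BLEU (Hugging Face `evaluate`).
    # The texts below are illustrative placeholders, not items from the study's dataset.
    import evaluate

    rouge = evaluate.load("rouge")
    meteor = evaluate.load("meteor")
    bleu = evaluate.load("bleu")

    prediction = ["The vendor's legal team must renew the software license 30 days before expiry."]
    reference = ["Legal must initiate license renewal at least 30 days prior to expiration."]

    print(rouge.compute(predictions=prediction, references=reference))   # ROUGE-1/2/L F-scores
    print(meteor.compute(predictions=prediction, references=reference))  # METEOR score
    print(bleu.compute(predictions=prediction, references=[reference]))  # BLEU score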
To demonstrate the practical applicability of our approach, we integrated the fine-tuned Llama 3.2 model into the Software Contracts Governance System (SCGS) of a major IT vendor organization.
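To give a concrete sense of how a locally deployable, fine-tuned model of this kind can be invoked for insight generation, the sketch below uses the Hugging Face transformers text-generation pipeline; the checkpoint path, prompt template, stakeholder name, and clause text are assumptions for illustration only and do not reproduce the actual SCGS integration.

    # Illustrative sketch only: generating a stakeholder-specific insight from a clause
    # with a locally hosted, fine-tuned Llama 3.2 checkpoint via Hugging Face transformers.
    # The checkpoint path, prompt template, and clause are hypothetical placeholders.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="./llama-3.2-3b-contracts-finetuned",  # hypothetical local checkpoint
        device_map="auto",
    )

    clause = (
        "The Supplier shall remediate any critical security vulnerability within 72 hours "
        "of notification and shall report the remediation to the Client's compliance office."
    )
    prompt = (
        "Extract the action items for the stakeholder 'Security Engineering' "
        f"from the following contractual clause:\n{clause}\nInsight:"
    )

    result = generator(prompt, max_new_tokens=128, do_sample=False)
    print(result[0]["generated_text"])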