NICE: Non-Functional Requirements Identification, Classification, and Explanation Using Small Language Models (ICSE 2025 - Software Engineering in Practice (SEIP))

Who

Gokul Rejithkumar, Preethu Rose Anish

Track

ICSE 2025 SE In Practice (SEIP)

When

Thu 1 May 2025 12:00 - 12:15 at 213 - AI for Requirements
Chair(s): Jennifer Horkoff

Abstract

Accurate identification and classification of Non-Functional Requirements (NFRs) is essential for informed architectural decision-making and maintaining software quality. Numerous language model-based techniques have been proposed for NFR identification and classification. However, understanding the reasoning behind the classification outputs of these techniques remains challenging. Rationales for the classification outputs of language models enhance comprehension, aid in debugging the models, and build confidence in the classification outputs. In this paper, we present NICE, a tool for NFR Identification, Classification, and Explanation. Using an industrial requirements dataset, we generated natural-language explanations with the GPT-4o large language model (LLM). We then fine-tuned small language models (SLMs), including T5, Llama 3.1, and Phi 3, with these LLM-generated explanations to identify and classify NFRs and to explain their classification outputs. We evaluated classification performance using standard metrics such as F1-score, and assessed the quality of the generated explanations through human evaluation. Among the models tested, T5 produced explanations of quality comparable to Llama 3.1 and Phi 3 and achieved the highest average F1-score of 0.90 in multi-label NFR classification on the industrial requirements dataset. Furthermore, to evaluate the effectiveness of NICE, we conducted a survey with 20 requirements analysts and software developers. NICE is currently deployed as part of the Knowledge-assisted Requirements Evolution (K-RE) framework developed by a large IT vendor organization.
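
As an illustration of the "average F1-score in multi-label NFR classification" mentioned in the abstract, the sketch below computes a per-label F1 and then averages it across labels (a macro average). The abstract does not say which averaging scheme the authors used, so the macro average is an assumption. The label names, the toy requirements, and the function names `per_label_f1` and `macro_f1` are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Set


def per_label_f1(gold: List[Set[str]], pred: List[Set[str]],
                 labels: List[str]) -> Dict[str, float]:
    """Compute F1 separately for each label in a multi-label setting."""
    scores: Dict[str, float] = {}
    for label in labels:
        # Count true positives, false positives, and false negatives
        # for this label across all requirements.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


# Hypothetical NFR classes and toy data; each requirement may carry
# zero, one, or several labels. An empty set stands for "no NFR class".
labels = ["security", "performance", "usability"]
gold = [{"security"}, {"performance", "usability"}, set(), {"security", "performance"}]
pred = [{"security"}, {"performance"}, {"usability"}, {"security", "performance"}]

scores = per_label_f1(gold, pred, labels)
print(scores)
print(round(macro_f1(scores), 3))
```

In this toy data the classifier is perfect on two labels and misses the third entirely, so the macro average is pulled down even though most individual predictions are correct. A micro average, which pools the counts over all labels before computing F1, would weight frequent labels more heavily.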

Link to Preprint

https://zenodo.org/records/14709254

Gokul Rejithkumar

TCS Research

India

Preethu Rose Anish