Evaluating Large Language Models for Requirements Question Answering in Industrial Aerospace Software
Aerospace software presents significant challenges to requirements engineering due to its design complexity and stringent safety standards. When manually drafting requirement documents, engineers need strong domain knowledge while also navigating heterogeneous data, which leads to errors and inefficiencies. This paper evaluates the capabilities of large language models (LLMs) in understanding aerospace software requirements and their potential to assist in requirements question answering (QA). We develop an aerospace requirements QA benchmark based on industrial software assets, books, and research materials, creating a total of 6,696 QA pairs across ten tasks and three heterogeneous data formats: text, tables, and formulas. We then evaluate the domain-specific performance of five mainstream open-source LLMs using zero-shot learning, few-shot learning, and retrieval-augmented generation (RAG) techniques. We further categorize hallucinations from LLMs and quantitatively analyze error distributions. Moreover, we conduct a user study to assess the practical usefulness of LLMs when applied to requirements QA. The evaluation results show that (1) LLMs demonstrate limited performance in the aerospace software domain, (2) RAG techniques significantly enhance the capabilities of LLMs for text-based tasks, while few-shot learning improves the performance of most LLMs, (3) four distinct types of QA hallucinations are identified, and (4) LLM QA is particularly beneficial for junior engineers. This research provides valuable perspectives for the future application of LLMs in aerospace software.
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:30 - 12:30 | RE and Design | Research Papers / Demonstrations / Journal First / Industry Papers | at Andromeda | Chair(s): Ipek Ozkaya (Carnegie Mellon University)
10:30 | 10m Talk | PF2UML: A Tool for Problem-Oriented Requirements Modeling and Transformation | Demonstrations | Hongbin Xiao (Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University), Zhi Li (Guangxi Normal University), Yilong Yang (Beihang University), Fei Tang (Huawei Technologies Co., Ltd), Dongming Jin (Peking University, China)
10:40 | 10m Talk | DReM: Efficiently Generating Domain-Specific Requirements Modeling Tool | Demonstrations | Ruixin Geng (Beihang University), Jiahao Weng (Beihang University), Ning Ge (School of Software, Beihang University), Jingyao Li (Beihang University), Chunming Hu (Beihang University)
10:50 | 20m Talk | Incorporating Verification Standards for Security Requirements Generation from Functional Specifications | Research Papers | Xiaoli Lian (Beihang University, China), Shuaisong Wang (Beihang University), Hanyu Zou (Beihang University), Fang Liu (Beihang University), Jiajun Wu (Beihang University), Li Zhang (Beihang University)
11:10 | 10m Talk | Theano: A Tool for Verifying the Consistency and Completeness in Tabular Requirements | Demonstrations | Aurora Francesca Zanenga (University of Bergamo, Bergamo, Italy), Nunzio Marco Bisceglia (University of Bergamo, Bergamo, Italy), Benedetta Ippoliti (University of Bergamo, Bergamo, Italy), Andrea Bombarda (University of Bergamo), Angelo Gargantini (University of Bergamo), Akshay Rajhans (Mathworks), Claudio Menghi (University of Bergamo; McMaster University)
11:20 | 20m Talk | Evaluating Large Language Models for Requirements Question Answering in Industrial Aerospace Software | Industry Papers | Longxing Yang (Beijing Institute of Control Engineering), Yixing Luo (Beijing Institute of Control Engineering), Hao Gao (Beijing Institute of Control Engineering), Yingshuang Fan (Beijing Institute of Control Engineering), Jingru Zhang (Beijing Institute of Control Engineering), Xiaofeng Li (Beijing Institute of Control Engineering), Xiaogang Dong (Beijing Institute of Control Engineering), Bin Gu (Beijing Institute of Control Engineering), Zhi Jin (Peking University), Mengfei Yang (China Academy of Space Technology)
11:40 | 20m Talk | To Do or Not to Do: Semantics and Patterns for Do Activities in UML PSSM State Machines | Journal First | Márton Elekes (Budapest University of Technology and Economics), Vince Molnár (Budapest University of Technology and Economics), Zoltán Micskei (Budapest University of Technology and Economics)
12:00 | 10m Talk | Merlin-A: A tool to engineer adaptive modelling languages | Demonstrations
12:10 | 20m Talk | Unlocking Optimal ORM Database Designs: Accelerated Tradeoff Analysis with Transformers | Research Papers | Md Rashedul Hasan (University of Nebraska-Lincoln), Mohammad Rashedul Hasan (University of Nebraska-Lincoln), Hamid Bagheri (University of Nebraska-Lincoln)
Andromeda is located close to the restaurant and the bar, at the end of the corridor on the side of the bar.
From the registration desk, go towards the restaurant, turn left towards the bar, walk until the end of the corridor.