LLM-assisted Extraction of Regulatory Requirements: A Case Study on the GDPR
Modern software systems increasingly rely on personal data. Despite the enforcement of the European General Data Protection Regulation (GDPR) and the growing awareness about privacy and data protection, many individuals’ rights remain unsatisfactorily implemented in software systems. This is partially due to the knowledge gap between legal interpretation and software development. In this paper, we address this gap first by extracting, in close collaboration with legal experts, a list of 108 requirements pertinent to the right of access (ACC) and the right to portability (PRT), two fundamental rights under the GDPR. We further propose the XTRAREG approach, which utilizes large language models (LLMs) and retrieval augmented generation (RAG) to provide automated assistance in extracting privacy requirements from predefined legal sources. Compared to the manually extracted requirements, XTRAREG can automatically generate requirements with an accuracy of 81.8% for ACC and 56.7% for PRT. Our empirical evaluation reveals two notable observations: (i) A skewed performance in the favor of ACC, indicating the significant impact of abundant training data of the LLM, (ii) despite explicit exposure of legal references through RAG, the LLM generates requirements predominantly from the GDPR.