Protecting Privacy in Software Logs: What Should be Anonymized?
Software logs, generated at runtime by software systems, are essential for various development and analysis activities, such as anomaly detection and failure diagnosis. However, the presence of sensitive information in these logs poses significant privacy concerns, particularly regarding Personally Identifiable Information (PII) and quasi-identifiers that could lead to re-identification risks. While general data privacy has been extensively studied, the specific domain of privacy in software logs remains underexplored, with inconsistent definitions of sensitivity and a lack of standardized guidelines for anonymization. To address this gap, this study offers a comprehensive analysis of privacy in software logs from multiple perspectives. We start by analyzing 25 publicly available log datasets to identify potentially sensitive attributes. Based on the results of this step, we focus on three perspectives: privacy regulations, research literature, and industry practices. First, we analyze key data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), to understand the legal requirements concerning sensitive information in logs. Second, we conduct a systematic literature review to identify common privacy attributes and practices in log anonymization, revealing gaps in existing approaches. Finally, we survey 45 industry professionals to capture practical insights on log anonymization practices. Our findings shed light on various perspectives of log privacy and reveal industry challenges, such as technical and efficiency issues, while highlighting the need for standardized guidelines. By combining insights from regulatory, academic, and industry perspectives, our study aims to provide a clearer framework for identifying and protecting sensitive information in software logs.
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | Logging | Research Papers / Journal First at Andromeda | Chair(s): Domenico Bianculli (University of Luxembourg)
14:00 | 20m Talk | No More Labelled Examples? An Unsupervised Log Parser with LLMs | Research Papers | Junjie Huang (The Chinese University of Hong Kong), Zhihan Jiang (The Chinese University of Hong Kong), Zhuangbin Chen (Sun Yat-sen University), Michael Lyu (Chinese University of Hong Kong)
14:20 | 20m Talk | Exploring the Effectiveness of LLMs in Automated Logging Statement Generation: An Empirical Study | Journal First | Yichen LI (The Chinese University of Hong Kong), Yintong Huo (Singapore Management University), Zhihan Jiang (The Chinese University of Hong Kong), Renyi Zhong (The Chinese University of Hong Kong), Pinjia He (Chinese University of Hong Kong, Shenzhen), Yuxin Su (Sun Yat-sen University), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland), Michael Lyu (Chinese University of Hong Kong)
14:40 | 20m Talk | Protecting Privacy in Software Logs: What Should be Anonymized? | Research Papers | Roozbeh Aghili (Polytechnique Montréal), Heng Li (Polytechnique Montréal), Foutse Khomh (Polytechnique Montréal)
Andromeda is located close to the restaurant and the bar, at the end of the corridor on the side of the bar.
From the registration desk, go towards the restaurant, turn left towards the bar, walk until the end of the corridor.