How Much Logs Does My Source Code File Need? Learning to Predict the Density of Logs (EASE 2024 - Research Papers)

Who

Mohamed Amine Batoun, Mohammed Sayagh, Ali Ouni

Track

EASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Thu 20 Jun 2024 14:15 - 14:30 at Room Capri - Artificial Intelligence for Software Engineering Chair(s): Sridhar Chimalakonda, Klaus Schmid

Abstract

Software logging is the practice of recording different events that occur within a software system, which are useful for several activities such as the analysis of the system behaviour, failure prediction and anomaly detection. However, determining the optimal location for such logging statements is a critical yet complex task. In fact, striking the right balance between logging and system overhead is challenging: insufficient logging can make different maintenance tasks difficult due to missing crucial system execution data, while excessive logging can mask the real issues and cause notable performance overhead. Prior work has proposed various machine learning-based solutions to suggest where to insert logging statements. But most importantly, before answering the question "where to log?", practitioners first need to determine whether a file needs logging in the first place. To do so, we conduct in this paper an empirical study to characterize the log density (i.e., the ratio of log lines to the total lines of code) in seven open-source software projects. Then, we propose a deep learning-based approach to predict the log density based on syntactic and semantic features of the source code. We find that the percentage of files with at least one log line ranges from 5% to 33% across the studied projects. Additionally, the median log density in the files with at least one log line ranges from 0.95% to 1.85% across the seven projects and can go up to 18%. Furthermore, files without logs are less maintained and tend to have a lower median number of bugs compared to files with logs. Our findings resonate with the hypothesis that not all source code files require logging. On the other hand, our log density models achieve an average accuracy of 84%, while our cross-project log density prediction results show a promising performance with an average accuracy of 72%, which represents over 86% (ratio of cross/within) of the corresponding within-project predictions using syntactic features. Our results show that we can accurately predict whether a file needs logging and that such predictions may be generalized across projects.
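
To make the log density definition in the abstract concrete, here is a minimal Python sketch that computes the ratio of log lines to lines of code for one source file. It is only an illustration and not the authors' measurement procedure: the `LOG_CALL` pattern, the `log_density` function name, and the choice to count every non-blank line as a line of code are all assumptions made for this example.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): matches
# calls such as log.info(...), logger.debug(...) or LOGGER.error(...).
LOG_CALL = re.compile(
    r"\b(?:log|logger)\s*\.\s*(?:trace|debug|info|warn|warning|error|fatal)\s*\(",
    re.IGNORECASE,
)


def log_density(source: str) -> float:
    """Return (log lines) / (lines of code) for one source file.

    Assumption: every non-blank line counts as a line of code.
    """
    code_lines = [line for line in source.splitlines() if line.strip()]
    if not code_lines:
        return 0.0
    log_lines = sum(1 for line in code_lines if LOG_CALL.search(line))
    return log_lines / len(code_lines)


sample = """\
public void run() {
    logger.info("starting");
    doWork();
    cleanup();
}
"""

# One log line out of five non-blank lines.
print(f"{log_density(sample):.2%}")
```

A file with no matching calls yields a density of 0, which corresponds to the "file without logs" case that the abstract distinguishes from files with at least one log line.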

Mohamed Amine Batoun

École de Technologie Supérieure

Canada

Mohammed Sayagh

ETS Montreal, University of Quebec

Canada

Ali Ouni

ETS Montreal, University of Quebec

Canada

Session Program

Thu 20 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:25	Artificial Intelligence for Software Engineering (Industry / Research Papers / Short Papers, Vision and Emerging Results) at Room Capri
Chair(s): Sridhar Chimalakonda (Indian Institute of Technology, Tirupati), Klaus Schmid (University of Hildesheim)

14:00 15m Talk	A Performance Study of LLM-Generated Code on Leetcode (Research Papers). Tristan Coignion; Clément Quinton (University of Lille, Inria); Romain Rouvoy (Univ. Lille / Inria / CNRS)
14:15 15m Talk	How Much Logs Does My Source Code File Need? Learning to Predict the Density of Logs (Research Papers). Mohamed Amine Batoun (École de Technologie Supérieure); Mohammed Sayagh (ETS Montreal, University of Quebec); Ali Ouni (ETS Montreal, University of Quebec)
14:30 15m Talk	The Promise and Challenges of using LLMs to Accelerate the Screening Process of Systematic Reviews (Research Papers). Aleksi Huotala (University of Helsinki); Miikka Kuutila (Dalhousie University); Paul Ralph (Dalhousie University); Mika Mäntylä (University of Helsinki and University of Oulu)
14:45 15m Talk	AI-enabled efficient PVM performance monitoring (Industry). Mario Veniero (Independent Researcher); Davide Varriale (MEDIACOM SRL)
15:00 15m Talk	Automated evaluation of game content display using deep learning (Industry). Ciprian Paduraru (University of Bucharest); Marina Cernat (University of Bucharest); Alin Stefanescu (University of Bucharest)
15:15 10m Talk	Automated categorization of pre-trained models in software engineering: A case study with a Hugging Face dataset (Short Papers, Vision and Emerging Results). Claudio Di Sipio (University of L'Aquila); Riccardo Rubei (University of L'Aquila); Juri Di Rocco (University of L'Aquila); Davide Di Ruscio (University of L'Aquila); Phuong T. Nguyen (University of L'Aquila)