When to Say What: Learning to Find Condition-Message Inconsistencies
Thu 18 May 2023 15:26 - 15:28 at Meeting Room 105 - Posters 2
Programs often emit natural language messages, e.g., in logging statements or exceptions raised on unexpected paths. To be meaningful to users and developers, the message, i.e., what to say, must be consistent with the condition under which it is triggered, i.e., when to say it. However, checking for inconsistencies between conditions and messages is challenging because the conditions are expressed in the logic of the programming language, while the messages are expressed informally in natural language. This paper presents CMI-Finder, an approach for detecting condition-message inconsistencies. CMI-Finder is based on a neural model that takes a condition and a message as its input and predicts whether the two are consistent. To address the problem of obtaining realistic, diverse, and large-scale training data, we present six techniques that automatically generate large numbers of inconsistent examples to learn from. Moreover, we describe and compare three neural models, which are based on binary classification, triplet loss, and fine-tuning, respectively. Our evaluation applies the approach to 300K condition-message statements extracted from 42 million lines of Python code. The best model achieves a precision of 78% at a recall of 72% on a dataset of past bug fixes. Applying the approach to the newest versions of popular open-source projects reveals 50 previously unknown bugs, eight of which have been confirmed by the developers so far.
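To make the idea of an automatically generated inconsistent example concrete, here is a minimal Python sketch. It is an illustration only: the abstract does not name the six generation techniques, so the operator-flipping mutation below, the helper name `make_inconsistent`, and the sample condition and message are all assumptions, not the authors' implementation. The sketch takes a condition that matches its message and flips its comparison operators, so the unchanged message no longer fits. It needs Python 3.9 or later for `ast.unparse`.

```python
import ast

# Hypothetical mapping from each comparison operator to its opposite.
_FLIPPED = {
    ast.Lt: ast.GtE,
    ast.GtE: ast.Lt,
    ast.Gt: ast.LtE,
    ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq,
    ast.NotEq: ast.Eq,
}


class _FlipComparisons(ast.NodeTransformer):
    """Replace every comparison operator that has an entry in _FLIPPED."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # also handle nested comparisons
        # Operators without an entry (e.g. `in`, `is`) are kept unchanged.
        node.ops = [_FLIPPED.get(type(op), type(op))() for op in node.ops]
        return node


def make_inconsistent(condition: str) -> str:
    """Return `condition` with its comparison operators flipped.

    Assumed mutation for illustration; it flips each operator and is not
    a full logical negation of chained or compound conditions.
    """
    tree = ast.parse(condition, mode="eval")
    flipped = _FlipComparisons().visit(tree)
    return ast.unparse(flipped.body)


# A consistent pair, as in:
#     if len(items) == 0: raise ValueError("items must not be empty")
condition, message = "len(items) == 0", "items must not be empty"

# Same message, mutated condition: a synthetic inconsistent pair.
inconsistent_pair = (make_inconsistent(condition), message)
print(inconsistent_pair)
```

Pairs produced this way could serve as negative examples next to the original, presumably consistent pairs when training a classifier, under the assumption that most conditions and messages found in real code agree with each other.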
Wed 17 May
15:45 - 17:15 | Software logging | Technical Track | Meeting Room 101 | Chair(s): Hongyu Zhang (The University of Newcastle)
15:45 | 15m | Talk | PILAR: Studying and Mitigating the Influence of Configurations on Log Parsing | Technical Track | Hetong Dai (Concordia University), Yiming Tang (Concordia University), Heng Li (Polytechnique Montréal), Weiyi Shang (University of Waterloo)
16:00 | 15m | Talk | Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction | Technical Track | Zhenhao Li (Concordia University), Chuan Luo (Beihang University), Tse-Hsun (Peter) Chen (Concordia University), Weiyi Shang (University of Waterloo), Shilin He (Microsoft Research), Qingwei Lin (Microsoft Research), Dongmei Zhang (Microsoft Research)
16:15 | 15m | Talk | On the Temporal Relations between Logging and Code | Technical Track | Zishuo Ding (Concordia University), Yiming Tang (Concordia University), Yang Li (Beijing University of Posts and Telecommunications), Heng Li (Polytechnique Montréal), Weiyi Shang (University of Waterloo) | Pre-print
16:30 | 15m | Talk | How Do Developers' Profiles and Experiences Influence their Logging Practices? An Empirical Study of Industrial Practitioners | Technical Track | Guoping Rong (Nanjing University), shenghui gu (Nanjing University), Haifeng Shen (Australian Catholic University), He Zhang (Nanjing University), Hongyu Kuang (Nanjing University)
16:45 | 15m | Talk | When to Say What: Learning to Find Condition-Message Inconsistencies | Technical Track | Pre-print
17:00 | 15m | Talk | A Semantic-aware Parsing Approach for Log Analytics | Technical Track | Yintong Huo (The Chinese University of Hong Kong), Yuxin Su (Sun Yat-sen University), Cheryl Lee (The Chinese University of Hong Kong), Michael Lyu (The Chinese University of Hong Kong) | Pre-print