Open Science in Software Engineering: A Study on Deep Learning-Based Vulnerability Detection
Open science is the practice of making scientific research publicly accessible to anyone, and it is therefore highly beneficial. Given these benefits, the software engineering (SE) community has been diligently advocating open science policies during peer review and publication processes. However, to date, there have been few studies that look into the status and issues of open science in SE from a systematic perspective.
In this paper, we set out to start filling this gap. Given the great breadth of SE in general, we constrained our scope to a particular topic area in SE as an example case. Recently, an increasing number of deep learning (DL) approaches have been explored in SE, including DL-based software vulnerability detection, a popular, fast-growing topic that addresses an important problem in software security. We exhaustively searched the literature in this area and identified 55 relevant works that propose a DL-based vulnerability detection approach. We then comprehensively investigated the four integral aspects of open science: availability, executability, reproducibility, and replicability.
Among other findings, our study revealed that only a small percentage (25.5%) of the studied approaches provided publicly available tools. Some of these available tools lacked sufficient documentation or a complete implementation, making them not executable or not reproducible. The use of balanced or artificially generated datasets led to significantly overrated performance for the respective techniques, making most of them not replicable. Based on our empirical results, we made actionable suggestions on improving the state of open science in each of the four aspects. We note that our results and recommendations on most of these aspects (availability, executability, reproducibility) are not tied to the nature of the chosen topic (DL-based vulnerability detection) and hence are likely applicable to other SE topic areas. We also believe our results and recommendations on replicability to be applicable to other DL-based topics in SE, as they are not tied to (the particular application of DL in) detecting software vulnerabilities.
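As a rough illustration of why evaluation on balanced datasets can overrate a detector, the sketch below computes precision and F1 for one fixed classifier on two test sets that differ only in class ratio. The detection rates (90% true positive rate, 10% false positive rate) and the 5% vulnerable ratio are hypothetical values chosen for illustration and are not taken from the study.

```python
def precision_and_f1(tpr, fpr, n_vulnerable, n_safe):
    """Return (precision, F1) for a detector with fixed per-class rates.

    tpr: fraction of vulnerable samples flagged (recall).
    fpr: fraction of safe samples wrongly flagged.
    All inputs here are hypothetical, for illustration only.
    """
    true_pos = tpr * n_vulnerable
    false_pos = fpr * n_safe
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    recall = tpr
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


# Same hypothetical detector, two test sets.
balanced = precision_and_f1(0.9, 0.1, n_vulnerable=1000, n_safe=1000)
# Assumed "realistic" ratio: 5% of samples are vulnerable.
imbalanced = precision_and_f1(0.9, 0.1, n_vulnerable=1000, n_safe=19000)

print(f"balanced:   precision={balanced[0]:.3f}, F1={balanced[1]:.3f}")
print(f"imbalanced: precision={imbalanced[0]:.3f}, F1={imbalanced[1]:.3f}")
```

Under these assumptions the detector itself is unchanged, yet its precision and F1 drop sharply once safe samples far outnumber vulnerable ones, which is one way a result reported on a balanced dataset may fail to replicate on realistic data.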
Wed 17 May | Displayed time zone: Hobart
15:45 - 17:15 | SE for security 1 | Technical Track / SEET - Software Engineering Education and Training / Journal-First Papers / SEIS - Software Engineering in Society | Meeting Room 103 | Chair(s): Abhik Roychoudhury (National University of Singapore)
15:45 | 15m | Talk | TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis | Technical Track | Chao Wang, Ronny Ko (The Ohio State University), Yue Zhang (The Ohio State University), Yuqing Yang (The Ohio State University), Zhiqiang Lin (The Ohio State University)
16:00 | 15m | Talk | AChecker: Statically Detecting Smart Contract Access Control Vulnerabilities | Technical Track | Asem Ghaleb (University of British Columbia), Julia Rubin (University of British Columbia, Canada), Karthik Pattabiraman (University of British Columbia)
16:15 | 15m | Talk | Fine-grained Commit-level Vulnerability Type Prediction By CWE Tree Structure | Technical Track | Shengyi Pan (Zhejiang University), Lingfeng Bao (Zhejiang University), Xin Xia (Huawei), David Lo (Singapore Management University), Shanping Li (Zhejiang University)
16:30 | 15m | Paper | Security Thinking in Online Freelance Software Development | SEIS - Software Engineering in Society | Irum Rauf (The Open University, UK), Marian Petre (School of Computing and Communications, The Open University, UK), Thein Tun (School of Computing and Communications, The Open University, UK; Simply Business, UK), Tamara Lopez (The Open University), Bashar Nuseibeh (The Open University, UK; Lero, University of Limerick, Ireland)
16:45 | 7m | Talk | Open Science in Software Engineering: A Study on Deep Learning-Based Vulnerability Detection | Journal-First Papers | Yu Nong (Washington State University), Rainy Sharma (Washington State University), Wahab Hamou-Lhadj (Concordia University, Montreal, Canada), Xiapu Luo (The Hong Kong Polytechnic University), Haipeng Cai (Washington State University)
16:52 | 8m | Talk | Training for Security: Planning the Use of a SAT in the Development Pipeline of Web Apps | SEET - Software Engineering Education and Training | Sabato Nocera (University of Salerno), Simone Romano (University of Salerno), Rita Francese (University of Salerno), Giuseppe Scanniello (University of Salerno)
17:00 | 15m | Talk | VulGen: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning | Technical Track | Yu Nong (Washington State University), Yuzhe Ou (University of Texas at Dallas), Michael Pradel (University of Stuttgart), Feng Chen (University of Texas at Dallas), Haipeng Cai (Washington State University)