SecretBench: A Dataset of Software Secrets
According to GitGuardian’s monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) doubled in 2021 compared to 2020, totaling more than six million secrets. However, no benchmark dataset is publicly available for researchers and tool developers to evaluate secret detection tools, which produce many false-positive warnings. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source code containing 97,479 secrets (of which 15,084 are true secrets) of various secret types, extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.
Mon 15 May (displayed time zone: Hobart)
16:35 - 17:20 | Security | Technical Papers / Data and Tool Showcase Track at Meeting Room 110 | Chair(s): Chanchal K. Roy, University of Saskatchewan
16:35 | 12m | Talk | UNGOML: Automated Classification of unsafe Usages in Go | Technical Papers | Anna-Katharina Wickert, TU Darmstadt, Germany; Clemens Damke, University of Munich (LMU); Lars Baumgärtner, Technische Universität Darmstadt; Eyke Hüllermeier, University of Munich (LMU); Mira Mezini, TU Darmstadt
16:47 | 12m | Talk | Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps | Technical Papers | Gerhard Jungwirth, TU Wien; Aakanksha Saha, TU Wien; Michael Schröder, TU Wien; Tobias Fiebig, Max-Planck-Institut für Informatik; Martina Lindorfer, TU Wien; Jürgen Cito, TU Wien
16:59 | 12m | Talk | MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection | Technical Papers | Hoang H. Nguyen, L3S Research Center, Leibniz Universität Hannover, Hannover, Germany; Nhat-Minh Nguyen, Singapore Management University, Singapore; Chunyao Xie, L3S Research Center, Leibniz Universität Hannover, Germany; Zahra Ahmadi, L3S Research Center, Leibniz Universität Hannover, Hannover, Germany; Daniel Kudenko, L3S Research Center, Leibniz Universität Hannover, Germany; Thanh-Nam Doan, Independent Researcher, Atlanta, Georgia, USA; Lingxiao Jiang, Singapore Management University
17:11 | 6m | Talk | SecretBench: A Dataset of Software Secrets | Data and Tool Showcase Track | Setu Kumar Basak, North Carolina State University; Lorenzo Neil, North Carolina State University; Bradley Reaves, North Carolina State University; Laurie Williams, North Carolina State University