Finding Data Compatibility Bugs with JSON Subschema Checking (Distinguished Artifact)
Sat 17 Jul 2021 09:50 - 10:10 at ISSTA 1, Session 27 (time band 3): Bugs and Analysis 2. Chair: Mike Papadakis
JSON is a data format used pervasively in web APIs, cloud computing,
NoSQL databases, and increasingly also machine learning.
To ensure that JSON data is compatible with an application, one
can define a JSON schema and use a validator to check data against
the schema. However, because validation can happen only once
concrete data occurs during an execution, it may detect data compatibility
bugs too late or not at all. Examples include evolving
the schema for a web API, which may unexpectedly break client
applications, or accidentally running a machine learning pipeline
on incorrect data. This paper presents a novel way of detecting
a class of data compatibility bugs via JSON subschema checking.
Subschema checks find bugs before concrete JSON data is available
and across all possible data specified by a schema. For example,
one can check if evolving a schema would break API clients or if
two components of a machine learning pipeline have incompatible
expectations about data. Deciding whether one JSON schema is
a subschema of another is non-trivial because the JSON Schema
specification language is rich. Our key insight to address this challenge
is to first reduce the richness of schemas by canonicalizing
and simplifying them, and to then reason about the subschema
question on simpler schema fragments using type-specific checkers.
We apply our subschema checker to thousands of real-world
schemas from different domains. In all experiments, the approach
is correct whenever it gives an answer (100% precision and correctness),
which is the case for most schema pairs (93.5% recall), clearly
outperforming the state-of-the-art tool. Moreover, the approach
reveals 43 previously unknown bugs in popular software, most
of which have already been fixed, showing that JSON subschema
checking helps find data compatibility bugs early.
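To make the canonicalize-then-check idea concrete, here is a deliberately tiny sketch. It is not the authors' tool. It handles only an assumed toy fragment of JSON Schema (`type` as a string or list over `null`, `boolean`, `integer`, `number`, `string`; inclusive `minimum`/`maximum`; `minLength`/`maxLength`; and `anyOf`), and the function names `canonicalize` and `is_subschema` are hypothetical. Canonicalization splits each schema into single-type "atoms" and drops unsatisfiable ones; a type-specific rule then compares atoms. In the spirit of "correct whenever it gives an answer", the sketch returns `True` or `False` only when it can decide and `None` otherwise.

```python
import math
from typing import Optional

# Hypothetical canonical "atom": a single-type schema, either ("null",),
# ("boolean",), or (type, lo, hi) for integer/number/string; None = unbounded.
Atom = tuple
NUMERIC = {"integer", "number"}


def canonicalize(schema: dict) -> list:
    """Rewrite a toy-fragment schema into a list of atoms (an implicit anyOf),
    simplifying away atoms that no JSON value can satisfy."""
    if "anyOf" in schema:
        return [atom for sub in schema["anyOf"] for atom in canonicalize(sub)]
    types = schema["type"]
    atoms = []
    for t in [types] if isinstance(types, str) else types:
        if t in ("null", "boolean"):
            atoms.append((t,))
        elif t == "integer":
            # Round fractional bounds inward, e.g. minimum 0.5 becomes 1.
            lo = None if schema.get("minimum") is None else math.ceil(schema["minimum"])
            hi = None if schema.get("maximum") is None else math.floor(schema["maximum"])
            if lo is None or hi is None or lo <= hi:
                atoms.append(("integer", lo, hi))
        elif t == "number":
            lo, hi = schema.get("minimum"), schema.get("maximum")
            if lo is not None and hi is not None:
                if lo > hi:
                    continue  # empty range: drop the atom
                if lo == hi and float(lo).is_integer():
                    atoms.append(("integer", int(lo), int(lo)))  # one integral value
                    continue
            atoms.append(("number", lo, hi))
        elif t == "string":
            lo, hi = schema.get("minLength", 0), schema.get("maxLength")
            if hi is None or lo <= hi:
                atoms.append(("string", lo, hi))
        else:
            raise ValueError(f"type outside the toy fragment: {t!r}")
    return atoms


def _within(lo, hi, outer_lo, outer_hi) -> bool:
    """Is the interval [lo, hi] inside [outer_lo, outer_hi]? None = unbounded."""
    lower_ok = outer_lo is None or (lo is not None and lo >= outer_lo)
    upper_ok = outer_hi is None or (hi is not None and hi <= outer_hi)
    return lower_ok and upper_ok


def _compatible(a: Atom, b: Atom) -> bool:
    """Can atoms a and b share values? Only integers and numbers overlap across types."""
    return a[0] == b[0] or {a[0], b[0]} == NUMERIC


def _atom_sub(a: Atom, b: Atom) -> bool:
    """Type-specific check: is every value of atom a also a value of atom b?"""
    if not _compatible(a, b):
        return False
    if a[0] in ("null", "boolean"):
        return True
    if a[0] == "number" and b[0] == "integer":
        return False  # a non-degenerate real range contains non-integers
    return _within(a[1], a[2], b[1], b[2])


def is_subschema(s1: dict, s2: dict) -> Optional[bool]:
    """True if every instance of s1 is valid under s2, False if some is not,
    None if this sketch cannot decide."""
    right = canonicalize(s2)
    verdict: Optional[bool] = True
    for a in canonicalize(s1):
        if any(_atom_sub(a, b) for b in right):
            continue
        if sum(_compatible(a, b) for b in right) <= 1:
            return False  # at most one atom could cover a, and it does not
        verdict = None  # a union of several atoms might still cover a
    return verdict


# Schema evolution, as in the abstract: clients validate a field against
# old_schema; the API now produces data described by new_schema. Old clients
# stay safe only if new_schema is a subschema of old_schema.
old_schema = {"type": "integer", "minimum": 1, "maximum": 100}
new_schema = {"type": ["integer", "null"], "minimum": 1, "maximum": 1000}
print(is_subschema(new_schema, old_schema))  # False: the change can break clients
print(is_subschema(old_schema, new_schema))  # True: old data still fits
```

The design choice mirrors the abstract's key insight at toy scale: once a schema is split into single-type atoms, each comparison needs only one simple rule (interval containment for numbers, length bounds for strings). When a left atom could only be covered by a union of several right atoms, the sketch answers `None` rather than guessing.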
Slides: JSONSubschema_issta21_slides_online.pdf (713 KiB)
Fri 16 Jul (displayed time zone: Brussels, Copenhagen, Madrid, Paris)
18:20 - 20:00 | Session 20 (time band 1): Analysis | Technical Papers at ISSTA 2 | Chair: Shiyi Wei (University of Texas at Dallas)
18:20 | 20m Talk | A Lightweight Framework for Function Name Reassignment Based on Large-Scale Stripped Binaries (ACM SIGSOFT Distinguished Paper) | Technical Papers | Han Gao (University of Science and Technology of China), Shaoyin Cheng (University of Science and Technology of China), Yinxing Xue (University of Science and Technology of China), Weiming Zhang (University of Science and Technology of China)
18:40 | 20m Talk | Boosting Symbolic Execution via Constraint Solving Time Prediction (Experience Paper) | Technical Papers | Sicheng Luo (Fudan University), Hui Xu (Fudan University), Yanxiang Bi (Fudan University), Xin Wang (Fudan University), Yangfan Zhou (Fudan University)
19:00 | 20m Talk | Finding Data Compatibility Bugs with JSON Subschema Checking (Distinguished Artifact) | Technical Papers | Andrew Habib (SnT, University of Luxembourg), Avraham Shinnar (IBM Research), Martin Hirzel (IBM Research), Michael Pradel (University of Stuttgart)
19:20 | 20m Talk | SAND: A Static Analysis Approach for Detecting SQL Antipatterns (ACM SIGSOFT Distinguished Paper) | Technical Papers | Yingjun Lyu (Amazon), Sasha Volokh (University of Southern California), William G.J. Halfond (University of Southern California), Omer Tripp (Amazon)
19:40 | 20m Talk | Automated Patch Backporting in Linux (Experience Paper, Distinguished Artifact) | Technical Papers | Ridwan Salihin Shariffdeen (National University of Singapore), Xiang Gao (National University of Singapore), Gregory J. Duck (National University of Singapore), Shin Hwei Tan (Southern University of Science and Technology), Julia Lawall (Inria), Abhik Roychoudhury (National University of Singapore)
Sat 17 Jul
09:30 - 11:10 | Session 27 (time band 3): Bugs and Analysis 2 | Technical Papers at ISSTA 1 | Chair: Mike Papadakis (University of Luxembourg, Luxembourg)
09:30 | 20m Talk | Faster, Deeper, Easier: Crowdsourcing Diagnosis of Microservice Kernel Failure from User Space | Technical Papers | Yicheng Pan (Peking University), Meng Ma (Peking University), Xinrui Jiang (Peking University), Ping Wang (Peking University)
09:50 | 20m Talk | Finding Data Compatibility Bugs with JSON Subschema Checking (Distinguished Artifact) | Technical Papers | Andrew Habib (SnT, University of Luxembourg), Avraham Shinnar (IBM Research), Martin Hirzel (IBM Research), Michael Pradel (University of Stuttgart)
10:10 | 20m Talk | Semantic Table Structure Identification in Spreadsheets | Technical Papers | Yakun Zhang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xiao Lv (Microsoft Research), Haoyu Dong (Microsoft Research), Wensheng Dou (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Shi Han (Microsoft Research), Dongmei Zhang (Microsoft Research), Jun Wei (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Dan Ye (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
10:30 | 20m Talk | Deep Just-in-Time Defect Prediction: How Far Are We? | Technical Papers | Zhengran Zeng (Southern University of Science and Technology), Yuqun Zhang (Southern University of Science and Technology), Haotian Zhang (Kwai), Lingming Zhang (University of Illinois at Urbana-Champaign)
10:50 | 20m Talk | Continuous Test Suite Failure Prediction | Technical Papers