Towards Automatically Inferring Constraints to Identify Implicit Assumptions in Data Analysis
High-level languages such as R or Python are used frequently to analyze and visualize data in the form of scripts or notebooks. However, these artifacts suffer from reproducibility issues due to what we frame as implicit assumptions made by the authors. Such assumptions range from package versions and shapes of involved data tables, to manual and often undocumented setup steps. Within this work, we provide a unified, example-driven perspective on implicit assumptions in data analysis backed by an explorative proof-of-concept implementation. With this perspective, we propose the use of static analysis techniques to identify such assumptions and to make them explicit in the form of code constraints, focusing on the inclusion of data-analysis-specific issues. Such constraints can then be used to automatically transform these scripts into executable and reproducible artifacts, to check these assumptions at runtime, and to serve as documentation to support code reuse and comprehension.
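To make the idea of inferring constraints concrete, here is a minimal sketch in Python. It is not the authors' proof-of-concept implementation: the function names, the sample script, and the choice to track only string-keyed column reads are all assumptions made for illustration. It assumes Python 3.9 or later, where `ast.Subscript.slice` holds the index expression directly. The sketch statically collects the column names a script reads from each table variable and then checks them against the columns a table actually provides.

```python
import ast


def infer_column_constraints(source: str) -> dict[str, set[str]]:
    """Collect, per variable, the string keys the script reads via var["key"].

    Hypothetical helper: only plain names subscripted with string literals
    in a load (read) context are considered.
    """
    required: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.ctx, ast.Load)          # reads only, not assignments
            and isinstance(node.value, ast.Name)
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            required.setdefault(node.value.id, set()).add(node.slice.value)
    return required


def check_columns(table_name, available_columns, constraints):
    """Return the columns the script assumes but the given table lacks."""
    return sorted(constraints.get(table_name, set()) - set(available_columns))


# Hypothetical analysis script; it is only parsed, never executed.
SCRIPT = '''
total = df["price"] * df["quantity"]
df["total"] = total
'''

constraints = infer_column_constraints(SCRIPT)
missing = check_columns("df", ["price"], constraints)
print(constraints)
print(missing)
```

The inferred mapping can serve as documentation of what the script expects, and the check can run before the script does, so that a missing column is reported as a violated assumption instead of surfacing as a runtime error deep inside the analysis.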
| Slides (no animations) (nier-26-sihler-identify-implicit-assumptions.pdf) | 5.88MiB |
Thu 16 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 17:30 | Testing and Analysis 13 | New Ideas and Emerging Results (NIER) / Software Engineering Education and Training (SEET) / Journal-first Papers | at Oceania II | Chair(s): Lei Zhang (University of Maryland Baltimore County)
16:00 | 15m | Talk | How to Save My Gas Fees: Understanding and Detecting Real-World Gas Issues in Solidity Programs | Journal-first Papers | Mengting He (The Pennsylvania State University), Shihao Xia (The Pennsylvania State University), Boqin Qin (China Telecom Cloud Computing Corporation), Nobuko Yoshida (University of Oxford), Tingting Yu (University of Connecticut), Yiying Zhang (University of California San Diego), Linhai Song (The Pennsylvania State University)
16:15 | 15m | Talk | Reasoning About Bugs in Learners’ Scratch Programs Using Large Language Models | Software Engineering Education and Training (SEET) | Benedikt Fein (University of Passau), Patric Feldmeier (University of Passau), Florian Obermueller (University of Passau), Gordon Fraser (University of Passau)
16:30 | 15m | Talk | Characterizing and Refactoring Table-Driven Tests in Go | New Ideas and Emerging Results (NIER) | Max Green (Stevens Institute of Technology), Lu Xiao (Stevens Institute of Technology), Zhongpeng Lin (Uber Technologies Inc.)
16:45 | 15m | Talk | Data-aware Static Analysis: Improving Semantic Fault Detection in Machine Learning Code Using Data Characteristics | New Ideas and Emerging Results (NIER) | Willem Meijer (Linköping University), Kristian Sandahl (Linköping University), Daniel Varro (Linköping University / McGill University)
17:00 | 15m | Talk | Towards Automatically Inferring Constraints to Identify Implicit Assumptions in Data Analysis | New Ideas and Emerging Results (NIER) | Florian Sihler (Ulm University), Lars Pfrenger (Ulm University), Oliver Gerstl (Ulm University), Matthias Tichy (Ulm University)
17:15 | 15m | Talk | QSolver: A Quantum Constraint Solver | New Ideas and Emerging Results (NIER)