ASE 2024
Sun 27 October - Fri 1 November 2024 Sacramento, California, United States

Context: R is a dynamic programming language designed for statistical computing. Its many dynamic features hinder the direct transfer of common static analyses, and sophisticated static analyzers for R are lacking.

Objective: We present a novel static dataflow analysis for R, together with a program slicer as a proof of concept of its capabilities.

Method: We propose a stateful fold over a normalized version of R’s abstract syntax tree. The fold tracks (re-)definitions, values, function calls, and side effects. It handles multi-file projects and intertwines the control flow analysis to produce one graph per program.

Evaluation: To validate our analysis, we applied it to 4103 parsable, real-world R scripts and 20815 packages on CRAN, measuring its runtime and memory performance. Additionally, we use a comprehensive set of 389 systematic tests, which we publish along with our tool. To indicate the effectiveness of the slicing, we measure the required time as well as the achieved reduction in code size.

Results: Our analysis correctly handles all programs in our test suite. On average, we require 124 ms to analyze the dataflow of a given file and around 100 kB for the dataflow graph. Our slicing implementation achieves an average reduction in tokens of 93.59 %.
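
The idea of a stateful fold that tracks (re-)definitions and then feeds a slicer can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' implementation: the `Assign` node, the `State` record, and the functions `step`, `dataflow`, and `backward_slice` are hypothetical names, and the toy language has only straight-line assignments, so function calls, side effects, control flow, and multi-file projects are all left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import reduce


@dataclass(frozen=True)
class Assign:
    """Hypothetical normalized statement: `target <- f(reads...)`."""
    line: int
    target: str
    reads: tuple = ()


@dataclass
class State:
    """State threaded through the fold (an assumption of this sketch)."""
    env: dict = field(default_factory=dict)    # variable -> line of its latest definition
    edges: dict = field(default_factory=dict)  # line -> set of lines it reads from


def step(state: State, stmt: Assign) -> State:
    """Fold step: resolve reads against the environment, then record the definition."""
    state.edges[stmt.line] = {state.env[v] for v in stmt.reads if v in state.env}
    state.env[stmt.target] = stmt.line  # a re-definition shadows the earlier one
    return state


def dataflow(program: list) -> State:
    """Fold the step function over the statements in program order."""
    return reduce(step, program, State())


def backward_slice(state: State, criterion: int) -> list:
    """Collect every line reachable from the criterion along read edges."""
    seen, stack = set(), [criterion]
    while stack:
        line = stack.pop()
        if line not in seen:
            seen.add(line)
            stack.extend(state.edges.get(line, ()))
    return sorted(seen)


# Toy program, written as R-like comments:
PROGRAM = [
    Assign(1, "x"),          # x <- 1
    Assign(2, "y"),          # y <- 2
    Assign(3, "x", ("x",)),  # x <- x + 1
    Assign(4, "z", ("y",)),  # z <- y * 2
    Assign(5, "w", ("x",)),  # w <- x
]

graph = dataflow(PROGRAM)
slice_lines = backward_slice(graph, 5)  # lines 1, 3, and 5
reduction = 1 - len(slice_lines) / len(PROGRAM)  # share of lines removed
```

Slicing for line 5 keeps only the two definitions of `x` and the use itself, dropping lines 2 and 4. The `reduction` value here counts lines in a five-line toy program, whereas the reported 93.59 % is measured in tokens, so the two figures are not comparable.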