Tests from Traces: Automated Unit Test Generation for R
Unit tests are labor-intensive to write and maintain. This paper looks into the possibility of automatically generating unit tests for a target software package from the execution traces of client code and examples. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test generation, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism and lack of types. This combination decrease the efficacy of traditional test generation techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,547 packages comprising 1.7M lines of R code. The tests generated by Genthat improved code coverage from the original rather low value of 267,113 lines to 704,450 lines.