Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation
When a program has been tested on some sample inputs, which additional inputs should one test next? To further test the program, one needs to construct inputs that cover new input features, in a manner that differs from the initial samples.
This paper presents a novel test generation approach that employs context-free grammars to learn the production probabilities of input elements from sample inputs. Using the grammar as an input parser, we show how to learn the input distribution of the samples, allowing us to create “common inputs” that are similar to the samples. By inverting the learned probabilities, we can create “uncommon inputs” that are dissimilar to the samples.
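To illustrate the idea, here is a minimal Python sketch, not the authors' implementation: a toy grammar with hypothetical production counts (as they might be tallied while parsing samples), a generator that samples expansions according to the learned probabilities, and one plausible inversion scheme (renormalized 1 − p) standing in for the paper's probability inversion.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its alternative expansions.
GRAMMAR = {
    "<digit>": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
    "<number>": ["<digit>", "<digit><number>"],
}

# Assumed production counts, as if tallied while parsing sample inputs.
COUNTS = {
    "<number>": {"<digit>": 90, "<digit><number>": 10},
    "<digit>": {d: 10 for d in GRAMMAR["<digit>"]},
}

def probabilities(counts):
    """Normalize learned production counts into probabilities."""
    total = sum(counts.values())
    return {exp: c / total for exp, c in counts.items()}

def invert(probs):
    """One plausible inversion: weight each alternative by 1 - p, renormalized,
    so that rarely seen productions become the most likely ones."""
    inverted = {exp: 1.0 - p for exp, p in probs.items()}
    total = sum(inverted.values())
    return {exp: w / total for exp, w in inverted.items()}

def produce(symbol, dists, depth=0, max_depth=20):
    """Expand `symbol` by sampling productions from the given distributions."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    probs = dists[symbol]
    if depth >= max_depth:
        expansion = min(probs, key=len)  # cut off runaway recursion
    else:
        expansion = random.choices(list(probs), weights=list(probs.values()))[0]
    # Recursively expand every nonterminal in the chosen alternative.
    out, i = "", 0
    while i < len(expansion):
        if expansion[i] == "<":
            j = expansion.index(">", i) + 1
            out += produce(expansion[i:j], dists, depth + 1, max_depth)
            i = j
        else:
            out += expansion[i]
            i += 1
    return out

common = {sym: probabilities(c) for sym, c in COUNTS.items()}
uncommon = {sym: invert(p) for sym, p in common.items()}
print(produce("<number>", common))    # short numbers, similar to the samples
print(produce("<number>", uncommon))  # long numbers, dissimilar to the samples
```

With the illustrative counts above, “common” generation favors the non-recursive production and yields short numbers like those in the samples, while the inverted distribution favors the recursive production and yields long, atypical numbers.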
Our evaluation of these approaches on three input formats shows that the “common inputs” reproduced 96% of the methods induced by the samples, while the “uncommon inputs” covered methods different from those induced by the samples for almost all subjects (95%).