Grammar-Based Testing for Little Languages: An Experience Report with Student Compilers
We report on our experience in using various grammar-based test suite
generation methods to test 61 single-pass compilers that undergraduate
students submitted for the practical project of a computer science
course. We show that
(1) all test suites constructed systematically following different
grammar coverage criteria fall far behind the instructor's test suite
in achieved code coverage, in the number of triggered semantic errors,
and in detected failures and crashes;
(2) a medium-sized positive random test suite triggers more crashes
than the instructor's test suite, but achieves lower code coverage and
triggers fewer non-crashing errors;
(3) a combination of the systematic and random test suites performs
as well as or better than the instructor's test suite in all aspects and
identifies errors or crashes in every single submission.
We then develop a lightweight extension of the basic grammar-based testing
framework to capture contextual constraints, by encoding scoping and
typing information as ``semantic mark-up tokens'' in the grammar rules.
These mark-up tokens are interpreted by a small generic core engine
when the tests are rendered, and tests with a
syntactic structure that cannot be completed into a valid program by
choosing appropriate identifiers are discarded.
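The mark-up-token mechanism can be pictured roughly as follows. This is a minimal, invented sketch, not the framework described above: the toy grammar, the token names `DEF` and `USE`, and the rendering engine are all illustrative assumptions. The engine interprets `DEF` by declaring a fresh identifier and `USE` by choosing one already in scope; if no valid choice exists, the whole test is discarded.

```python
import random

# Hypothetical toy grammar: nonterminals map to lists of alternatives.
# "DEF" and "USE" are semantic mark-up tokens, not terminals; the core
# engine interprets them while rendering (all names are illustrative).
GRAMMAR = {
    "prog": [["stmt", ";", "stmt"]],
    "stmt": [["var", "DEF", "=", "0"], ["print", "USE"]],
}

def render(symbol, env, rng):
    """Expand `symbol`; return a token list, or None to discard the test."""
    if symbol == "DEF":                 # declare a fresh identifier
        name = f"x{len(env)}"
        env.append(name)
        return [name]
    if symbol == "USE":                 # reference a declared identifier
        if not env:                     # no valid completion exists:
            return None                 # discard this syntactic structure
        return [rng.choice(env)]
    if symbol not in GRAMMAR:           # ordinary terminal token
        return [symbol]
    out = []
    for sym in rng.choice(GRAMMAR[symbol]):
        part = render(sym, env, rng)
        if part is None:
            return None
        out += part
    return out

rng = random.Random(0)
tests = [t for t in (render("prog", [], rng) for _ in range(20))
         if t is not None]
```

Because rendering proceeds left to right over a shared environment, every surviving test uses only identifiers that were declared earlier, so all generated positive tests are well-scoped by construction.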
We formalize error models by overriding individual mark-up tokens,
and generate tests that are guaranteed to violate specific contextual
properties of the language. We show that a fully automatically
generated random test suite with 15 error models achieves roughly the
same coverage as the instructor's test suite, and outperforms it in the
number of triggered semantic errors and detected failures and crashes.
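The error-model idea can be sketched in the same invented setting as above (again, all names are hypothetical, not the paper's actual framework): swapping out the interpretation of a single mark-up token turns the positive generator into one whose every output violates a chosen contextual property, here "use of an undeclared variable".

```python
import random

# Hypothetical sketch of an error model: the generator is unchanged except
# that the USE mark-up token's handler is overridden to emit an identifier
# that is deliberately *not* in scope (all names are illustrative).
GRAMMAR = {
    "prog": [["var", "DEF", "=", "0", ";", "print", "USE"]],
}

def render(symbol, env, rng, use_handler):
    if symbol == "DEF":                 # declare a fresh identifier
        name = f"x{len(env)}"
        env.append(name)
        return [name]
    if symbol == "USE":                 # interpretation is pluggable
        return use_handler(env, rng)
    if symbol not in GRAMMAR:           # ordinary terminal token
        return [symbol]
    out = []
    for sym in rng.choice(GRAMMAR[symbol]):
        out += render(sym, env, rng, use_handler)
    return out

def positive_use(env, rng):             # baseline: pick a name in scope
    return [rng.choice(env)]

def undef_use(env, rng):                # error model: a fresh name that is
    return [f"undef{len(env)}"]         # guaranteed to be out of scope

rng = random.Random(0)
bad = render("prog", [], rng, undef_use)   # scoping violation by construction
```

Each error model is thus a one-token override of the positive generator, which is why a failing test directly pinpoints the contextual check a submission gets wrong.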
Moreover, all failing tests indicate real errors, and we have
detected errors even in the instructor's reference implementation.