The main objective of my visit to the ECOOP/ISSTA/CurryOn event was to organize the 2nd International Workshop on Machine Learning techniques for Programming Languages (ML4PL). The workshop consisted of one invited talk and six regular talks. Below I summarize the talks I found most interesting.
Andreas Zeller gave an invited talk, “Inferring Input Structure for Machine Learning”, which presented recent progress in learning generative grammars from sample inputs via dynamic instrumentation, as well as older results on grammar-directed fuzz testing of JavaScript interpreters. He pointed out that the class of grammars that is feasible to learn lies somewhere between regular languages and full-fledged programming languages, and is best approximated by data-description languages such as JSON and URIs. The method is easy to implement in Python (nearly an exercise), becomes more involved for Java (recent work), and is much harder for C (ongoing work). Learning without sample inputs is also planned. As for generation, he expects that his tooling will make it possible to derive a generator automatically from an arbitrary ANTLR grammar.
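The generative side of this talk can be illustrated with a minimal sketch of grammar-based input generation. This is not Zeller's tooling: the toy JSON-like grammar, the `GRAMMAR` table, and the `generate` function are assumptions made for illustration, and the grammar is written by hand rather than learned from samples or derived from an ANTLR file.

```python
import json
import random

# Hypothetical toy grammar for a JSON-like data-description language.
# Each nonterminal maps to a list of alternatives; each alternative is a
# list of symbols. Anything that is not a key of GRAMMAR is a terminal.
GRAMMAR = {
    "<value>": [["<object>"], ["<array>"], ["<string>"], ["<number>"],
                ["true"], ["false"], ["null"]],
    "<object>": [["{", "}"], ["{", "<members>", "}"]],
    "<members>": [["<pair>"], ["<pair>", ",", "<members>"]],
    "<pair>": [["<string>", ":", "<value>"]],
    "<array>": [["[", "]"], ["[", "<elements>", "]"]],
    "<elements>": [["<value>"], ["<value>", ",", "<elements>"]],
    "<string>": [['"', "<chars>", '"']],
    "<chars>": [[], ["<char>", "<chars>"]],
    "<char>": [["a"], ["b"], ["c"]],
    "<number>": [["0"], ["<nonzero>", "<digits>"]],
    "<digits>": [[], ["<digit>", "<digits>"]],
    "<nonzero>": [[d] for d in "123456789"],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a random string derived from GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, always take the shortest alternative so
        # that the expansion is forced to terminate.
        alternatives = [min(alternatives, key=len)]
    expansion = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in expansion)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed for reproducible samples
    for _ in range(5):
        sample = generate("<value>", rng)
        json.loads(sample)  # every generated sample should parse as JSON
        print(sample)
```

In a fuzzing setup, each generated string would be fed to the program under test instead of `json.loads`. The depth cut-off is one simple way to keep recursive rules from expanding forever; real generators typically use more careful cost estimates.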
Cristina Cifuentes gave a talk, “Buffer Overflow Detection for C Programs is Hard to Learn”, based on an attempt by her and her coauthors to train several algorithms to detect buffer overflows. Overall, the experiment shows that purely syntactic models of code fail to detect possible overflows. Instead, as several reviewers of the submission suggested, recent models that are sensitive to information flow are likely required (arXiv:1711.00740, arXiv:1803.09473).
Timofey Bryksin from the “Machine Learning Methods in Software Engineering” lab at JetBrains Research presented an investigation of code anomalies in a large corpus of Kotlin code. The aim of the research is to find code patterns that, first, are unusual (they stand out by some syntactic metric) and, second, are not handled very well by the Kotlin compiler. The research applies several basic, well-known anomaly detection approaches to a new field: the detection of unusual source code patterns.
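To make the idea of "unusual by some syntactic metric" concrete, here is a minimal sketch of outlier detection over syntax-tree metrics. It makes several assumptions. It uses Python's `ast` module as a stand-in for a Kotlin parser. The two metrics (node count and nesting depth) are hypothetical choices. The detector is a modified z-score based on the median absolute deviation, picked for simplicity, not one confirmed to be used in the talk.

```python
import ast
import statistics


def syntactic_metrics(source):
    """Return (node count, maximum nesting depth) of a Python snippet's AST."""
    tree = ast.parse(source)

    def depth(node):
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    node_count = sum(1 for _ in ast.walk(tree))
    return (node_count, depth(tree))


def find_anomalies(snippets, threshold=3.5):
    """Indices of snippets whose modified z-score exceeds `threshold`
    on at least one metric."""
    rows = [syntactic_metrics(snippet) for snippet in snippets]
    flagged = set()
    for column in zip(*rows):  # iterate over one metric at a time
        median = statistics.median(column)
        mad = statistics.median(abs(x - median) for x in column)
        if mad == 0:
            continue  # no spread in this metric, so it cannot rank anything
        for index, value in enumerate(column):
            # 0.6745 rescales the MAD to be comparable to a standard deviation.
            if 0.6745 * abs(value - median) / mad > threshold:
                flagged.add(index)
    return sorted(flagged)


if __name__ == "__main__":
    # Five ordinary assignments plus one with a 100-term expression.
    corpus = ["x = " + " + ".join(str(i) for i in range(1, n + 1))
              for n in (1, 2, 3, 4, 5)]
    corpus.append("x = " + " + ".join(str(i) for i in range(1, 101)))
    print(find_anomalies(corpus))
```

A detector like this only covers the first criterion (unusual code). Checking the second one, how well the compiler handles the flagged snippets, requires a separate step.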
Overall, I think the workshop was successful, and I hope to see it at ECOOP next year!