ML4PL 2018
Mon 16 - Sat 21 July 2018 Amsterdam, Netherlands
co-located with ECOOP and ISSTA 2018
Wed 18 Jul 2018 16:30 - 16:50 at Hanoi - Software Engineering & Compilers

For many experiments in code-centric research, researchers require real-world code to verify theories and conduct evaluations. Given the abundance of available code on platforms such as Maven Central or GitHub, real-world examples can be collected easily in large quantities. However, for the experiments to be meaningful, the inspected code needs to be representative of the inspected problem. For example, to properly evaluate a precise call-graph algorithm, the code must contain complex virtual call sites where the strengths and limitations of the algorithm can be observed. It is labor-intensive to set up these collections of code every time they become necessary. Furthermore, to increase the comparability and repeatability of the experiments, collections of code objects must be well curated so that their construction is traceable and repeatable.

New findings about these collections might invalidate parts of their data, in which case related research should be re-inspected. Static collections (e.g., XCorpus) quickly become outdated, and it is hard to annotate them once they are out in the field. While a collection is in use, other researchers might create or compute interesting data on its items. Algorithms (e.g., Averroes) might be available to compute information relevant for an evaluation on the fly and thereby extend the original dataset. It is not easy to find this information and relate it to the items in the collection.

To address these challenges in benchmark creation and maintenance, we introduce Delphi, an online platform to search for representative candidates to construct datasets of real-world code based on various metrics. It consists of an automated data collection, a search engine, and facilities to trace the selection process in order to foster repeatable, tractable, and comparable research. We present the current state of the project as well as our plans to extend the platform with processes for ground truth data uploads, service integration, and curated data invalidation.
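As an illustration of metric-based candidate selection with a traceable record, a minimal sketch follows. It is not Delphi's actual API: the `Artifact` and `Selection` types, the `select_candidates` function, the metric names, and the artifact identifiers are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Artifact:
    """A hypothetical code artifact with precomputed metrics."""
    identifier: str            # illustrative Maven-style coordinates
    metrics: Dict[str, float]  # metric name -> value


@dataclass
class Selection:
    """The selected items plus the query that produced them."""
    query: Dict[str, float]
    selected: List[str] = field(default_factory=list)


def select_candidates(artifacts: List[Artifact],
                      minimums: Dict[str, float]) -> Selection:
    """Keep artifacts whose metrics meet every minimum threshold.

    A missing metric is treated as 0.0 (an assumption of this sketch).
    The query is stored with the result so the selection can be repeated.
    """
    chosen = [
        a.identifier
        for a in artifacts
        if all(a.metrics.get(name, 0.0) >= bound
               for name, bound in minimums.items())
    ]
    return Selection(query=dict(minimums), selected=sorted(chosen))


# Hypothetical corpus; the metric values are made up for illustration.
corpus = [
    Artifact("org.example:alpha:1.0", {"virtual_call_sites": 120, "classes": 40}),
    Artifact("org.example:beta:2.1", {"virtual_call_sites": 3, "classes": 200}),
    Artifact("org.example:gamma:0.9", {"virtual_call_sites": 75, "classes": 12}),
]

# Select candidates suitable for a call-graph evaluation.
result = select_candidates(corpus, {"virtual_call_sites": 50})
print(result.selected)
print(result.query)
```

Storing the query next to the selected identifiers is one simple way to keep the construction of a dataset traceable; a real platform would presumably also record the version of the underlying data.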

Slides: Towards a Data-Curation Platform for Code-Centric Research.pdf (2.67 MiB)

Wed 18 Jul
Times are displayed in time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

16:00 - 17:40: Software Engineering & Compilers (BenchWork) at Hanoi
16:00 - 16:30
InspectorClone: Evaluating Precision of Clone Detection Tools
16:30 - 16:50
Towards a Data-Curation Platform for Code-Centric Research
Ben Hermann (University of Paderborn), Lisa Nguyen Quang Do (Paderborn University), Eric Bodden (Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM)
16:50 - 17:10
The Architecture Independent Workload Characterization
Beau Johnston (Australian National University)
File Attached
17:10 - 17:40
Performance Monitoring in Eclipse OpenJ9