Agile Construction of Data Science DSLs (Tool Demo) (GPCE 2019 - 18th International Conference on Generative Programming: Concepts & Experiences)

Who

Artur Andrzejak, Kevin Kiefer, Diego Elias Costa, Oliver Wenz

Track

GPCE 2019

When

Mon 21 Oct 2019 12:10 - 12:30 at Ground floor conference room - Language extension Chair(s): Adam Welc

Abstract

Domain-specific languages (DSLs) have proven useful in data science, as witnessed by the popularity of SQL and the emergence of DSLs like the Trifacta Data Wrangler Language. However, implementing and maintaining a DSL requires significant effort, which limits the utility of DSLs in the context of fast-changing data science frameworks and libraries.

We propose an approach and a Python-based library and tool, NLDSL, that simplifies and streamlines the implementation of DSLs that model pipelines of operations. NLDSL offers an “easy to use” interface for adding new pipeline operations. In particular, the syntax description and the implementation of each operation are bundled together as terse, annotated Python functions, which simplifies extending and maintaining a DSL. To support ad hoc DSL elements, NLDSL offers a mechanism for defining DSL-level functions, which are then treated as first-class DSL elements.
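To make the idea of bundling syntax and implementation concrete, here is a minimal sketch in plain Python. It is not NLDSL's actual API: the `operation` decorator, the `$name` placeholder notation, the `|` pipeline separator, and the `translate` function are all assumptions invented for illustration. Each decorated function carries its own syntax description and returns the Pandas code for one pipeline step.

```python
import re

# Hypothetical rule registry: (compiled syntax pattern, code generator) pairs.
_RULES = []


def operation(syntax):
    """Register a pipeline operation whose syntax uses $name placeholders."""
    # "select rows $cond" splits into ["select rows ", "cond", ""]:
    # even positions are literal text, odd positions are placeholder names.
    parts = re.split(r"\$(\w+)", syntax)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )

    def register(func):
        _RULES.append((re.compile(pattern), func))
        return func

    return register


@operation("select rows $cond")
def select_rows(df, cond):
    # Emits a Pandas query call for the given condition string.
    return f"{df}.query({cond!r})"


@operation("select columns $cols")
def select_columns(df, cols):
    names = [c.strip() for c in cols.split(",")]
    return f"{df}[{names!r}]"


@operation("head $n")
def head(df, n):
    return f"{df}.head({int(n)})"


def translate(statement):
    """Translate '<frame> | <op> | <op> ...' into one chained expression."""
    source, *steps = [s.strip() for s in statement.split("|")]
    code = source
    for step in steps:
        for pattern, generate in _RULES:
            match = pattern.fullmatch(step)
            if match:
                code = generate(code, **match.groupdict())
                break
        else:
            raise ValueError(f"unknown operation: {step!r}")
    return code


generated = translate("df | select rows age > 30 | select columns name, age | head 5")
print(generated)
```

In this sketch, adding an operation means writing one more decorated function; the translator itself does not change.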

Our tool automatically supports each DSL with code completion and in-editor documentation in a multitude of IDEs that implement Microsoft’s Language Server Protocol. To circumvent the limited expressiveness of an external DSL, our tool allows embedding DSL statements in the source code comments of a general-purpose language and translating the DSL to that language during editing.
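The comment-embedding idea can be sketched as a small source-to-source pass. This is an illustration under assumptions, not the tool's implementation: the `## ` comment marker, the `expand_dsl_comments` function, and the one-rule `toy_translate` helper are hypothetical, and the real tool performs the translation inside the editor rather than as a batch step.

```python
DSL_PREFIX = "## "  # assumed marker for comments that carry a DSL statement


def expand_dsl_comments(source, translate):
    """Insert the generated code line directly below each DSL comment."""
    out = []
    for line in source.splitlines():
        out.append(line)
        stripped = line.lstrip()
        if stripped.startswith(DSL_PREFIX):
            # Reuse the comment's indentation for the generated line.
            indent = line[: len(line) - len(stripped)]
            statement = stripped[len(DSL_PREFIX):].strip()
            out.append(indent + translate(statement))
    return "\n".join(out)


def toy_translate(statement):
    """Hypothetical translator for '<target> = <frame> | head <n>' only."""
    target, pipeline = [s.strip() for s in statement.split("=", 1)]
    frame, step = [s.strip() for s in pipeline.split("|", 1)]
    op, arg = step.split()
    if op != "head":
        raise ValueError(f"unsupported operation: {op!r}")
    return f"{target} = {frame}.head({int(arg)})"


example = "\n".join([
    "import pandas as pd",
    "df = pd.read_csv('data.csv')",
    "## top = df | head 3",
    "print(top)",
])

expanded = expand_dsl_comments(example, toy_translate)
print(expanded)
```

Because the DSL statement stays in a comment, the surrounding file remains ordinary Python, and the generated line can be edited or extended with host-language code where the DSL falls short.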

We demonstrate and evaluate our approach and tool by implementing a DSL for data tables that is translated to either Pandas or PySpark code. A preliminary evaluation shows that this DSL can be defined in a concise and maintainable way, and that it can cover a majority of the processing steps in popular Spark/Pandas tutorials.
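As a rough illustration of one pipeline targeting two backends, the sketch below generates code strings for either Pandas or PySpark from the same list of steps. The `GENERATORS` table, the step names (`filter`, `select`, `limit`), and the `generate` function are assumptions made for this example; only the emitted method names (`query`, `head`, `filter`, `select`, `limit`) are real Pandas and PySpark DataFrame methods.

```python
# Hypothetical per-target code generators for three pipeline steps.
GENERATORS = {
    "pandas": {
        "filter": lambda df, cond: f"{df}.query({cond!r})",
        "select": lambda df, cols: f"{df}[{list(cols)!r}]",
        "limit": lambda df, n: f"{df}.head({int(n)})",
    },
    "pyspark": {
        "filter": lambda df, cond: f"{df}.filter({cond!r})",
        "select": lambda df, cols: f"{df}.select({', '.join(map(repr, cols))})",
        "limit": lambda df, n: f"{df}.limit({int(n)})",
    },
}


def generate(pipeline, target, frame="df"):
    """Chain the generated call for each (operation, argument) step."""
    code = frame
    for op, arg in pipeline:
        code = GENERATORS[target][op](code, arg)
    return code


pipeline = [("filter", "age > 30"), ("select", ["name", "age"]), ("limit", 5)]

pandas_code = generate(pipeline, "pandas")
pyspark_code = generate(pipeline, "pyspark")
print(pandas_code)
print(pyspark_code)
```

Keeping the pipeline description separate from the per-target generators is one way a single DSL statement could map to both libraries.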

Artur Andrzejak

Heidelberg University

Germany

Kevin Kiefer

Diego Elias Costa

Heidelberg University

Oliver Wenz

Heidelberg University

Session Program

Mon 21 Oct
Displayed time zone: Beirut (GMT+03:00)

11:00 - 12:30	Language extension, GPCE 2019, at Ground floor conference room. Chair(s): Adam Welc, Uber Technologies

11:00 30m Talk		Foreign language interfaces by code migration GPCE 2019 Shigeru Chiba Graduate School of Information Science and Technology, The University of Tokyo
11:30 20m Talk		A Language Feature to Unbundle Data at Will (Short Paper) GPCE 2019 Musa Al-hassy McMaster University, Wolfram Kahl McMaster University, Jacques Carette McMaster University
11:50 20m Talk		Parallel Nondeterministic Programming as a Language Extension to C (Short Paper) GPCE 2019 Lucas Kramer University of Minnesota, Eric Van Wyk University of Minnesota, USA
12:10 20m Talk		Agile Construction of Data Science DSLs (Tool Demo) GPCE 2019 Artur Andrzejak Heidelberg University, Kevin Kiefer, Diego Elias Costa Heidelberg University, Oliver Wenz Heidelberg University