Flare: A Brief Look into Optimizing UDFs in Spark (Student Talk) (Scala 2017)

Track

Scala 2017

Time zone: (GMT-07:00) Tijuana, Baja California, the conference time zone (offset as in effect at the time of the conference).

When

Sun 22 Oct 2017, 15:52 - 16:15, Regency C, Open-source & Student Talks session (chair: Guido Salvaneschi)

Abstract

Building performance-critical software is a time-consuming, error-prone, and often tedious task. Although programmers often get lost in the weeds of low-level languages such as C, these languages remain the industry standard for building large-scale "enterprise" software with performance constraints. Notably, most enterprise-level database management systems are written in these low-level languages in the hope of eking out the last drops of hardware performance. But is the payoff really worth the pain?

This talk presents Flare, a new back-end for Spark which shows that it is possible to match or even exceed the performance of existing relational database management systems. Flare is implemented entirely in Scala, yet achieves order-of-magnitude speedups both for relational workloads such as the TPC-H benchmarks and for a range of machine learning kernels that combine relational and iterative functional processing.

These gains come primarily from compiling SQL queries to native code, replacing parts of the Spark runtime system, and extending the scope of optimization and code generation to large classes of user-defined functions (UDFs). This talk focuses on Flare's support for reasoning about UDFs, which uses multi-stage programming to let Scala's compiler efficiently optimize computations that are otherwise treated as black boxes.
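The key mechanism in the abstract is multi-stage programming: instead of handing the engine an opaque `Int => Int`, a UDF is written against staged values, so running it produces a program representation that can be inspected and optimized. The following is a minimal sketch of that idea in plain Scala, under stated assumptions: it is not Flare's implementation, it uses no staging library, and every name (`StagingDemo`, `Exp`, `Col`, `scaleAndShift`, `emit`, `opaqueUdf`) is hypothetical. Values known at staging time are simplified away by smart constructors before C-like source text is emitted for what remains.

```scala
// A minimal multi-stage programming sketch (hypothetical names; NOT Flare's
// API). A UDF written against staged values (`Exp`) builds an expression tree
// instead of computing a number, so a code generator can inspect and simplify
// it rather than treating it as a black box.
object StagingDemo {
  // Staged (next-stage) integer expressions.
  sealed trait Exp
  final case class Const(v: Int) extends Exp
  final case class Col(name: String) extends Exp // a column of the input row
  final case class Add(a: Exp, b: Exp) extends Exp
  final case class Mul(a: Exp, b: Exp) extends Exp

  // Smart constructors: simplify while the tree is being built.
  def add(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y)) => Const(x + y) // constant folding
    case (Const(0), e)        => e            // 0 + e == e
    case (e, Const(0))        => e            // e + 0 == e
    case _                    => Add(a, b)
  }

  def mul(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y))          => Const(x * y) // constant folding
    case (Const(0), _) | (_, Const(0)) => Const(0)     // 0 * e == 0
    case (Const(1), e)                 => e            // 1 * e == e
    case (e, Const(1))                 => e            // e * 1 == e
    case _                             => Mul(a, b)
  }

  // Emit C-like source text for the (already simplified) tree.
  def emit(e: Exp): String = e match {
    case Const(v)  => v.toString
    case Col(n)    => n
    case Add(a, b) => s"(${emit(a)} + ${emit(b)})"
    case Mul(a, b) => s"(${emit(a)} * ${emit(b)})"
  }

  // Reference interpreter, used to check that simplification preserves meaning.
  def eval(e: Exp, row: Map[String, Int]): Int = e match {
    case Const(v)  => v
    case Col(n)    => row(n)
    case Add(a, b) => eval(a, row) + eval(b, row)
    case Mul(a, b) => eval(a, row) * eval(b, row)
  }

  // A staged UDF: `scale` and `shift` are known now (first stage);
  // `x` is only known when the generated code runs (second stage).
  def scaleAndShift(scale: Int, shift: Int)(x: Exp): Exp =
    add(mul(Const(scale), x), Const(shift))

  // The same UDF as an ordinary Scala function: the engine can only call it.
  def opaqueUdf(scale: Int, shift: Int): Int => Int = x => scale * x + shift

  def main(args: Array[String]): Unit = {
    println(emit(scaleAndShift(3, 4)(Col("x")))) // ((3 * x) + 4)
    println(emit(scaleAndShift(1, 0)(Col("x")))) // x
  }
}
```

For example, `emit(scaleAndShift(1, 0)(Col("x")))` yields just `x`, because the multiplication by 1 and the addition of 0 are removed at staging time, whereas `opaqueUdf(1, 0)` still performs both operations on every call. A real system would generate native code and cover many more types and operators; the sketch only shows why staging turns a black-box UDF into something an optimizer can see.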

Session Program

Sun 22 Oct 2017 (times in Tijuana, Baja California time)

15:30 - 17:00  Open-source & Student Talks (Scala 2017), Regency C. Chair: Guido Salvaneschi (TU Darmstadt)

15:30 (22 min)  Genomic Data Analysis in Scala (Open-Source Talk), Ryan Williams
15:52 (22 min)  Flare: A Brief Look into Optimizing UDFs in Spark (Student Talk), James Decker
16:15 (22 min)  Delimited Control in Scala (Student Talk), Nils Jonsson
16:37 (22 min)  Design of Library Interfaces (Student Talk), Nils Jonsson
