Flare: A Brief Look into Optimizing UDFs in Spark (Student Talk)
Building performance-critical software is time-consuming, error-prone, and often tedious. Although programmers often get lost in the weeds of lower-level languages like C, these languages remain the industry standard for building large-scale, “enterprise” software with performance constraints. Notably, most enterprise-level database management systems are written in these low-level languages in the hope of eking out the last drops of hardware performance. But is the payoff really worth the pain?
This talk presents Flare, a new back-end for Spark that shows it is possible to meet or even exceed the performance of existing relational database management systems. Flare is implemented entirely in Scala, yet it delivers order-of-magnitude speedups both for relational workloads such as the TPC-H benchmarks and for a range of machine learning kernels that combine relational and iterative functional processing.
These gains are achieved primarily by compiling SQL queries to native code, replacing parts of the Spark runtime system, and extending the scope of optimization and code generation to large classes of user-defined functions (UDFs). This talk focuses on Flare’s support for reasoning about UDFs, which uses multi-stage programming to let Scala’s compiler efficiently optimize computations that are otherwise treated as black boxes.
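To make the contrast between a black-box UDF and a staged UDF concrete, here is a minimal toy sketch in Scala. It is not Flare's implementation and does not use Spark or any staging library: the names `Exp`, `Col`, `Lit`, `Plus`, `Times`, `stagedUdf`, `simplify`, and `emit` are all hypothetical, chosen only for illustration. The idea it shows is that a UDF written over expression values builds a small program representation when it runs, which an optimizer can then inspect and rewrite before generating code.

```scala
// Toy illustration only. Every name below is hypothetical and is not part of
// Flare, Spark, or any staging framework.
object StagedUdfSketch {
  // A tiny expression IR standing in for what a query optimizer can inspect.
  sealed trait Exp {
    def +(that: Exp): Exp = Plus(this, that)
    def *(that: Exp): Exp = Times(this, that)
  }
  case class Col(name: String) extends Exp
  case class Lit(value: Double) extends Exp
  case class Plus(a: Exp, b: Exp) extends Exp
  case class Times(a: Exp, b: Exp) extends Exp

  // Black-box UDF: an ordinary function on values. An optimizer can only call it.
  val opaqueUdf: Double => Double = x => x * 2.0 * 1.0 + 0.0

  // Staged UDF: the same body written over Exp. Calling it once builds an IR
  // tree instead of computing a number.
  def stagedUdf(x: Exp): Exp = x * Lit(2.0) * Lit(1.0) + Lit(0.0)

  // A small algebraic simplifier: removes "+ 0" and "* 1", folds constants.
  def simplify(e: Exp): Exp = e match {
    case Plus(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Lit(0.0), r)    => r
        case (l, Lit(0.0))    => l
        case (Lit(x), Lit(y)) => Lit(x + y)
        case (l, r)           => Plus(l, r)
      }
    case Times(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Lit(1.0), r)    => r
        case (l, Lit(1.0))    => l
        case (Lit(x), Lit(y)) => Lit(x * y)
        case (l, r)           => Times(l, r)
      }
    case other => other
  }

  // Render the IR as source text, standing in for native code generation.
  def emit(e: Exp): String = e match {
    case Col(n)      => n
    case Lit(v)      => v.toString
    case Plus(a, b)  => s"(${emit(a)} + ${emit(b)})"
    case Times(a, b) => s"(${emit(a)} * ${emit(b)})"
  }

  def main(args: Array[String]): Unit = {
    val optimized = simplify(stagedUdf(Col("price")))
    println(emit(optimized))
  }
}
```

In this sketch, `opaqueUdf` still performs the redundant multiply-by-one and add-zero on every row, while the staged version is reduced to a single multiplication over the `price` column before any code is emitted.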
Sun 22 Oct