Big Data Software Analytics with Apache Spark (ICSE 2018 - TB - Technical Briefings)

Sun 27 May - Sun 3 June 2018 Gothenburg, Sweden

Track

ICSE 2018 TB - Technical Briefings

When

Tue 29 May 2018 09:00 - 10:30 at R14 - Big data and Machine Learning

Abstract

At the beginning of every research effort, researchers in empirical software engineering have to extract data from raw data sources and transform it into the inputs their tools expect. This step is time-consuming and error-prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. With Spark, researchers can map their data sources into immutable lists or data frames and transform them using a declarative API based on functional programming primitives. The primitives exposed by the Apache Spark API can help software engineering researchers create and share reproducible, high-performance data analysis pipelines that automatically scale processing to clusters of machines.

This technical briefing will cover the following topics:

Functional programming basics: what is map? What is fold? What do group by and join do?
Apache Spark in a nutshell: what are RDDs and what are DataFrames? How can we query any dataset with SQL?
A live demo of applying Apache Spark to a software engineering task.
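
To make the first topic concrete, the following is a minimal sketch of map, fold, group by, and join in plain Python, without Spark. It is not taken from the briefing: the commit records, the affiliation table, and the helper names group_by and inner_join are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical commit records: (author, repository, lines_changed)
commits = [
    ("alice", "spark", 120),
    ("bob", "spark", 30),
    ("alice", "kafka", 45),
    ("carol", "flink", 10),
]

# Hypothetical lookup table: (author, affiliation)
affiliations = [("alice", "Org A"), ("bob", "Org B")]

# map: apply a function to every element, producing a new list
lines = list(map(lambda c: c[2], commits))

# fold: combine all elements into a single value, starting from an initial value
total_lines = reduce(lambda acc, n: acc + n, lines, 0)


def group_by(key, items):
    """group by: partition items into lists that share the same key."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def inner_join(left, right):
    """join: pair records from two datasets whose first field (the key) matches."""
    right_index = group_by(lambda row: row[0], right)
    return [
        (lrow[0], lrow[1:], rrow[1:])
        for lrow in left
        for rrow in right_index.get(lrow[0], [])
    ]


by_author = group_by(lambda c: c[0], commits)
joined = inner_join(commits, affiliations)

print(total_lines)
print(sorted(by_author))
print(joined)
```

Spark's RDD API offers operations with the same names (map, fold, groupBy, and join on key-value pairs), so a pipeline written in this style can, in principle, be carried over to a cluster with the same overall shape. Note that the join here is an inner join: carol has no affiliation record and therefore does not appear in the joined result.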

The speaker has extensive experience in applying big data technologies to software engineering data, and has been teaching Apache Spark to BSc and MSc students.

Session Program

Tue 29 May

09:00 - 12:30 Big data and Machine Learning (TB - Technical Briefings) at R14

09:00 90m Talk: Big Data Software Analytics with Apache Spark (TB - Technical Briefings). Georgios Gousios, TU Delft
11:00 90m Talk: Machine Learning for Software Engineering: Models, Methods, and Applications (TB - Technical Briefings). Karl Meinke; Amel Bennaceur, The Open University
