
Biology is increasingly entering the fourth paradigm of science: terabyte- to exabyte-scale data generation, with no single hypothesis in mind. These gigantic datasets are then searched for patterns that elucidate the biological processes that generated the measured data. The tools currently available to biologists, such as R and Python libraries, are not designed for datasets and algorithms that run on cloud clusters of ten thousand computers. Moreover, these libraries cannot be naively rewritten to leverage a distributed computing framework like Spark, because these rich, high-dimensional datasets do not map well to the existing abstractions. In this talk, I’ll describe both the kinds of questions that biologists with massive datasets would like to ask and some of the tools my team is building to enable statistical genetics on datasets in the tens of terabytes.

Tue 20 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

Tuesday, 10:25 - 12:45: Curry On Talks at Auditorium, Vertex Building
10:25 (40m) Talk: Building Tools and Languages for Terabyte Scale Biology: A Call to Action
Curry On Talks. Daniel King, Broad Institute
11:15 (40m) Talk: Preventing Information Leaks by Construction
Curry On Talks. Jean Yang, Carnegie Mellon University
12:05 (40m) Talk: The Sharp Edges of Leaky Abstraction
Curry On Talks. Mark Allen, Alert Logic