
Biology is increasingly entering the fourth paradigm of science: terabyte- to exabyte-scale data generation with no single hypothesis in mind. These gigantic datasets are then searched for patterns that elucidate the biological processes that generated the measured data. The tools currently available to biologists, such as R and Python libraries, are not designed for datasets and algorithms that run on cloud clusters of ten thousand computers. Moreover, these libraries cannot be naively rewritten to leverage a distributed computing framework like Spark, because these rich, high-dimensional datasets do not map well to the existing abstractions. In this talk, I’ll describe the kinds of questions that biologists with massive datasets would like to ask, as well as some of the tools my team is building to enable statistical genetics on datasets in the tens of terabytes.

Tue 20 Jun
Times are displayed in time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:25 - 12:45: Curry On Talks at Auditorium, Vertex Building
10:25 - 11:05
Talk
Curry On Talks
Daniel King, Broad Institute
11:15 - 11:55
Talk
Curry On Talks
Jean Yang, Carnegie Mellon University
12:05 - 12:45
Talk
Curry On Talks
Mark Allen, Alert Logic