Building Tools and Languages for Terabyte Scale Biology: A Call to Action
Biology is increasingly entering the fourth paradigm of science: terabyte- to exabyte-scale data generation with no single hypothesis in mind. These gigantic datasets are then searched for patterns that elucidate the biological processes that generated the measured data. The tools currently available to biologists, such as R and Python libraries, are not designed for datasets and algorithms that run on cloud clusters of ten thousand computers. Moreover, these libraries cannot be naively rewritten to leverage a distributed computing framework like Spark, because these rich, high-dimensional datasets do not map well to the existing abstractions. In this talk, I’ll describe both the kinds of questions that biologists with massive datasets would like to ask and some of the tools my team is building to enable statistical genetics on datasets in the tens of terabytes.
Tue 20 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
Session: 10:25 - 12:45
10:25 | 40m | Talk | Building Tools and Languages for Terabyte Scale Biology: A Call to Action | Curry On Talks | Daniel King, Broad Institute
11:15 | 40m | Talk | Preventing Information Leaks by Construction | Curry On Talks | Jean Yang, Carnegie Mellon University
12:05 | 40m | Talk | The Sharp Edges of Leaky Abstraction | Curry On Talks | Mark Allen, Alert Logic