Centrifuge: Data quality in Spark without the costs!
Data quality is a growing concern in Big Data, as more and more bugs stem from poor-quality data. Yet data quality efforts tend to take second place and often arrive too late. In this talk, we will apply algebraic abstraction to the composition of data pipelines, resulting in inlined, unified and performant data quality checks. We will see how these techniques can be used to find different classes of bugs in pipelines and make “same day delivery” possible in production-critical projects.
Mon 19 Jun
|10:30 - 11:10|
Jon Pretty, Propensive Ltd
|11:20 - 12:00|
|12:10 - 12:50|
Allison McMillan, Collective Idea