Challenges to Achieving High Availability at Scale
Facebook is a social network that connects more than 1.8 billion people. Serving this many users requires infrastructure composed of thousands of interdependent systems that span geographically distributed data centers. But what is the guiding principle for building and operating these systems?
For Facebook’s infrastructure teams, the answer is: systems must always be available and never lose data. This talk will explore this quest, focusing on three aspects.
Availability and consistency. What form of consistency do Facebook’s systems guarantee? Strong consistency is easy to understand but incurs latency penalties; weak consistency is fast but difficult for developers and users to reason about. We describe our usage of eventual consistency and delve into how Facebook constructs its caching and replicated storage systems to minimize the time needed to reach consistency. We share empirical data that measures the effectiveness of our design.
Availability and correctness. With network partitions, relaxed forms of consistency, and software bugs, how do we guarantee a consistent state? We present two systems to find and repair structural errors in Facebook’s social graph, one batch and one real-time.
Availability and scale. Sharding is one of the standard answers to operating at scale. But how can we develop one system that can shard storage as well as compute? We will introduce a new Sharding-as-a-Service component. We will show and evaluate how its design and service policies control for latency, failure tolerance and operational efficiency.
Wolfram Schulte was director of engineering in Microsoft’s Cloud and Enterprise Division, Redmond, USA, where he founded the Tools for Software Engineers team to improve Microsoft’s engineering velocity, more specifically to minimize the cycle time of the inner loop from code review, via build, code analysis and test, to deployment. Before venturing into product groups, Wolfram led the Research in Software Engineering (RiSE) group and worked on many tools that Microsoft ships, including LINQ, CodeContracts, Task Parallel Library, IntelliTest and SpecExplorer. Wolfram also co-developed the experimental program verifiers Spec# and VCC. Wolfram is a recipient of the 2016 Mills Award.
Wed 21 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:00 - 15:00 (60m) Talk: Challenges to Achieving High Availability at Scale
DEBS Invited Speakers: Wolfram Schulte, Facebook