Twitter runs a large-scale SQL federation system to fulfill the increasing need for data analytics alongside high scalability and availability. Recently, with Twitter’s efforts in migrating ad-hoc clusters to the cloud, we evolved the SQL system into a hybrid-cloud SQL federation system, across Twitter’s data centers and the public cloud, interacting with around 10PB of data daily.
In this paper, we present the design of the hybrid-cloud SQL federation system, including query federation, cluster federation, and storage federation. We identify challenges in a modern SQL system and how our system helps to address them with some important design decisions. Finally, we reflect on a qualitative examination of lessons learned from the development and maintenance of such a SQL system.