Network partitions are inevitable in large-scale cloud systems. Despite developers’ efforts to handle network partitions throughout the design, implementation, and testing of cloud systems, bugs caused by network partitions, i.e., partition bugs, still exist and cause severe failures in production clusters. Exposing these partition bugs is challenging because they often require network partitions to start and stop at specific times.
In this paper, we propose Consistency-Guided Fault Injection (CoFI), a novel technique that selectively injects network partitions to expose partition bugs effectively. We observe that network partitions can leave cloud systems in inconsistent states, where partition bugs are more likely to occur. Based on this observation, CoFI first infers invariants (i.e., consistent states) among different nodes in a cloud system. When it observes a violation of the inferred invariants (i.e., an inconsistent state) while running the cloud system, CoFI injects network partitions to prevent the cloud system from recovering to a consistent state, and thoroughly tests whether the cloud system still proceeds correctly in the inconsistent state.
We have applied CoFI to three widely deployed cloud systems, i.e., Cassandra, HDFS, and YARN. CoFI has detected seven previously unknown bugs, three of which have been confirmed by developers.
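The infer, detect, and partition loop described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not CoFI's implementation: `ToyCluster`, `infer_invariants`, `violated`, and `guided_run` are hypothetical names, the only invariant form considered is equality between two per-node variables, and the "cluster" is an in-memory two-replica model rather than Cassandra, HDFS, or YARN.

```python
from itertools import combinations


class ToyCluster:
    """Hypothetical two-replica model: writes land on n1, then replicate to n2."""

    def __init__(self):
        self.state = {("n1", "version"): 0, ("n2", "version"): 0}
        self.partitioned = set()  # pairs of nodes that cannot communicate

    def write(self):
        self.state[("n1", "version")] += 1

    def replicate(self):
        # Replication needs the n1 <-> n2 link; it fails under a partition.
        if frozenset(("n1", "n2")) in self.partitioned:
            return False
        self.state[("n2", "version")] = self.state[("n1", "version")]
        return True

    def partition(self, node_a, node_b):
        self.partitioned.add(frozenset((node_a, node_b)))

    def snapshot(self):
        return dict(self.state)


def infer_invariants(snapshots):
    """Keep pairs of (node, variable) keys that were equal in every snapshot."""
    keys = sorted(snapshots[0])
    return {
        (a, b)
        for a, b in combinations(keys, 2)
        if all(s[a] == s[b] for s in snapshots)
    }


def violated(invariants, snapshot):
    """Return the inferred invariants that do not hold in this snapshot."""
    return sorted((a, b) for a, b in invariants if snapshot[a] != snapshot[b])


def guided_run(cluster, invariants):
    """Write, and if an invariant breaks, partition the nodes it relates."""
    cluster.write()
    broken = violated(invariants, cluster.snapshot())
    for (node_a, _), (node_b, _) in broken:
        cluster.partition(node_a, node_b)
    # With the partition in place, the system cannot return to consistency.
    recovered = cluster.replicate()
    return broken, recovered


# Step 1: fault-free profiling run, sampled after each replication completes.
profile = ToyCluster()
profile_snapshots = []
for _ in range(3):
    profile.write()
    profile.replicate()
    profile_snapshots.append(profile.snapshot())
invariants = infer_invariants(profile_snapshots)

# Step 2: guided run that injects a partition when an invariant is violated.
test_cluster = ToyCluster()
broken, recovered = guided_run(test_cluster, invariants)
```

In this sketch the partition is injected exactly in the window between the write and its replication, so the replicas stay out of sync; a real test would then exercise the system (for example, reads served by the stale replica) while it is held in that state.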
Wed 23 Sep. Displayed time zone: (UTC) Coordinated Universal Time
01:10 - 02:10 | Testing of Emerging Applications | Research Papers / Tool Demonstrations at Wombat | Chair(s): Yuan Tian (Queens University, Kingston, Canada)
01:10 | 20m Talk | CoFI: Consistency-Guided Fault Injection for Cloud Systems | Research Papers | Haicheng Chen (The Ohio State University, USA), Wensheng Dou (Institute of Software, Chinese Academy of Sciences), Dong Wang (Institute of Software, Chinese Academy of Sciences), Feng Qin (Ohio State University, USA)
01:30 | 20m Talk | ChemTest: An Automated Software Testing Framework for an Emerging Paradigm | Research Papers | Michael C. Gerten (Iowa State University), James I. Lathrop (Iowa State University), Myra Cohen (Iowa State University), Titus H. Klinge (Drake University)
01:50 | 10m Talk | ImpAPTr: A Tool For Identifying The Clues To Online Service Anomalies | Tool Demonstrations | Hao Wang, Guoping Rong (Nanjing University), Yangchen Xu (Nanjing University), Yong You (Meituan-Dianping Group)