DepState: Detecting Synchronization Failure Bugs in Distributed Database Management Systems
Distributed database management systems (DDBMSs) are crucial for managing large-scale distributed data. Unlike single-node databases, they are deployed across clusters and distribute data among multiple nodes. DDBMSs typically rely on a synchronization process to maintain data consistency under data and cluster updates. Because this process is complex, bugs in it are inevitable and can lead to failures. These failures may result in data inconsistencies, transaction errors, or even cluster crashes, all of which severely compromise the availability and reliability of DDBMSs. However, testing of the DDBMS synchronization process has received relatively little attention.
In this paper, we propose DepState, a framework for detecting synchronization failure bugs. DepState strengthens the testing of synchronization processes by simulating the complexities of data sharding and the dynamic conditions of cluster environments. It establishes dependencies between tables across multiple nodes, closely reflecting real-world scenarios, and it systematically introduces controlled variations in cluster states. We apply DepState to three DDBMSs: MySQL NDB Cluster, MySQL InnoDB Cluster, and MariaDB Galera Cluster. DepState finds 22 new vulnerabilities, of which 11 have already been confirmed. We also compare DepState against state-of-the-art tools. In 24 hours, DepState finds 11 more synchronization failure bugs and covers 37.5%-66.5%, 42.4%-83.3%, 36.8%-54.8%, and 27.8%-54.2% more lines in synchronization-related functions than Jepsen, SQLsmith, SQLancer, and Mozi, respectively.