"What makes my queries slow?": Subgroup Discovery for SQL Workload Analysis
Among the daily tasks of database administrators (DBAs), analysing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily identify individual queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share only some properties (a pattern) and that, at the same time, foster a target measure such as execution time. Patterns are defined on combinations of query clauses, environment variables, database alerts and metrics, and help answer questions such as "What makes SQL queries slow?" or "What makes I/O communications high?". Automatically discovering these patterns in a huge search space, and providing them as hypotheses that help DBAs locate issues and root causes, is a timely problem for explainable AI. To tackle it, we introduce an original approach rooted in Subgroup Discovery. We show how to instantiate and develop this generic data-mining framework to identify potential causes of SQL workload issues. Because we believe that such a data-mining technique is not trivial for DBAs to apply, we also provide a visualization tool for interactive knowledge discovery. We analyse a one-week workload from hundreds of databases from our company, make both the dataset and the source code available, and experimentally show that insightful hypotheses can be discovered.
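To make the notion of a pattern that fosters a target measure concrete, the following minimal sketch enumerates conjunctions of attribute-value conditions over a toy workload and ranks them by how far they shift the mean execution time. It illustrates the general Subgroup Discovery idea only and is not the method developed in this paper: the attribute names (`has_join`, `has_order_by`, `alert`), the data, the exhaustive search and the coverage-weighted mean-shift quality measure are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy workload: one row per query, with made-up attributes
# and a numeric target (execution time in seconds).
WORKLOAD = [
    {"has_join": True,  "has_order_by": True,  "alert": "none",    "exec_time_s": 9.0},
    {"has_join": True,  "has_order_by": True,  "alert": "io_wait", "exec_time_s": 11.0},
    {"has_join": True,  "has_order_by": False, "alert": "none",    "exec_time_s": 2.0},
    {"has_join": False, "has_order_by": True,  "alert": "none",    "exec_time_s": 1.5},
    {"has_join": False, "has_order_by": False, "alert": "none",    "exec_time_s": 0.5},
    {"has_join": False, "has_order_by": False, "alert": "io_wait", "exec_time_s": 3.0},
]
TARGET = "exec_time_s"


def covers(pattern, row):
    """A pattern is a tuple of (attribute, value) conditions, all of which must hold."""
    return all(row[attr] == value for attr, value in pattern)


def quality(pattern, rows, target=TARGET):
    """Assumed quality measure: coverage times the shift of the subgroup mean."""
    covered = [r[target] for r in rows if covers(pattern, r)]
    if not covered:
        return 0.0
    overall = mean(r[target] for r in rows)
    return (len(covered) / len(rows)) * (mean(covered) - overall)


def discover(rows, target=TARGET, max_depth=2, top_k=3):
    """Exhaustively score all conjunctions of up to max_depth conditions."""
    selectors = sorted(
        {(a, v) for r in rows for a, v in r.items() if a != target}, key=repr
    )
    candidates = []
    for depth in range(1, max_depth + 1):
        for pattern in combinations(selectors, depth):
            # Skip patterns that constrain the same attribute twice.
            if len({attr for attr, _ in pattern}) < depth:
                continue
            candidates.append((quality(pattern, rows, target), pattern))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for score, pattern in discover(WORKLOAD):
        description = " AND ".join(f"{a}={v}" for a, v in pattern)
        print(f"{score:.3f}  {description}")
```

On this toy data the best-ranked pattern combines `has_join=True` and `has_order_by=True`. Exhaustive enumeration is only feasible here because the example is tiny; on a real workload the search space grows combinatorially with the number of conditions, which is why automated search strategies are needed.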