DLBENCH: A Comprehensive Benchmark for SQL Translation with Large Language Models
In recent years, the growing complexity of database management systems (DBMSs) and the proliferation of SQL dialects have created significant challenges for database migration, federation, and integration. These challenges stem from disparities between the SQL dialects of different DBMSs, which hinder communication and interoperability between systems. SQL translation, the process of converting SQL queries from the dialect of a source DBMS to that of a target DBMS, plays a crucial role in addressing these challenges. To support this task, we introduce DLBENCH, the first comprehensive benchmark designed to evaluate the SQL translation capabilities of Large Language Models (LLMs). The benchmark includes two datasets: BIRDTRANS, which covers real-world database query scenarios across seven DBMSs, and BUTTERTRANS, which spans a broader spectrum of SQL types and encompasses extensive DBMS dialect features. We collect high-quality databases and SQL statements and apply a rigorous multi-step cleaning process that ensures data quality through SQL-92–based checks and dialect-specific parser validation. In addition, both LLM-based and human annotations are used to verify the correctness and completeness of the datasets. We demonstrate the utility of DLBENCH through extensive experiments, which show that the benchmark effectively evaluates the SQL translation ability of LLMs. The results highlight the potential of LLMs for SQL translation tasks and provide insights into areas for further improvement.
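To make the notion of a dialect disparity concrete, the following minimal sketch rewrites one well-known difference: MySQL limits rows with a trailing `LIMIT n`, whereas SQL Server (T-SQL) uses `SELECT TOP (n)`. The function name `mysql_limit_to_tsql_top` and the regular expression are hypothetical and chosen only for illustration; this is a hand-written rule for a single construct, not the LLM-based translation that DLBENCH evaluates, and it assumes a flat query with no `OFFSET` and no nested `LIMIT`.

```python
import re

# Hypothetical illustration only; not part of DLBENCH.
# Matches a flat "SELECT [DISTINCT] ... LIMIT n" query with an optional trailing semicolon.
_LIMIT_RE = re.compile(
    r"^\s*SELECT\s+(?P<distinct>DISTINCT\s+)?(?P<body>.+?)\s+LIMIT\s+(?P<n>\d+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def mysql_limit_to_tsql_top(query: str) -> str:
    """Rewrite a simple MySQL 'SELECT ... LIMIT n' as T-SQL 'SELECT TOP (n) ...'.

    Queries that do not match the simple pattern (for example, those using
    OFFSET) are returned unchanged rather than guessed at.
    """
    match = _LIMIT_RE.match(query)
    if match is None:
        return query
    # In T-SQL, DISTINCT precedes TOP, so re-emit it first when present.
    distinct = "DISTINCT " if match.group("distinct") else ""
    return f"SELECT {distinct}TOP ({match.group('n')}) {match.group('body')};"


if __name__ == "__main__":
    print(mysql_limit_to_tsql_top("SELECT name FROM users ORDER BY id LIMIT 10;"))
```

Even this single rule needs special handling for `DISTINCT` and leaves `OFFSET` queries untouched, which hints at why rule-by-rule translation across many dialects is laborious.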