AI-Assisted SQL Authoring at Industry Scale (ICSE 2025 - Software Engineering in Practice (SEIP))

Who

Chandra Sekhar Maddila, Negar Ghorbani, Kosay Jabre, Vijayaraghavan Murali, Edwin Kim, Parth Thakkar, Nikolay Pavlovich Laptev, Olivia Harman, Diana Hsu, Rui Abreu, Peter C Rigby

Track

ICSE 2025 SE In Practice (SEIP)

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 1 May 2025 12:00 - 12:15 at 203 - Design for AI. Chair(s): Chunyang Chen

Abstract

SqlCompose brings generative AI into the data analytics domain. SQL is declarative, has formal table schemas, and is often written in a non-linear manner. We address each of these challenges and develop a set of models that show the importance of each problem. We first develop an internal SQL benchmark to perform offline tests at Meta. We then evaluate how well the public Llama model performs: it attains a BLEU score of 53% for single-line predictions and 24% for multi-line predictions. This performance is consistent with prior work on imperative languages. Next, we fine-tune Llama on our internal data and database schemas. The resulting model, SC-Schema, outperforms Llama by 16 percentage points in BLEU score. Because SQL is often written with multiple subqueries and in a non-sequential manner, we develop SC-FIM, which is aware of the context both before and after the line(s) to be completed. This fill-in-the-middle model outperforms SC-Schema by 35 percentage points. We also measure how often the models predict the correct table names: SC-FIM does so 75% of the time, a major improvement over the other two models. Beyond our scientific research, we also roll out SC-FIM at Meta. SqlCompose is used weekly by over 10k users, including data scientists and software engineers, and fewer than 1% of users have disabled it. We use feedback from users to improve SqlCompose. Notable positive themes include completing tedious or repetitive SQL clauses, suggesting boilerplate code, and helping eliminate the need to remember difficult SQL syntax. The most significant negative theme was table and column name hallucination, which has been reduced with the release of SC-FIM. The SqlCompose models consistently outperform public and internal LLMs despite their smaller size (7B and 13B parameters), which provides early indications that smaller specialist models can outperform larger general-purpose models.
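
The abstract does not specify SC-FIM's prompt format, so the following is only a minimal sketch of the general fill-in-the-middle idea. It assumes a generic prefix-suffix-middle layout with placeholder sentinel strings (`<PRE>`, `<SUF>`, `<MID>`), a hypothetical helper `make_fim_example`, and a hypothetical `events` table, none of which come from the paper. The sketch hides one line inside a subquery and shows that the model's prompt still contains the code that follows the gap. In this case that code is the later `SELECT user_id ... FROM recent`, which constrains what the hidden line must produce. A strictly left-to-right model would not see that context.

```python
from dataclasses import dataclass

# Placeholder sentinel strings (hypothetical; not SqlCompose's actual tokens).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


@dataclass
class FimExample:
    prompt: str  # text the model conditions on
    target: str  # hidden line(s) the model should generate


def make_fim_example(sql: str, start: int, end: int) -> FimExample:
    """Hide lines [start, end) of `sql` and build a prefix-suffix-middle prompt."""
    lines = sql.splitlines(keepends=True)
    if not 0 <= start < end <= len(lines):
        raise ValueError("need 0 <= start < end <= number of lines")
    prefix = "".join(lines[:start])
    middle = "".join(lines[start:end])
    suffix = "".join(lines[end:])
    # The prompt contains the code before AND after the gap; the model
    # then generates the missing middle after the MID sentinel.
    prompt = f"{PRE}{prefix}{SUF}{suffix}{MID}"
    return FimExample(prompt=prompt, target=middle)


# Hypothetical query: the CTE body is hidden, but its later use is visible.
query = (
    "WITH recent AS (\n"
    "  SELECT user_id, ds FROM events WHERE ds >= '2025-01-01'\n"
    ")\n"
    "SELECT user_id, COUNT(*) AS n\n"
    "FROM recent\n"
    "GROUP BY user_id\n"
)
ex = make_fim_example(query, 1, 2)
print(ex.target)
```

Placing the suffix before the `<MID>` sentinel lets an ordinary left-to-right decoder generate the middle last, after it has already read both sides of the gap.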

Chandra Sekhar Maddila

Meta Platforms, Inc.

United States

Negar Ghorbani

Meta Platforms, Inc.

United States

Kosay Jabre

Meta Platforms, Inc.

Vijayaraghavan Murali

Meta Platforms, Inc.

United States

Edwin Kim

Meta Platforms, Inc.

Parth Thakkar

Meta Platforms, Inc.

Nikolay Pavlovich Laptev

Meta Platforms, Inc.

Olivia Harman

Meta Platforms, Inc.

Diana Hsu

Meta Platforms, Inc.

Rui Abreu

Meta

United States

Peter C Rigby

Meta / Concordia University

United States

Session Program

Thu 1 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Design for AI: New Ideas and Emerging Results (NIER) / SE In Practice (SEIP) / Research Track, at 203. Chair(s): Chunyang Chen, TU Munich

11:00 15m Talk	A Large-Scale Study of Model Integration in ML-Enabled Software Systems. SE for AI, Research Track. Yorick Sens (Ruhr University Bochum), Henriette Knopp (Ruhr University Bochum), Sven Peldszus (Ruhr University Bochum), Thorsten Berger (Ruhr University Bochum). Pre-print
11:15 15m Talk	Are LLMs Correctly Integrated into Software Systems? SE for AI, Research Track. Yuchen Shao (East China Normal University), Yuheng Huang (The University of Tokyo), Jiawei Shen (East China Normal University), Lei Ma (The University of Tokyo & University of Alberta), Ting Su (East China Normal University), Chengcheng Wan (East China Normal University)
11:30 15m Talk	Patch Synthesis for Property Repair of Deep Neural Networks. SE for AI, Research Track. Zhiming Chi (Institute of Software, Chinese Academy of Sciences), Jianan Ma (Hangzhou Dianzi University, China; Zhejiang University, Hangzhou, China), Pengfei Yang (Institute of Software at Chinese Academy of Sciences, China), Cheng-Chao Huang (Nanjing Institute of Software Technology, ISCAS), Renjue Li (Institute of Software at Chinese Academy of Sciences, China), Jingyi Wang (Zhejiang University), Xiaowei Huang (University of Liverpool), Lijun Zhang (Institute of Software, Chinese Academy of Sciences)
11:45 15m Talk	Optimizing Experiment Configurations for LLM Applications Through Exploratory Analysis. SE for AI, New Ideas and Emerging Results (NIER). Nimrod Busany (Accenture Labs, Israel), Hananel Hadad (Accenture Labs, Israel), Zofia Maszlanka (Avanade, Poland), Rohit Shelke (University of Ottawa, Canada), Gregory Price (University of Ottawa, Canada), Okhaide Akhigbe (University of Ottawa), Daniel Amyot (University of Ottawa)
12:00 15m Talk	AI-Assisted SQL Authoring at Industry Scale. SE for AI, SE In Practice (SEIP). Chandra Sekhar Maddila (Meta Platforms, Inc.), Negar Ghorbani (Meta Platforms, Inc.), Kosay Jabre (Meta Platforms, Inc.), Vijayaraghavan Murali (Meta Platforms, Inc.), Edwin Kim (Meta Platforms, Inc.), Parth Thakkar (Meta Platforms, Inc.), Nikolay Pavlovich Laptev (Meta Platforms, Inc.), Olivia Harman (Meta Platforms, Inc.), Diana Hsu (Meta Platforms, Inc.), Rui Abreu (Meta), Peter C Rigby (Meta / Concordia University)
12:15 15m Talk	Automating ML Model Development at Scale. SE for AI, SE In Practice (SEIP). Kaiyuan Wang (Google), Yang Li (Google Inc), Junyang Shen (Google Inc), Kaikai Sheng (Google Inc), Yiwei You (Google Inc), Jiaqi Zhang (Google Inc), Srikar Ayyalasomayajula (Google Inc), Julian Grady (Google Inc), Martin Wicke (Google Inc)