TCSE logo 
 Sigsoft logo
Sustainability badge

This program is tentative and subject to change.

Thu 1 May 2025 12:00 - 12:15 at 203 - Design for AI

SqlCompose brings generative AI into the data analytics domain. SQL is declarative, has formal table schemas, and is often written in a non-linear manner. We address each of these challenges and develop a set of models that shows the importance of each problem. We first develop an internal SQL benchmark to perform offline tests at Meta. We evaluate how well the Public Llama model performs. We attain a BLEU score of 53% and 24% for single- and multi-line predictions, respectively. This performance is consistent with prior works on imperative languages. We then fine-tune Llama on our internal data and database schemas. SC-Schema substantially outperforms Llama by 16 percentage points on BLEU score. SQL is often written with multiple sub queries and in a non-sequential manner. We develop SC-FIM which is aware of the context before and after the line(s) that need to be completed. This fill-in-the-middle model outperform SC-FIM by 35 percentage points. We also measure how often the models get the correct table names, and SC-FIM is able to do this 75% of the time a major improvement over the other two models. Aside from our scientific research, we also roll out SC-FIM at Meta. SqlCompose is used on a weekly basis by over 10k users including data scientists and software engineers, less than 1% of users have disabled SqlCompose. We use the feedback from users to improve SqlCompose. Interesting positive themes include completing tedious or repetitive SQL clauses, suggesting boilerplate coding, and help in eliminate the need to remember difficult SQL syntax. The most significant negative themes was table and column name hallucinations, which has been reduced with the release of SC-FIM. The SqlCompose models consistently outperform public and internal LLMs despite their smaller size (7 bn and 13 bn), which provides early indications that smaller specialist models can outperform larger general purpose models.

This program is tentative and subject to change.

Thu 1 May

Displayed time zone: Eastern Time (US & Canada) change

11:00 - 12:30
11:00
15m
Talk
A Large-Scale Study of Model Integration in ML-Enabled Software Systems
Research Track
Yorick Sens Ruhr University Bochum, Henriette Knopp Ruhr University Bochum, Sven Peldszus Ruhr University Bochum, Thorsten Berger Ruhr University Bochum
11:15
15m
Talk
Are LLMs Correctly Integrated into Software Systems?
Research Track
Yuchen Shao East China Normal University, Yuheng Huang the University of Tokyo, Jiawei Shen East China Normal University, Lei Ma The University of Tokyo & University of Alberta, Ting Su East China Normal University, Chengcheng Wan East China Normal University
11:30
15m
Talk
Patch Synthesis for Property Repair of Deep Neural Networks
Research Track
Zhiming Chi Institute of Software, Chinese Academy of Sciences, Jianan Ma Hangzhou Dianzi University, China; Zhejiang University, Hangzhou, China, Pengfei Yang Institute of Software at Chinese Academy of Sciences, China, Cheng-Chao Huang Nanjing Institute of Software Technology, ISCAS, Renjue Li Institute of Software at Chinese Academy of Sciences, China, Jingyi Wang Zhejiang University, Xiaowei Huang University of Liverpool, Lijun Zhang Institute of Software, Chinese Academy of Sciences
11:45
15m
Talk
Optimizing Experiment Configurations for LLM Applications Through Exploratory Analysis
New Ideas and Emerging Results (NIER)
Nimrod Busany Accenture Labs, Israel, Hananel Hadad Accenture Labs, Israel, Zofia Maszlanka Avanade, Poland, Rohit Shelke University of Ottawa, Canada, Gregory Price University of Ottawa, Canada, Okhaide Akhigbe University of Ottawa, Daniel Amyot University of Ottawa
12:00
15m
Talk
AI-Assisted SQL Authoring at Industry Scale
SE In Practice (SEIP)
Chandra Sekhar Maddila Meta Platforms, Inc., Negar Ghorbani Meta Platforms Inc., Kosay Jabre Meta Platforms, Inc., Vijayaraghavan Murali Meta Platforms Inc., Edwin Kim Meta Platforms, Inc., Parth Thakkar Meta Platforms, Inc., Nikolay Pavlovich Laptev Meta Platforms, Inc., Olivia Harman Meta Platforms, Inc., Diana Hsu Meta Platforms, Inc., Rui Abreu Meta, Peter C Rigby Meta / Concordia University
12:15
15m
Talk
Automating ML Model Development at Scale
SE In Practice (SEIP)
Kaiyuan Wang Google, Yang Li Google Inc, Junyang Shen Google Inc, Kaikai Sheng Google Inc, Yiwei You Google Inc, Jiaqi Zhang Google Inc, Srikar Ayyalasomayajula Google Inc, Julian Grady Google Inc, Martin Wicke Google Inc
:
:
:
: