Semantically Aligned Question and Code Generation for Automated Insight Generation (LLM4Code 2024)

Who

Ananya Singha, Bhavya Chopra, Anirudh Khatry, Sumit Gulwani, Austin Henley, Vu Le, Chris Parnin, Mukul Singh, Gust Verbruggen

Track

LLM4Code 2024

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sat 20 Apr 2024 17:00 - 17:10 at Luis de Freitas Branco - Session 4: Full Papers + Award & Closing Chair(s): Prem Devanbu

Abstract

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then, through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions.
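
The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which embedding model, similarity measure, or threshold the paper uses. The sketch assumes cosine similarity over a toy bag-of-words embedding (`toy_embed`, a hypothetical stand-in for a real embedding model), and the function name `filter_aligned` and the threshold of 0.25 are likewise illustrative assumptions.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model.

    Returns a sparse bag-of-words vector (token -> count). A real system
    would call a learned embedding model here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_aligned(pairs, threshold=0.25):
    """Keep (question, code) pairs whose embeddings are similar enough.

    The threshold is an assumed value chosen for this toy example only.
    """
    kept = []
    for question, code in pairs:
        score = cosine_similarity(toy_embed(question), toy_embed(code))
        if score >= threshold:
            kept.append((question, code, score))
    return kept


# Two candidate pairs for the same question: the first code snippet uses the
# columns the question mentions, the second does something unrelated.
pairs = [
    ("What is the average population per country?",
     "df.groupby('country')['population'].mean()"),
    ("What is the average population per country?",
     "df.sort_values('year').head(10)"),
]

aligned = filter_aligned(pairs)
for question, code, score in aligned:
    print(f"{score:.2f}  {question}  ->  {code}")
```

With these inputs, the first pair is kept and the second is dropped, because the unrelated snippet shares no tokens with the question. A learned embedding would capture semantic overlap rather than literal token overlap, so the threshold would need to be tuned for whichever model is used.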

Link to Preprint

https://llm4code.github.io/assets/pdf/papers/21.pdf

Ananya Singha, Microsoft
Bhavya Chopra, Microsoft, India
Anirudh Khatry, Microsoft
Sumit Gulwani, Microsoft, United States
Austin Henley, University of Tennessee, United States
Vu Le, Microsoft, United States
Chris Parnin, Microsoft
Mukul Singh, Microsoft
Gust Verbruggen, Microsoft

Session Program

Sat 20 Apr
Displayed time zone: Lisbon

16:00 - 17:30	Session 4: Full Papers + Award & Closing (LLM4Code) at Luis de Freitas Branco. Chair(s): Prem Devanbu, University of California at Davis

16:00 10m Talk	Investigating the Proficiency of Large Language Models in Formative Feedback Generation for Student Programmers. LLM4Code. Smitha S Kumar (Heriot-Watt University, UAE), Michael Lones (Heriot-Watt University, UK), Manuel Maarek (Heriot-Watt University), Hind Zantout (Heriot-Watt University, UAE). Pre-print available.
16:10 10m Talk	Tackling Students' Coding Assignments with LLMs. LLM4Code. Adam Dingle (Charles University), Martin Kruliš (Charles University). Pre-print available.
16:20 10m Talk	Applying Large Language Models to Enhance the Assessment of Parallel Functional Programming Assignments (Best Presentation Award). LLM4Code. Skyler Grandel (Vanderbilt University), Douglas C. Schmidt (Vanderbilt University), Kevin Leach (Vanderbilt University). Pre-print available.
16:30 10m Talk	An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project. LLM4Code. Sanka Rasnayaka (National University of Singapore), Wang Guanlin (National University of Singapore), Ridwan Salihin Shariffdeen (National University of Singapore), Ganesh Neelakanta Iyer (National University of Singapore). Pre-print available.
16:40 10m Talk	LLMs for Relational Reasoning: How Far are We? LLM4Code. Zhiming Li (Nanyang Technological University, Singapore), Yushi Cao (Nanyang Technological University), Xiufeng Xu (Nanyang Technological University), Junzhe Jiang (Hong Kong Polytechnic University), Xu Liu (North Carolina State University), Yon Shin Teo (Continental Automotive Singapore Pte. Ltd.), Shang-Wei LIN (Nanyang Technological University), Yang Liu (Nanyang Technological University). Pre-print available.
16:50 10m Talk	HawkEyes: Spotting and Evading Instruction Disalignments of LLMs. LLM4Code. Dezhi Ran (Peking University), Zihe Song (University of Texas at Dallas), Wenhan Zhang (Peking University), Wei Yang (University of Texas at Dallas), Tao Xie (Peking University).
17:00 10m Talk	Semantically Aligned Question and Code Generation for Automated Insight Generation (Best Paper Award). LLM4Code. Ananya Singha (Microsoft), Bhavya Chopra (Microsoft), Anirudh Khatry (Microsoft), Sumit Gulwani (Microsoft), Austin Henley (University of Tennessee), Vu Le (Microsoft), Chris Parnin (Microsoft), Mukul Singh (Microsoft), Gust Verbruggen (Microsoft). Pre-print available.
17:10 20m Day closing	Award & Closing. LLM4Code.