Proof Automation with Large Language Models (ASE 2024 - Research Papers)

Who

Minghai Lu, Benjamin Delaware, Tianyi Zhang

Track

ASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 30 Oct 2024 14:00 - 14:15 at Carr - Verification. Chair(s): Tevfik Bultan

Abstract

Formal verification with interactive theorem provers such as Coq is an effective approach to guarantee software correctness. However, it requires significant human effort to craft theorems and proofs. Large Language Models (LLMs) have shown promise in generating informal proofs in natural language. However, applying LLMs to generate formal proofs remains challenging. In this paper, we conduct a formative study to identify common errors made by LLMs in proof generation. By analyzing 520 errors made by GPT-3.5 when generating Coq proofs, we found GPT-3.5 can effectively outline high-level proof steps but often struggles with low-level details. Based on this insight, we propose PALM, a novel generate-then-repair approach that prompts an LLM to generate an initial proof and leverages targeted symbolic methods to repair the generation errors iteratively. We evaluate PALM on a large dataset with more than 10K theorems. The results show that PALM significantly outperforms existing approaches by successfully proving 76.6% to 180.4% more theorems. Moreover, PALM proves 1270 theorems that none of the other approaches can prove. We also demonstrate the generalizability of PALM across different LLMs.
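
The generate-then-repair idea described in the abstract can be illustrated with a minimal sketch. This is not PALM's implementation: the function names, the proof representation (a list of tactic strings), the toy checker, and the single repair rule are all hypothetical stand-ins, assuming only that an initial proof is generated once and then repaired iteratively based on checker feedback.

```python
from typing import Callable, List, Optional, Tuple

Proof = List[str]
# A checker returns None if the proof is accepted, else (failing step index, message).
CheckResult = Optional[Tuple[int, str]]
# A repair returns a modified proof, or None if it does not apply to this error.
Repair = Callable[[Proof, int, str], Optional[Proof]]


def generate_then_repair(
    generate: Callable[[], Proof],
    check: Callable[[Proof], CheckResult],
    repairs: List[Repair],
    max_rounds: int = 5,
) -> Optional[Proof]:
    """Generate one candidate proof, then repair it until the checker accepts it."""
    proof = generate()
    for _ in range(max_rounds):
        error = check(proof)
        if error is None:
            return proof  # accepted by the checker
        step, message = error
        for repair in repairs:
            fixed = repair(proof, step, message)
            if fixed is not None:
                proof = fixed
                break
        else:
            return None  # no repair rule applies to this error
    return proof if check(proof) is None else None


# --- Toy stand-ins (hypothetical; a real system would call an LLM and a prover) ---

KNOWN_TACTICS = {"intros", "auto", "reflexivity"}


def toy_generate() -> Proof:
    # Stands in for an LLM call: a plausible outline with one bad low-level step.
    return ["intros", "apply bogus_lemma", "reflexivity"]


def toy_check(proof: Proof) -> CheckResult:
    # Stands in for the proof assistant: reject the first unrecognized step.
    for i, tactic in enumerate(proof):
        if tactic not in KNOWN_TACTICS:
            return (i, f"unknown tactic or reference: {tactic}")
    return None


def replace_with_auto(proof: Proof, step: int, message: str) -> Optional[Proof]:
    # Toy repair rule: swap the failing step for a generic automation tactic.
    if "unknown" not in message:
        return None
    return proof[:step] + ["auto"] + proof[step + 1:]


if __name__ == "__main__":
    print(generate_then_repair(toy_generate, toy_check, [replace_with_auto]))
```

In this toy setup the high-level outline from the generator is kept and only the failing low-level step is replaced, which mirrors the observation in the abstract that the model outlines proofs well but struggles with details. The repair methods PALM actually uses are described in the paper.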

Link to Preprint

https://arxiv.org/abs/2409.14274

Minghai Lu

Purdue University

United States

Benjamin Delaware

Purdue University

United States

Tianyi Zhang

Purdue University

United States

Session Program

Wed 30 Oct
Displayed time zone: Pacific Time (US & Canada)

13:30 - 15:00	Verification (Research Papers / Tool Demonstrations) at Carr. Chair(s): Tevfik Bultan (University of California at Santa Barbara)

13:30 15m Talk	LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference	Research Papers	Guangyuan Wu (Nanjing University); Weining Cao (Nanjing University); Yuan Yao (Nanjing University); Hengfeng Wei (State Key Laboratory for Novel Software Technology, Nanjing University); Taolue Chen (Birkbeck, University of London); Xiaoxing Ma (State Key Laboratory for Novel Software Technology, Nanjing University)
13:45 15m Talk	LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling	Research Papers	Muhammad A. A. Pirzada (The University of Manchester); Giles Reger (University of Manchester); Ahmed Bhayat (Independent Scholar); Lucas C. Cordeiro (University of Manchester, UK and Federal University of Amazonas, Brazil)
14:00 15m Talk	Proof Automation with Large Language Models	Research Papers	Minghai Lu (Purdue University); Benjamin Delaware (Purdue University); Tianyi Zhang (Purdue University)
14:15 15m Talk	Verifying the Option Type With Rely-Guarantee Reasoning	Research Papers	James Yoo (University of Washington); Michael D. Ernst (University of Washington); René Just (University of Washington)
14:30 10m Talk	CoVeriTeam GUI: A No-Code Approach to Cooperative Software Verification	Tool Demonstrations	Thomas Lemberger (LMU Munich); Henrik Wachowitz (LMU Munich)
14:40 10m Talk	CoqPilot, a plugin for LLM-based generation of proofs	Tool Demonstrations	Andrei Kozyrev (JetBrains Research, Constructor University Bremen); Gleb Solovev (JetBrains Research, Constructor University Bremen); Nikita Khramov (JetBrains Research, Constructor University Bremen); Anton Podkopaev (JetBrains Research, Constructor University)