Formal verification with interactive theorem provers such as Coq is an effective approach to guaranteeing software correctness, but it requires significant human effort to craft theorems and proofs. Large Language Models (LLMs) have shown promise in generating informal proofs in natural language, yet applying them to generate formal proofs remains challenging. In this paper, we conduct a formative study to identify common errors made by LLMs in proof generation. By analyzing 520 errors made by GPT-3.5 when generating Coq proofs, we find that GPT-3.5 can effectively outline high-level proof steps but often struggles with low-level details. Based on this insight, we propose PALM, a novel generate-then-repair approach that prompts an LLM to generate an initial proof and then leverages targeted symbolic methods to iteratively repair the errors in that proof. We evaluate PALM on a large dataset with more than 10K theorems. The results show that PALM significantly outperforms existing approaches, successfully proving 76.6% to 180.4% more theorems. Moreover, PALM proves 1270 theorems that none of the other approaches can prove. We also demonstrate the generalizability of PALM across different LLMs.
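The generate-then-repair idea described in the abstract can be pictured as a simple control loop. The sketch below is illustrative only and is not PALM's implementation: the function `generate_then_repair`, the toy checker, the toy repair table, and the `max_rounds` parameter are all hypothetical stand-ins. In a real setting, `generate` would call an LLM, `check` would run the proof through Coq, and `repair` would apply symbolic fixes chosen from the reported error.

```python
from typing import Callable, List, Optional

Proof = List[str]  # a proof is modeled as a list of tactic strings


def generate_then_repair(
    generate: Callable[[], Proof],
    check: Callable[[Proof], Optional[int]],
    repair: Callable[[Proof, int], Optional[Proof]],
    max_rounds: int = 5,  # hypothetical repair budget
) -> Optional[Proof]:
    """Return a proof accepted by `check`, or None if repair gives up."""
    proof = generate()
    for _ in range(max_rounds):
        failing = check(proof)  # index of the first failing step, or None
        if failing is None:
            return proof
        repaired = repair(proof, failing)
        if repaired is None:  # no applicable fix for this error
            return None
        proof = repaired
    # Budget exhausted: accept only if the last repair made the proof pass.
    return proof if check(proof) is None else None


# Toy stand-ins (all hypothetical): the "prover" accepts a fixed set of
# step strings, and the "LLM" emits a plausible outline with two bad steps.
VALID_STEPS = {"intros", "induction n", "simpl", "reflexivity"}
FIXES = {"simple": "simpl", "reflexivty": "reflexivity"}


def toy_generate() -> Proof:
    return ["intros", "induction n", "simple", "reflexivty"]


def toy_check(proof: Proof) -> Optional[int]:
    for i, step in enumerate(proof):
        if step not in VALID_STEPS:
            return i
    return None


def toy_repair(proof: Proof, index: int) -> Optional[Proof]:
    fix = FIXES.get(proof[index])
    if fix is None:
        return None
    return proof[:index] + [fix] + proof[index + 1:]


result = generate_then_repair(toy_generate, toy_check, toy_repair)
print(result)
```

The loop repairs one failing step per round, so the overall outline produced by the generator is kept and only the low-level steps are patched.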
Wed 30 Oct. Displayed time zone: Pacific Time (US & Canada)
13:30 - 15:00 | Verification | Research Papers / Tool Demonstrations at Carr | Chair(s): Tevfik Bultan (University of California at Santa Barbara)
13:30 | 15m | Talk | LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference | Research Papers | Guangyuan Wu (Nanjing University); Weining Cao (Nanjing University); Yuan Yao (Nanjing University); Hengfeng Wei (State Key Laboratory for Novel Software Technology, Nanjing University); Taolue Chen (Birkbeck, University of London); Xiaoxing Ma (State Key Laboratory for Novel Software Technology, Nanjing University)
13:45 | 15m | Talk | LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling | Research Papers | Muhammad A. A. Pirzada (The University of Manchester); Giles Reger (University of Manchester); Ahmed Bhayat (Independent Scholar); Lucas C. Cordeiro (University of Manchester, UK and Federal University of Amazonas, Brazil)
14:00 | 15m | Talk | Proof Automation with Large Language Models | Research Papers
14:15 | 15m | Talk | Verifying the Option Type With Rely-Guarantee Reasoning | Research Papers | James Yoo (University of Washington); Michael D. Ernst (University of Washington); René Just (University of Washington)
14:30 | 10m | Talk | CoVeriTeam GUI: A No-Code Approach to Cooperative Software Verification | Tool Demonstrations
14:40 | 10m | Talk | CoqPilot, a plugin for LLM-based generation of proofs | Tool Demonstrations | Andrei Kozyrev (JetBrains Research, Constructor University Bremen); Gleb Solovev (JetBrains Research, Constructor University Bremen); Nikita Khramov (JetBrains Research, Constructor University Bremen); Anton Podkopaev (JetBrains Research, Constructor University)