ASE 2025 (series) / AgenticSE 2025 (series) / AgenticSE 2025 /
Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification
Thu 20 Nov 2025 16:15 - 16:30 at Grand Hall 4 - Session 4 & Closing
We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
Thu 20 NovDisplayed time zone: Seoul change
Thu 20 Nov
Displayed time zone: Seoul change
16:00 - 18:00 | |||
16:00 15mShort-paper | AgentGuard: Runtime Verification of AI Agents AgenticSE Roham Koohestani Delft University of Technology | ||
16:15 15mShort-paper | Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification AgenticSE Keshav Ramani J.P. Morgan AI Research, Vali Tawosi J.P. Morgan AI Research, Salwa Alamir J.P. Morgan AI Research, Daniel Borrajo | ||
16:30 25mTalk | The Last Dependency Crusade: Solving Python Dependency Conflicts with LLMs AgenticSE Antony Bartlett TU Delft, The Netherlands, Cynthia C. S. Liem Delft University of Technology, Annibale Panichella Delft University of Technology | ||
17:00 15mDay closing | Wrap-up, acknowledgments, and discussion AgenticSE Maliheh Izadi Delft University of Technology, Michael Pradel CISPA Helmholtz Center for Information Security, Satish Chandra Google, Inc | ||