Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification (AgenticSE 2025)

Sun 16 - Thu 20 November 2025 Seoul, South Korea

Who

Keshav Ramani, Vali Tawosi, Salwa Alamir, Daniel Borrajo

Track

AgenticSE 2025

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

Use conference time zone: (GMT+09:00) SeoulSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 20 Nov 2025 16:15 - 16:30 at Grand Hall 4 - Session 4 & Closing

Abstract

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.

Keshav Ramani

J.P. Morgan AI Research

United States

Vali Tawosi

J.P. Morgan AI Research

United Kingdom

Salwa Alamir

J.P. Morgan AI Research

United Kingdom

Daniel Borrajo

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

Use conference time zone: (GMT+09:00) SeoulSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 20 Nov
Displayed time zone: Seoul change

16:00 - 18:00	Session 4 & ClosingAgenticSE at Grand Hall 4

16:00 15m Short-paper		AgentGuard: Runtime Verification of AI Agents AgenticSE Roham Koohestani Delft University of Technology
16:15 15m Short-paper		Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification AgenticSE Keshav Ramani J.P. Morgan AI Research, Vali Tawosi J.P. Morgan AI Research, Salwa Alamir J.P. Morgan AI Research, Daniel Borrajo
16:30 25m Talk		The Last Dependency Crusade: Solving Python Dependency Conflicts with LLMs AgenticSE Antony Bartlett TU Delft, The Netherlands, Cynthia C. S. Liem Delft University of Technology, Annibale Panichella Delft University of Technology
17:00 15m Day closing		Wrap-up, acknowledgments, and discussion AgenticSE Maliheh Izadi Delft University of Technology, Michael Pradel CISPA Helmholtz Center for Information Security, Satish Chandra Google, Inc