ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea
Thu 20 Nov 2025 16:15 - 16:30 at Grand Hall 4 - Session 4 & Closing

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.

Thu 20 Nov

Displayed time zone: Seoul change

16:00 - 18:00
Session 4 & ClosingAgenticSE at Grand Hall 4
16:00
15m
Short-paper
AgentGuard: Runtime Verification of AI Agents
AgenticSE
Roham Koohestani Delft University of Technology
16:15
15m
Short-paper
Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification
AgenticSE
Keshav Ramani J.P. Morgan AI Research, Vali Tawosi J.P. Morgan AI Research, Salwa Alamir J.P. Morgan AI Research, Daniel Borrajo
16:30
25m
Talk
The Last Dependency Crusade: Solving Python Dependency Conflicts with LLMs
AgenticSE
Antony Bartlett TU Delft, The Netherlands, Cynthia C. S. Liem Delft University of Technology, Annibale Panichella Delft University of Technology
17:00
15m
Day closing
Wrap-up, acknowledgments, and discussion
AgenticSE
Maliheh Izadi Delft University of Technology, Michael Pradel CISPA Helmholtz Center for Information Security, Satish Chandra Google, Inc