ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Agentic AI has recently received significant attention and, despite criticism, the technology is likely to remain in use as one tool among many. While there are many agents and frameworks to choose from, some conflicts and issues arise regardless of the technology stack used. This work highlights some of the more prominent issues, their remedies, and overall best practices. The recommendations can be summarized as follows: (1) Move as much functionality as possible into unit-testable, non-AI tools. In addition to better quality guarantees, this frees up valuable context and prompt space for the actual LLM components. (2) Implement logging and governance from the first steps. Easy access to artifacts helps to create regression tests, and monitoring LLM use helps to find anomalies and estimate experiment budgets. (3) Define the scope of experiments early, and rotate subsets of datapoints during development. The flexibility of LLMs will lead to poor behavior patterns, and rotation helps to avoid blind spots and overfitting during development. While these are generally best practices when developing software, the use of LLMs introduces a more complex black-box component than what we usually see in (software engineering) research, and with it comes a need for advanced quality assurance.
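
As an illustration of recommendation (3), rotating subsets of datapoints during development could look roughly like the sketch below. The abstract does not say how the rotation is carried out, so everything here is an assumption: the function names `rotating_subsets` and `subset_for_iteration`, the fixed-seed shuffle, and the round-robin schedule are all hypothetical.

```python
import random


def rotating_subsets(datapoints, subset_size, seed=0):
    """Split datapoints into disjoint subsets (hypothetical helper).

    The data is shuffled once with a fixed seed so that the split is
    reproducible, then cut into consecutive chunks of `subset_size`.
    The last chunk may be smaller than the others.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + subset_size]
        for i in range(0, len(shuffled), subset_size)
    ]


def subset_for_iteration(subsets, iteration):
    """Pick the subset for a development iteration in round-robin order."""
    if not subsets:
        raise ValueError("no subsets to rotate over")
    return subsets[iteration % len(subsets)]


# Example: ten datapoints, three per development iteration.
subsets = rotating_subsets(range(10), subset_size=3)
current = subset_for_iteration(subsets, iteration=5)
```

Because the subsets are disjoint and visited in turn, every datapoint is eventually used during development, which is one way to limit tuning prompts against the same few examples.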