Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
In the software industry, the drive to add new features often overshadows the need to improve existing code. Large Language Models (LLMs) offer a new approach to improving codebases at an unprecedented scale through AI-assisted refactoring. However, LLMs come with inherent risks such as breaking changes and the introduction of security vulnerabilities. We advocate for encapsulating the interaction with the models in IDEs and validating refactoring attempts using trustworthy safeguards. Equally important for the uptake of AI refactoring, however, is research on trust development. In this position paper, we ground our future work in established models from research on human factors in automation. We outline action research within CodeScene on the development of 1) novel LLM safeguards and 2) user interaction that conveys an appropriate level of trust. The industry collaboration enables large-scale repository analysis and A/B testing to continuously guide the design of our research interventions.
Dr. Markus Borg is a senior researcher at the intersection of software engineering and applied artificial intelligence. He is a principal researcher at CodeScene and an adjunct lecturer at Lund University. Markus serves on the editorial board of Empirical Software Engineering and is a department editor for IEEE Software.
My goal is to support the successful engineering of software and data-intensive systems. While my software engineering interests are broad, most of my work relates to machine learning: my research interests span both software engineering intelligence (AI4SE) and AI engineering (SE4AI).
In AI4SE, I seek to tap into the collected wisdom of historical project data to enable actionable, machine-learning-based decision support. My most impactful contributions have been related to defect management, for example, bug assignment and change impact analysis. Core ideas are currently operationalized in internal tools at Ericsson.
In SE4AI, I investigate quality assurance of systems that embed machine learning components. I am particularly interested in development mandated by automotive safety standards and the EU AI Act. Our research studies involve requirements engineering, MLOps pipelines, software testing in automotive simulators, and our open-source demonstrator SMIRK.