Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
In the software industry, the drive to add new features often overshadows the need to improve existing code. Large Language Models (LLMs) offer a new approach to improving codebases at an unprecedented scale through AI-assisted refactoring. However, LLMs come with inherent risks such as breaking changes and the introduction of security vulnerabilities. We advocate for encapsulating the interaction with the models in IDEs and validating refactoring attempts using trustworthy safeguards. Equally important for the uptake of AI refactoring, however, is research on trust development. In this position paper, we ground our future work in established models from research on human factors in automation. We outline action research within CodeScene on the development of 1) novel LLM safeguards and 2) user interaction that conveys an appropriate level of trust. The industry collaboration enables large-scale repository analysis and A/B testing to continuously guide the design of our research interventions.
Dr. Markus Borg is a senior researcher at the intersection of software engineering and applied artificial intelligence. He is a principal researcher at CodeScene and an adjunct lecturer at Lund University. Markus serves on the editorial board of Empirical Software Engineering and is a department editor for IEEE Software.
My goal is to support the successful engineering of software and data-intensive systems. While my software engineering interests are broad, most of my work relates to machine learning: my research interests span both software engineering intelligence (AI4SE) and AI engineering (SE4AI).
In AI4SE, I seek to tap into the collected wisdom of historical project data to enable actionable, machine-learning-based decision support. My most impactful contributions have been related to defect management, for example, bug assignment and change impact analysis. Core ideas are currently operationalized in internal tools at Ericsson.
In SE4AI, I investigate quality assurance of systems that embed machine learning components. I am particularly interested in development mandated by automotive safety standards and the EU AI Act. Our research studies involve requirements engineering, MLOps pipelines, software testing in automotive simulators, and our open-source demonstrator SMIRK.