A tangled commit or patch contains multiple independent changes, such as a bug fix and a refactoring. Tangled commits are harder to review, increasing the chance of missing problems during code review. For AI assistants such as code synthesis and bug-fixing models, tangled commits introduce noise into the training dataset, degrading model performance.
Recent work has proposed untangling approaches to address this issue. Unfortunately, existing untangling approaches are evaluated either on synthetic commits, which may not be representative of real commits, or with incremental evaluations, making it difficult for researchers to compare unrelated approaches.
Compared to previous work, our methodology evaluates unrelated approaches on a set of real commits that have been manually untangled.
We find that untangling approaches may perform worse on real commits than on synthetic commits, and that the resulting untangled commits may be hard for developers to leverage in practice because of the granularity at which the approaches represent changes. Additionally, we find that the size of the change has a statistically significant and large effect on untangling performance.