PatchTrack: Analyzing ChatGPT’s Impact on Software Patch Decision-Making in Pull Requests
The use of Large Language Models (LLMs) in software development has surged in recent years. However, our understanding of how conversational LLMs such as ChatGPT can enhance collaborative software development, especially the management and integration of patches, remains limited. This study addresses this gap by analyzing developers' shared ChatGPT conversations within merged pull requests. We curated a dataset of 464 ChatGPT-generated code snippets and 1,360 patches from 183 pull requests. We developed PatchTrack, which classifies each pull request according to whether a ChatGPT-suggested patch was applied, not applied, or whether no patch was suggested, identifying 58 cases where patches were applied, 55 where they were not, and 70 where ChatGPT made no patch suggestion. PatchTrack achieved an overall accuracy of 93.4%, precision of 86.9%, recall of 87.5%, and an F1-score of 87.0%. To gain further insight, we conducted a qualitative analysis of the reasons behind the non-integration of ChatGPT-suggested patches and of scenarios in which no patches were proposed. Our findings show that 'adaptation and tailored solutions' was the most common reason for not applying patches, while the need for 'conceptual guidance' was the most common reason for the absence of patch suggestions. This study offers practical insights for developers on leveraging ChatGPT in pull request-based software workflows and provides researchers with a foundation for exploring AI-assisted collaborative software tasks.
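As a point of reference for the reported metrics, the sketch below shows one way accuracy and averaged precision/recall/F1 can be computed for a three-class patch-status task of this kind. It is illustrative only: the label names, the macro-averaging scheme, and the toy data are assumptions, not PatchTrack's implementation or the paper's evaluation protocol.

```python
# Illustrative sketch (not the paper's implementation): accuracy and
# macro-averaged precision/recall/F1 for a three-class patch-status task.
# Label names, macro averaging, and the toy data below are assumptions.

LABELS = ("applied", "not_applied", "not_suggested")

def evaluate(y_true, y_pred):
    assert len(y_true) == len(y_pred) and y_true
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    per_class = {}
    for label in LABELS:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        per_class[label] = (precision, recall, f1)

    # Macro average: unweighted mean of per-class precision, recall, and F1.
    macro = tuple(sum(m[i] for m in per_class.values()) / len(LABELS)
                  for i in range(3))
    return accuracy, per_class, macro

# Toy example (not the study's data):
y_true = ["applied", "not_applied", "not_suggested", "applied"]
y_pred = ["applied", "not_suggested", "not_suggested", "applied"]
acc, per_class, (macro_p, macro_r, macro_f1) = evaluate(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro P/R/F1={macro_p:.3f}/{macro_r:.3f}/{macro_f1:.3f}")
```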