Automatic Refactoring Candidate Identification Leveraging Effective Code Representation
The use of machine learning to automate the detection of refactoring candidates is a rapidly evolving research area. Most existing work in this direction relies on source code metrics and commit messages to predict refactoring candidates and does not exploit the rich semantics of source code. This paper proposes a new approach for identifying Extract Method refactoring candidates. First, we propose a novel mechanism to identify negative samples for the refactoring candidate identification task. We then employ a self-supervised autoencoder to learn a compact representation of source code from the embeddings produced by a pre-trained large language model. Finally, we train a binary classifier on this representation to predict Extract Method refactoring candidates. Experiments show that our approach outperforms the state of the art by 30% in terms of F1 score. The proposed work has implications for both researchers and practitioners. Software developers may use the proposed automated approach to better predict refactoring candidates, and researchers in the field can use and extend this study to develop improved refactoring candidate identification methods.
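The pipeline summarized above can be illustrated with a minimal sketch. It compresses code embeddings with a self-supervised autoencoder, then trains a binary classifier on the compact codes and reports F1. This is not the authors' implementation. The abstract does not name the language model, the autoencoder architecture, or the classifier. Here, synthetic random vectors stand in for LLM embeddings, a single-hidden-layer tanh autoencoder and logistic regression are assumed, and the negative-sampling mechanism is not modeled because the labels are simply given. The function names are hypothetical.

```python
import numpy as np


def train_autoencoder(X, code_dim, epochs=300, lr=0.01, seed=0):
    """Assumed architecture: X -> tanh(X @ W1 + b1) -> (@ W2 + b2) -> X_hat.

    Trained by full-batch gradient descent on reconstruction error only,
    so no refactoring labels are used. Returns an encoder and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # compact codes, shape (n, code_dim)
        err = H @ W2 + b2 - X              # reconstruction error, shape (n, d)
        losses.append(0.5 * np.sum(err ** 2) / n)
        # Backpropagate the mean squared reconstruction loss.
        g_out = err / n
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)   # gradient through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return (lambda Z: np.tanh(Z @ W1 + b1)), losses


def train_logistic(H, y, epochs=1000, lr=0.1):
    """Assumed binary classifier: logistic regression on the autoencoder codes."""
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - y                          # gradient of log loss w.r.t. logits
        w -= lr * (H.T @ g) / len(y)
        b -= lr * g.mean()
    return lambda Z: (Z @ w + b >= 0.0).astype(int)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical synthetic data standing in for LLM embeddings of code snippets:
# label 1 = Extract Method candidate, label 0 = negative sample.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
X = rng.normal(0.0, 1.0, size=(200, 32)) + 1.5 * y[:, None]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Standardize with training statistics only.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma

encode, losses = train_autoencoder(X_train, code_dim=8)
classify = train_logistic(encode(X_train), y_train)
f1 = f1_score(y_test, classify(encode(X_test)))
print(f"reconstruction loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
print(f"held-out F1: {f1:.2f}")
```

In this sketch the autoencoder is fit on reconstruction error alone, so it could in principle be trained on unlabeled code. Only the small classifier on top needs labeled positive and negative samples, which is where a negative-sampling mechanism would come into play.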