How Slim Will My System Be? Estimating Refactored Code Size by Merging Clones
We have been doing code clone analysis with industry collaborators for a long time, and have been always asked a question, "OK, I understand my system contains a lot of code clones, but how slim will it be after merging redundant code clones?'' As a software system evolves for long period, it would increasingly contain many code clones due to quick bug fix and new feature addition. Industry collaborators would recognize decay of initial design simplicity, and try to evaluate current system from the view point of maintenance effort and cost. As one of resources for the evaluation, the estimated code size by merging code clone is very important for them. In this paper, we formulate this issue as ``slimming'' problem, and present three different slimming methods, Basic, Complete, and Heuristic Methods, each of which gives a lower bound, upper bound, and modest reduction rates, respectively. Application of these methods to OSS systems written in C/C++ showed that the reduction rate is at most 5.7% of the total size, and to a commercial COBOL system, it is at most 15.4%. For this approach, we have gotten initial but very positive feedback from industry collaborators.