A Quantitative Investigation of Trends in Confusing Variable Pairs Through Commits: Do Confusing Variable Pairs Survive?
Variable names significantly affect the readability of the source code in which they are used. Programmers can make variables easy to understand by choosing meaningful names. However, even when each variable has a meaningful name, a set of variables can still harm readability if one name is highly similar to another, as in "bottomRight" vs. "bottomHeight." Such a pair of variables with highly similar names is referred to as a "confusing variable pair." Programmers may want to avoid confusing variable pairs because they tend to cause variables to be mixed up or misread during programming and code review. To examine how confusing variable pairs appear and disappear over commits in practice, this paper conducts a large-scale investigation of 100 Java projects and 100 Python projects randomly selected from GitHub. The study reports the following findings. (1) The average number of confusing variable pairs in a Java source file is 1.4, and that in a Python source file is 1.3. (2) Once a confusing variable pair is introduced, it survives in about 67–75% of cases and disappears through code modifications in later commits in about 25–33% of cases. (3) Confusing variable pairs tend to appear in at most 20–30% of the source files within a project, and maintenance that removes such pairs is performed in 6–14% of source files. (4) Although the appearance and disappearance trends do not seem to vary among projects, there are some outlier projects in which significantly more confusing variable pairs appear.
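As an illustration of the notion of a confusing variable pair, the following minimal Python sketch flags pairs of names whose similarity exceeds a threshold. The abstract does not state how name similarity is measured, so the normalized Levenshtein similarity, the threshold of 0.8, and the function names `levenshtein`, `similarity`, and `find_confusing_pairs` are all assumptions made for this example, not the study's actual method.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; an assumed metric, not the paper's."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_confusing_pairs(names, threshold=0.8):
    """Return name pairs whose similarity meets the (hypothetical) threshold."""
    unique = sorted(set(names))
    return [(a, b) for a, b in combinations(unique, 2)
            if similarity(a, b) >= threshold]


names = ["bottomRight", "bottomHeight", "index"]
print(find_confusing_pairs(names))  # [('bottomHeight', 'bottomRight')]
```

Under these assumptions, "bottomRight" and "bottomHeight" differ by two edits over twelve characters, giving a similarity of about 0.83, so the pair is flagged, while "index" is not paired with either name.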