Most of the text in a computer program is composed of the names of variables and functions. These names are selected by one developer, and need to be understood by others. This is similar to the role of words written in natural language. But there are several marked differences between the names in a program and the words in a book. First, names are frequently composed of multiple existing words, in an attempt to capture nuanced meanings and intents. Second, because of the use of multiple words, names can be rather long. Third, conventions may also allow names to be very short, and many single-letter names are used. But despite these differences, the general statistics of names are rather similar to the statistics of words. Like words, the distribution of names is close to a Zipf distribution. Also, popular names tend to be shorter than rarely used names. However, the underlying vocabulary is different. The composition of words leads to a more diverse vocabulary of names that can grow without bounds. But if we look at the individual words used in compound names, we find a rather limited vocabulary. These properties help explain the predictability of software, and how it can coincide with the large variability of names. They also suggest that it may be beneficial to model programs at the level of individual words rather than at the level of source code tokens.
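To make the distinction between names and their constituent words concrete, here is a minimal Python sketch. It splits compound identifiers into words and compares the two vocabularies by rank and frequency. The sample names, the `split_name` helper, and its splitting rule are assumptions made for illustration; they are not the paper's data or method.

```python
import re
from collections import Counter

# Hypothetical sample of identifier occurrences (illustration only).
NAMES = ["i", "i", "i", "count", "count", "fileName", "file_name",
         "fileCount", "getFileName", "getFileCount", "get_name", "nameCount"]


def split_name(name):
    """Split a compound identifier into lowercase words.

    Handles snake_case, camelCase, and runs of capitals such as 'HTTP'.
    """
    words = []
    for part in re.split(r"[_\W]+", name):
        # An acronym (capitals not followed by a lowercase letter),
        # an optionally capitalized word, or a run of digits.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def rank_frequency(counts):
    """Return (rank, item, frequency) tuples, most frequent first."""
    return [(rank, item, freq)
            for rank, (item, freq) in enumerate(counts.most_common(), start=1)]


# Vocabulary of whole names versus vocabulary of the words inside them.
name_counts = Counter(NAMES)
word_counts = Counter(w for n in NAMES for w in split_name(n))

print("distinct names:", len(name_counts))
print("distinct words:", len(word_counts))
for rank, item, freq in rank_frequency(name_counts):
    print(rank, item, freq, len(item))
```

In this toy sample a handful of words combine into a larger set of distinct names. On a real corpus, plotting frequency against rank on log-log axes would be one way to check how close the distribution is to a Zipf distribution.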
Thu 8 Dec. Displayed time zone: Osaka, Sapporo, Tokyo
15:00 - 16:30 | Empirical Studies 2 | Technical Track at Room2 | Chair(s): Yusuf Sulistyo Nugroho (Universitas Muhammadiyah Surakarta)
15:00 | 20m | Paper | Exploring Activity and Contributors on GitHub: Who, What, When, and Where | Technical Track | Xiaoya Xia (East China Normal University), Zhenjie Weng (East China Normal University), will wang, Shengyu Zhao (Tongji University)
15:20 | 20m | Paper | The Language of Programming: On the Vocabulary of Names | Technical Track
15:40 | 20m | Paper | An Empirical Study of Predicting Fault-prone Components and their Evolution | Technical Track
16:00 | 20m | Paper | Empirical Study of Co-Renamed Identifiers | Technical Track | Yuki Osumi (Tokyo Institute of Technology), Naotaka Umekawa (Tokyo Institute of Technology), Hitomi Komata (Tokyo Institute of Technology), Shinpei Hayashi (Tokyo Institute of Technology)