APSEC 2022
Tue 6 - Fri 9 December 2022
Thu 8 Dec 2022 15:20 - 15:40 at Room2 - Empirical Studies 2 Chair(s): Yusuf Sulistyo Nugroho

Most of the text in a computer program is composed of the names of variables and functions. These names are selected by one developer and need to be understood by others, much like words written in natural language. But there are several marked differences between the names in a program and the words in a book. First, names are frequently composed of multiple existing words, in an attempt to capture nuanced meanings and intents. Second, because they combine multiple words, names can be rather long. Third, conventions may also allow names to be very short, and many single-letter names are used. Despite these differences, the general statistics of names are rather similar to the statistics of words. As with words, the frequency distribution of names is close to a Zipf distribution. Also, popular names tend to be shorter than rarely used names. However, the underlying vocabulary is different. Composing words leads to a more diverse vocabulary of names that can grow without bound. But if we look at the individual words used in compound names, we find a rather limited vocabulary. These properties help explain how the predictability of software can coexist with the large variability of names. They also suggest that it may be beneficial to model programs at the level of individual words rather than at the level of source code tokens.
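To make the difference between the vocabulary of names and the vocabulary of their constituent words concrete, here is a minimal Python sketch. It assumes a simple camelCase/snake_case splitting rule, which is our illustrative assumption and not the authors' tokenizer; the helpers `split_name`, `vocabulary_stats`, and `rank_frequency` are hypothetical. It counts distinct names and distinct words, and lists frequencies by rank, which under a Zipf distribution fall off roughly in proportion to 1/rank.

```python
import re
from collections import Counter

# Assumed splitting rule (illustrative only): an uppercase run followed by a
# capitalized word (HTTP in HTTPServer), a capitalized or lowercase word,
# or a trailing uppercase run. Underscores and digits act as separators.
_WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_name(name: str) -> list[str]:
    """Split an identifier such as 'maxLineLength' into lowercase words."""
    return [w.lower() for w in _WORD_RE.findall(name)]


def vocabulary_stats(names: list[str]) -> tuple[Counter, Counter]:
    """Count occurrences of whole names and of the words they are built from."""
    name_counts = Counter(names)
    word_counts = Counter(w for n in names for w in split_name(n))
    return name_counts, word_counts


def rank_frequency(counts: Counter) -> list[tuple[int, int]]:
    """Return (rank, frequency) pairs, most frequent item first."""
    return [(rank, freq) for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    # Toy identifier occurrences; real studies use names mined from code bases.
    names = ["i", "i", "j", "lineCount", "maxLineCount",
             "max_count", "countMax", "lineCount"]
    name_counts, word_counts = vocabulary_stats(names)
    print(len(name_counts), "distinct names")  # 6 distinct names
    print(len(word_counts), "distinct words")  # 5 distinct words
    print(rank_frequency(word_counts))         # [(1, 5), (2, 3), (3, 3), (4, 2), (5, 1)]
```

In this toy input, recombining the same few words (line, count, max) yields more distinct names than distinct words, which is the effect the abstract describes at scale. A production splitter would also need to handle digits, abbreviations, and language-specific conventions.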

Thu 8 Dec

Displayed time zone: Osaka, Sapporo, Tokyo

15:00 - 16:30
Empirical Studies 2 - Technical Track at Room2
Chair(s): Yusuf Sulistyo Nugroho Universitas Muhammadiyah Surakarta
15:00
20m
Paper
Exploring Activity and Contributors on GitHub: Who, What, When, and Where
Technical Track
Xiaoya Xia East China Normal University, Zhenjie Weng East China Normal University, will wang, Shengyu Zhao Tongji University
15:20
20m
Paper
The Language of Programming: On the Vocabulary of Names
Technical Track
Nitsan Amit Hebrew University, Dror Feitelson Hebrew University
15:40
20m
Paper
An Empirical Study of Predicting Fault-prone Components and their Evolution
Technical Track
Aparna Pisolkar Gannon University, Md Tajmilur Rahman Gannon University
16:00
20m
Paper
Empirical Study of Co-Renamed Identifiers
Technical Track
Yuki Osumi Tokyo Institute of Technology, Naotaka Umekawa Tokyo Institute of Technology, Hitomi Komata Tokyo Institute of Technology, Shinpei Hayashi Tokyo Institute of Technology