Locating Latent Design Information in Developer Discussions: A Study on Pull Requests
A software system’s design determines many of its properties, such as maintainability and performance. To uphold desired properties in a system, developers must be aware of its design; when they are not, the choices they make can erode those properties. Unfortunately, developers often do not have access to information about a system’s current design. One approach that has been investigated to address this problem is to recover the design automatically from project artifacts. Most existing approaches focus on how a system works by extracting structural and behavioural information, rather than information about desired design properties such as robustness or performance. Recently, Brunet et al. and Tsay et al. have found that developer discussions captured in project artifacts, such as issue reports, include discussions of design. Tsay et al. have further shown that these discussions can be a major factor in deciding how a system evolves, suggesting that they contain information that goes beyond how a system works to explain why certain choices were made.
In this paper, we explore whether it is possible to automatically locate where design is discussed in on-line developer discussions. We introduce a classifier that locates paragraphs in pull request discussions that pertain to design, with an average AUC score of 0.87. We show that this classifier, when applied to projects on which it was not trained, agrees with human identification of design points with an average AUC score of 0.79. Finally, we describe how this classifier could serve as the basis for tools that support tasks such as reviewing code and implementing new features. This paper shows that useful design information is latent in on-line developer discussions and provides a means to locate this information at a coarse granularity.
Future research can determine how to locate more specific and nuanced design information and investigate how to semantically model the information to produce even more useful tools for developers.
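The results above are reported as AUC (area under the ROC curve) scores. As a reminder of what such a score measures, the following is a minimal, generic sketch of AUC computed as the probability that a randomly chosen design paragraph receives a higher classifier score than a randomly chosen non-design paragraph, with ties counted as half. It is not the evaluation code used in this study; the function name `auc` and the example scores and labels are hypothetical and chosen only for illustration.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as half a win."""
    # Split classifier scores by label (1 = design paragraph, 0 = not design).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five paragraphs, with hypothetical human labels.
example = auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0])
print(example)
```

In this toy example, 5 of the 6 positive/negative pairs are ranked correctly, giving an AUC of about 0.83. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported scores of 0.87 and 0.79 should be read.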