Class Archetypes: Principles, Detection, Evolution
Classes are the basic building blocks of object-oriented software systems. To understand the key parts of a system, we need to understand the roles its classes play by examining their structure and behavior. However, source code alone is too opaque to convey such roles, especially in large and complex systems. Moreover, as software evolves, classes originally designed for a single purpose often acquire new responsibilities, which change, expand, and complicate their roles over time.
We present a heuristic-based approach for the fully automatic identification of class stereotypes. We use CodeQL queries to capture structural properties (e.g., attribute composition, inheritance patterns) and to infer behavioral properties (e.g., method interactions, attribute accesses) of classes. We leverage this information to determine intrinsic and system-level class roles. Based on these properties, we define class archetypes that simplify system understanding by using a common pattern language to highlight single and co-occurring roles. We applied our heuristics to a dataset of over 650 Java GitHub repositories. Besides demonstrating the scalability of our approach, preliminary observations reveal underlying temporal trends in how certain class roles emerge, endure, or evolve, suggesting that automated, property-based class role inference can support future research on software design and system evolution.
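To make the idea of property-based role inference concrete, the following is a minimal sketch in Python. It is not the CodeQL heuristics used in this work: the property names, the role labels ("data-holder", "stateless-utility", "coordinator"), and the thresholds are all hypothetical, chosen only to show how structural and behavioral counts could map to single or co-occurring roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassProperties:
    """Hypothetical per-class facts, as a static-analysis query might report them."""
    name: str
    num_attributes: int      # structural: declared fields
    num_methods: int         # structural: declared methods
    num_accessors: int       # behavioral: methods that only read or write a field
    num_external_calls: int  # behavioral: calls to methods of other classes


def infer_roles(p: ClassProperties) -> set[str]:
    """Return every role whose (illustrative) heuristic matches; roles may co-occur."""
    roles: set[str] = set()
    if p.num_methods == 0:
        return roles  # nothing to base a behavioral judgment on

    # Assumed threshold: at least 80% of methods are plain accessors.
    if p.num_attributes > 0 and p.num_accessors / p.num_methods >= 0.8:
        roles.add("data-holder")

    # Assumed rule: methods but no state of its own.
    if p.num_attributes == 0:
        roles.add("stateless-utility")

    # Assumed threshold: on average two or more outgoing calls per method.
    if p.num_external_calls / p.num_methods >= 2.0:
        roles.add("coordinator")

    return roles


point = ClassProperties("Point", 2, 4, 4, 0)
service = ClassProperties("OrderService", 3, 5, 1, 12)
helper = ClassProperties("Helper", 0, 2, 0, 6)

for cls in (point, service, helper):
    print(cls.name, sorted(infer_roles(cls)))
```

In this sketch a class can receive several roles at once, which mirrors the notion of co-occurring roles; a real analysis would derive the input facts from queries over the code rather than from hand-written records.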