Investigating a NASA Cyclomatic Complexity Policy on Maintenance of a Critical System (ESEIW 2022 - ESEM Industry Forum)

Who

Daniel Port, Bill Taber

Track

ESEIW 2022 ESEM Industry Forum

When

Thu 22 Sep 2022 14:25 - 14:40 at Sonck - Session 2B - Technical Debt & Effort Estimation Chair(s): Carolyn Seaman

Abstract

Background

Monte is a critical system used by NASA for navigation and design of deep space missions (informally referred to as the “Google Maps to the stars”). The system in its current state is a complex integration of hundreds of C++ and Python applications, with several versions used in over 40 missions over the last 18 years. Its continuous, reliable operation is considered critical to over 18 ongoing missions. The system has been under active, disciplined maintenance for over 20 years. Defect reports and new feature requests arrive continuously, at a rate of 160 unique defect reports and 150 feature requests per year. The codebase is currently 800,000 SLOC and growing at a rate of 2750 SLOC per release, with an average of 11.75 releases per year.

Recently a “complexity census” was performed on the maintenance of 2477 Monte system files consisting of a total of 413,275 SLOC. For each file, the following data were collected: SLOC, source lines of comments, cyclomatic complexity (CC), defects associated with the file, defects removed associated with the file, and the person-hours expended to remove them.

This study uses these data to empirically investigate the impact of CC on defect proneness and maintenance effort. Is higher CC associated with higher defect proneness? With more effort to repair defects? If so, is there a CC threshold above which action should be taken to reduce defect proneness and repair effort?
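As a rough illustration of the questions above, the sketch below groups per-file census records by a CC threshold and compares defects per KSLOC and repair hours per removed defect. The `FileRecord` fields and the `summarize` helper are hypothetical names chosen for this example, and the rows are synthetic, not Monte data. Dividing by SLOC is only a crude way to account for file size, not the analysis used in the study.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """One row of a hypothetical per-file census table (field names are assumed)."""
    sloc: int              # source lines of code
    comment_lines: int     # source lines of comments
    cc: int                # cyclomatic complexity
    defects: int           # defects associated with the file
    defects_removed: int   # defects removed from the file
    repair_hours: float    # person-hours expended to remove them


def summarize(records, cc_threshold=15):
    """Compare files at or below the CC threshold with files above it."""
    groups = {"low": [], "high": []}
    for r in records:
        groups["high" if r.cc > cc_threshold else "low"].append(r)

    summary = {}
    for name, rows in groups.items():
        sloc = sum(r.sloc for r in rows)
        defects = sum(r.defects for r in rows)
        removed = sum(r.defects_removed for r in rows)
        hours = sum(r.repair_hours for r in rows)
        summary[name] = {
            "files": len(rows),
            "defects_per_ksloc": 1000 * defects / sloc if sloc else 0.0,
            "hours_per_removed_defect": hours / removed if removed else 0.0,
        }
    return summary


# Synthetic rows for illustration only; these are not Monte data.
toy = [
    FileRecord(200, 40, 5, 1, 1, 2.0),
    FileRecord(300, 60, 10, 1, 1, 3.0),
    FileRecord(500, 50, 20, 4, 2, 10.0),
    FileRecord(500, 30, 30, 6, 3, 20.0),
]
result = summarize(toy)
```

A comparison like this can show whether the two groups differ, but it cannot separate the contribution of CC from that of SLOC.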

Aims

CC is widely cited as a good predictor of maintainability. Perhaps as a result, NASA instituted a policy: “If CC is greater than 15, take action to reduce the complexity.”

Regardless of what actions are taken, these practices add cost to the maintenance of the system. The concern is that these practices have not been empirically substantiated. Previous studies have questioned whether higher CC increases defect proneness and defect repair effort. How much confidence can we have that reducing CC does, in practice, reduce defect proneness and repair effort? If reducing CC does indeed reduce these, is 15 the magic number above which CC should be reduced?

The aim of this study is to provide empirical evidence to support (or refute) these two best practices. While we are also interested in the economics (i.e., whether the benefit of the practices outweighs the costs), this is not a focus of this particular study.

Method

We are keenly aware of the strong relationships between CC, SLOD, and SLOC, which greatly complicate parsing out what each factor contributes to associated defects and defect repair effort. Indeed, several studies have argued that CC provides no more useful information than SLOC alone. Given this and the strong correlations with SLOC, we use Negative Binomial regression to determine the degree to which CC contributes to defects reported and defect removal effort. The analysis is further complicated by non-trivial interactions, non-continuous responses, and high dispersion of the data. Cluster analysis is performed to determine whether there is a CC threshold above which its impacts are “notable”. The analysis takes a mainly empirical approach, making as few assumptions as possible and not relying on theoretical arguments. We make no attempt to validate that CC is the cause of significant impacts on associated defects and defect repair effort.
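The abstract does not say which clustering procedure was used. As a minimal sketch of the threshold-finding idea only, the example below swaps in a simple one-dimensional two-group split: it tries each candidate CC cut point and keeps the one that minimizes the within-group squared error of the response. The function name `best_split` and the data are assumptions made for illustration. The values are synthetic and were built to contain a visible jump, so they say nothing about Monte.

```python
def best_split(cc_values, responses):
    """Return the CC cut point minimizing within-group squared error of the response.

    Files with CC <= cut form one group; files with CC > cut form the other.
    """
    pairs = sorted(zip(cc_values, responses))
    distinct = sorted({cc for cc, _ in pairs})

    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_cut, best_cost = None, float("inf")
    # Every distinct CC value except the largest is a candidate cut point.
    for cut in distinct[:-1]:
        low = [y for cc, y in pairs if cc <= cut]
        high = [y for cc, y in pairs if cc > cut]
        cost = sse(low) + sse(high)
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return best_cut


# Synthetic values for illustration only; these are not Monte data.
cc = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
defects = [0, 1, 0, 1, 1, 0, 1, 5, 6, 5]
cut = best_split(cc, defects)
```

A split of this kind ignores SLOC and the count-valued, overdispersed nature of defect data, which is why the study pairs its cluster analysis with Negative Binomial regression.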

Results

We find that CC, SLOD, and SLOC are significant predictors of associated defects. SLOD and SLOC are significant predictors of defect proneness and defect removal effort. As reported in other works, the predictiveness of CC is no better than that of SLOC. We find a notable increase in associated defects and defect removal effort when CC is greater than 15.

Conclusions

The impacts of CC on defect proneness are, with high confidence, consistent with the expectations of the NASA policy. While the predictiveness of CC for defect proneness and defect removal effort is no better than that of SLOC alone, the information gleaned from CC is useful for empirically validating the NASA policy.

Daniel Port

University of Hawai‘i at Mānoa

Bill Taber

Jet Propulsion Laboratory