Characterizing the System Evolution That is Proposed After a Software Incident
When a system failure is sufficiently severe, system owners may conduct a post-mortem analysis to learn from the failure and then propose ways to evolve the system to prevent similar failures in the future. Our interest in this paper is broadly in characterizing the system evolution that follows from post-mortem analysis. Specifically, we ask three research questions: (1) what aspects of an incident motivate proposed changes, (2) what parts of the system are targeted by the proposed changes, and (3) what are the intended effects of the proposed changes on system characteristics. To answer these questions, we conducted an empirical study of 360 proposed changes from 75 public incident reports. We found that proposed changes are motivated by a wide variety of events experienced during the incident, including how the incident was triggered, how the failure propagated, response-related events, and the recovery of the system. We also found that the proposed changes target a wide variety of system parts, including system aspects that may not be considered for evolution in other contexts. Finally, we found that the proposed changes primarily aim to evolve performance efficiency and reliability and, to a lesser degree, flexibility and safety, and that they do so by considering narrow scenarios related to the incident.