To demonstrate the functional safety of cyber-physical systems, engineers must identify hazardous situations and determine countermeasures. State-of-the-art techniques have demonstrated that clustering algorithms can be successfully applied to feature vectors extracted through transfer learning to identify hazardous situations caused by erroneous predictions of Deep Neural Networks (DNNs), which also enables the improvement of DNNs through retraining. However, the choice of components in a clustering analysis pipeline may largely affect the results obtained under different execution conditions. For this reason, we empirically evaluated 99 pipelines comprising feature extraction, dimensionality reduction, and clustering techniques on DNNs failing due to multiple root causes. Our findings reveal that a pipeline combining transfer learning, DBSCAN, and UMAP outperforms the others by producing clusters with higher purity and more comprehensive coverage of failure scenarios.
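
As an illustration of the two evaluation criteria named above, the following minimal Python sketch computes a per-cluster purity and a simple coverage set from cluster assignments. It rests on assumptions not stated in the text: purity is taken as the fraction of a cluster's members that share its most frequent root cause, coverage is taken as the set of root causes that dominate at least one cluster, and the label `-1` marks unclustered points (the convention used by common DBSCAN implementations). The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

NOISE = -1  # assumed label for points that belong to no cluster


def group_by_cluster(cluster_ids, root_causes):
    """Collect the root-cause label of every non-noise point per cluster."""
    members = defaultdict(list)
    for cid, cause in zip(cluster_ids, root_causes):
        if cid != NOISE:
            members[cid].append(cause)
    return members


def cluster_purity(cluster_ids, root_causes):
    """Fraction of each cluster taken up by its most frequent root cause."""
    members = group_by_cluster(cluster_ids, root_causes)
    return {
        cid: Counter(causes).most_common(1)[0][1] / len(causes)
        for cid, causes in members.items()
    }


def covered_causes(cluster_ids, root_causes):
    """Root causes that are the most frequent one in at least one cluster."""
    members = group_by_cluster(cluster_ids, root_causes)
    return {Counter(causes).most_common(1)[0][0] for causes in members.values()}


# Toy data: seven failure-inducing inputs, two clusters, one unclustered point.
cluster_ids = [0, 0, 0, 1, 1, -1, 1]
root_causes = ["blur", "blur", "occlusion", "dark", "dark", "blur", "dark"]

purity = cluster_purity(cluster_ids, root_causes)
covered = covered_causes(cluster_ids, root_causes)
```

In this toy example, cluster 1 contains a single root cause and cluster 0 mixes two, so a pipeline would be preferred when it yields more clusters like the former while still leaving no root cause, such as "occlusion" here, without a cluster of its own.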