Towards Robust ML-enabled Software Systems: Detecting Out-of-Distribution Data Using Gini Coefficients
Machine learning (ML) models have become essential components of software systems across domains such as autonomous driving, healthcare, and finance. The robustness of these ML models is crucial for maintaining the software system's performance and reliability. A significant challenge arises when such systems encounter out-of-distribution (OOD) data: examples that differ from the training data distribution. OOD data can degrade the software system's performance. Therefore, an effective OOD detection mechanism, one that identifies and rejects OOD inputs and alerts software engineers, is essential for maintaining performance and robustness. Current OOD detection methods rely on hyperparameters tuned with both in-distribution and OOD data. However, defining the OOD data that a system will encounter in production is often infeasible. Further, the performance of these methods degrades on OOD data whose characteristics are similar to the in-distribution data. In this paper, we propose a novel OOD detection method based on the Gini coefficient. Our method requires neither prior knowledge of OOD data nor hyperparameter tuning. On common benchmark datasets, we show that our method outperforms the maximum softmax probability (MSP) baseline. For a model trained on the MNIST dataset, we improve the OOD detection rate by 4% on the CIFAR10 dataset and by more than 50% on the EMNIST dataset.
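To make the idea concrete, the sketch below illustrates how a Gini coefficient could serve as an OOD score. It is not the paper's implementation: it assumes the Gini coefficient is computed over a model's softmax output vector, and the function names and example probability vectors are illustrative. A near-uniform (uncertain) prediction yields a Gini coefficient close to 0, while a peaked (confident) prediction yields a value close to 1, which is the same quantity the MSP baseline tries to capture via the largest softmax probability.

```python
import numpy as np

def gini_coefficient(p):
    """Gini coefficient of a discrete probability vector p (higher = more peaked)."""
    p = np.sort(np.asarray(p, dtype=float))   # sort ascending
    n = p.size
    ranks = np.arange(1, n + 1)               # 1-based ranks
    return (2.0 * np.sum(ranks * p)) / (n * np.sum(p)) - (n + 1.0) / n

def msp_score(p):
    """Maximum softmax probability (MSP) baseline score for comparison."""
    return float(np.max(p))

# Illustrative softmax outputs (assumed, not taken from the paper).
confident = [0.96, 0.01, 0.01, 0.01, 0.01]   # peaked -> likely in-distribution
uncertain = [0.22, 0.20, 0.20, 0.19, 0.19]   # near-uniform -> likely OOD

print(gini_coefficient(confident), msp_score(confident))  # Gini near 1
print(gini_coefficient(uncertain), msp_score(uncertain))  # Gini near 0
```

Under these assumptions, inputs with a low Gini coefficient would be flagged as OOD; how the decision boundary is set without tuning on OOD data is the subject of the proposed method itself.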