Edge-Based Detection of Label Flipping Attacks in Federated Learning Using Explainable AI
Federated Learning (FL) is a decentralized machine learning approach that enables collaborative training among distributed clients while preserving data privacy, making it increasingly popular for privacy-sensitive applications over traditional centralized models. However, it introduces new security vulnerabilities that challenge conventional approaches to software vulnerability management. Among these, label flipping attacks (LFAs), in which malicious clients intentionally mislabel their training data, pose a unique threat to the integrity of FL models. This study presents an AI-driven, edge-based detection technique that leverages explainable AI (XAI) to strengthen security within FL environments. Our method combines Grad-CAM visualizations with DBSCAN clustering to analyze class-specific behavior across clients. By detecting anomalies in per-class Grad-CAM activation patterns, we identify malicious clients whose heatmaps betray flipped class labels. Because each class is examined independently, the approach captures attack signatures without relying on global model behavior, making it particularly robust to LFAs. Empirical results on benchmark datasets such as MNIST and FashionMNIST demonstrate that our method accurately detects LFAs even when malicious clients constitute a substantial portion of the network. This class-specific, XAI-driven approach contributes to the security of FL by offering an explainable and scalable solution for managing vulnerabilities in distributed AI systems.
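The core detection step can be illustrated with a short sketch. The snippet below is a minimal illustration of the per-class clustering idea, not the paper's implementation: it assumes Grad-CAM heatmaps have already been computed for each (client, class) pair, and the function name flag_suspicious_clients, the heatmap shape, and the DBSCAN parameters eps and min_samples are illustrative choices, not values taken from the study. Clients whose normalized heatmap for a given class falls outside the dense majority cluster (DBSCAN noise label -1) are flagged as potential label flippers.

    # Minimal sketch of per-class anomaly detection on Grad-CAM heatmaps.
    # Assumptions (not from the paper): heatmaps are precomputed per
    # (class, client); eps/min_samples are illustrative; outliers (label -1)
    # are treated as suspicious for that class.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def flag_suspicious_clients(heatmaps, eps=0.3, min_samples=3):
        """heatmaps: dict {class_id: {client_id: np.ndarray of shape (H, W)}}.
        Returns {class_id: set of client_ids flagged as outliers}."""
        flagged = {}
        for cls, per_client in heatmaps.items():
            client_ids = sorted(per_client)
            # Flatten and L2-normalize each client's heatmap for this class so
            # DBSCAN compares activation *patterns* rather than magnitudes.
            X = np.stack([per_client[c].ravel() for c in client_ids])
            X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            # DBSCAN assigns -1 to low-density points: clients whose Grad-CAM
            # pattern for class cls deviates from the benign majority cluster.
            flagged[cls] = {c for c, l in zip(client_ids, labels) if l == -1}
        return flagged

    # Toy usage with synthetic heatmaps: nine benign clients share a similar
    # activation pattern; one simulated flipped-label client does not.
    rng = np.random.default_rng(0)
    benign = rng.random((7, 7))
    heatmaps = {0: {f"client{i}": benign + 0.05 * rng.random((7, 7)) for i in range(9)}}
    heatmaps[0]["client9"] = rng.random((7, 7))  # attacker-like deviation
    print(flag_suspicious_clients(heatmaps))  # expected: {0: {'client9'}}

Running this per class, rather than once over all heatmaps, mirrors the class-specific design described above: a flip between two classes only perturbs the heatmaps of those classes, so per-class clustering isolates the anomaly instead of diluting it across the whole activation space.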