How Faithful Are Post-hoc Explanations for Transformer-Based Software Models?
Transformer-based models such as CodeBERT, GraphCodeBERT, and CodeT5 are widely used in software engineering tasks, yet their decision processes remain difficult to interpret. Existing post-hoc explainers often provide unstable or inconsistent attributions on code, motivating the need for methods that better reflect model behavior. This paper introduces CoScoreX, a contextual–contrastive explanation framework that combines semantic neighborhood modeling with opposite-class information to generate token-level relevance scores. We evaluate CoScoreX alongside six established explainers across four transformer models using Comprehensiveness and Sufficiency, two standard perturbation-based fidelity metrics, and apply Wilcoxon, Friedman, and dominance analyses to assess robustness. The results show that CoScoreX achieves competitive and stable fidelity, particularly in identifying compact token subsets under Sufficiency, and maintains consistent performance across architectures and tasks. We also discuss limitations related to embedding dependence, dataset homogeneity, and the scope of masking-based metrics. The study provides an evidence-based assessment of contextual and contrastive components in explanation methods for transformer-based software models.
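To make the two fidelity metrics concrete, the following is a minimal sketch of Comprehensiveness and Sufficiency as they are commonly defined for masking-based evaluation: the drop in predicted probability when the top-k attributed tokens are masked, and the drop when only those tokens are kept. It is not the paper's evaluation code. The `MASK` placeholder, the `toy_predict` stand-in classifier, the example tokens, and the attribution scores are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set

MASK = "<mask>"  # hypothetical placeholder; real models use their own mask token

Predictor = Callable[[Sequence[str]], float]  # returns P(predicted class | tokens)


def top_k_indices(scores: Sequence[float], k: int) -> Set[int]:
    """Indices of the k tokens with the highest attribution scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def comprehensiveness(predict: Predictor, tokens: Sequence[str],
                      scores: Sequence[float], k: int) -> float:
    """Probability drop after masking the top-k tokens (larger = more faithful)."""
    top = top_k_indices(scores, k)
    without_top: List[str] = [MASK if i in top else t for i, t in enumerate(tokens)]
    return predict(tokens) - predict(without_top)


def sufficiency(predict: Predictor, tokens: Sequence[str],
                scores: Sequence[float], k: int) -> float:
    """Probability drop when keeping only the top-k tokens (smaller = more faithful)."""
    top = top_k_indices(scores, k)
    only_top: List[str] = [t if i in top else MASK for i, t in enumerate(tokens)]
    return predict(tokens) - predict(only_top)


def toy_predict(tokens: Sequence[str]) -> float:
    """Toy stand-in for a fine-tuned model: confidence grows with risky calls."""
    risky = {"strcpy", "gets"}
    hits = sum(t in risky for t in tokens)
    return min(1.0, 0.1 + 0.4 * hits)


# Illustrative input and made-up attribution scores (not real explainer output).
tokens = ["char", "buf", "strcpy", "buf", "gets"]
scores = [0.0, 0.1, 0.9, 0.1, 0.8]

comp = comprehensiveness(toy_predict, tokens, scores, k=2)
suff = sufficiency(toy_predict, tokens, scores, k=2)
print(f"comprehensiveness={comp:.2f} sufficiency={suff:.2f}")
```

Under this sign convention, an explanation that picks out the decisive tokens yields a large Comprehensiveness and a Sufficiency near zero. A real evaluation would replace `toy_predict` with the class probability of a model such as CodeBERT and average both metrics over several values of k.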