Towards Generalizable Instruction Vulnerability Prediction via LLM-Enhanced Code Representation
Discovering potential vulnerabilities has long been a fundamental goal in software security. Among these, bit flips caused by hardware or environmental disturbances are increasingly recognized as a new type of vulnerability that threatens program reliability at the instruction level. However, existing approaches are often restricted to individual programs and must be retrained when applied to unseen code, which severely limits their practicality and responsiveness. In this paper, we propose CIVP, a novel framework for context-aware instruction vulnerability prediction that generalizes to unseen programs without retraining. Specifically, to capture the rich contextual semantics of instructions, CIVP first leverages Large Language Models (LLMs) to accurately extract semantic embeddings of instructions. CIVP then constructs an instruction execution graph that encodes the complex execution relations of a program and thus reflects potential error propagation paths. To improve instruction representations for vulnerability prediction, CIVP enhances GraphSAGE with multi-hop diffusion to capture inter-program structural patterns and contextual dependencies, and adopts pseudo-labeling to improve the model’s generalization to vulnerable instructions. Extensive experiments on a dataset of 26 real-world programs demonstrate that CIVP significantly outperforms state-of-the-art approaches, improving Recall by up to 20.5% and F1-score by up to 18.5%. Notably, CIVP generalizes well to unseen programs, offering an efficient and scalable solution for proactive instruction-level hardening before software deployment.
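To make the aggregation step more concrete, the following is a minimal pure-Python sketch of one GraphSAGE-style layer whose neighbor aggregate is replaced by a multi-hop diffusion over an instruction graph. The abstract does not specify CIVP's diffusion operator, so everything here is an illustrative assumption: per-hop mean aggregation, geometric hop weights controlled by a hypothetical `decay` parameter, edges pointing from an instruction to its successors, and the hypothetical helper names `mean_aggregate`, `diffuse`, and `sage_layer`. In CIVP the input vectors would be the LLM-derived instruction embeddings; here they are tiny hand-written vectors.

```python
"""Sketch: a GraphSAGE-style layer with a multi-hop diffusion aggregate.

Hypothetical assumptions (not taken from CIVP): mean aggregation per hop,
hop weights decay**(k-1) normalized to sum to 1, and a dense weight matrix
applied to [self || diffused neighbors] followed by ReLU.
"""
from typing import Dict, List

Vector = List[float]
Graph = Dict[int, List[int]]  # instruction id -> successor instruction ids (assumed direction)


def mean_aggregate(graph: Graph, feats: Dict[int, Vector], dim: int) -> Dict[int, Vector]:
    """One hop: each node takes the mean of its neighbors' vectors (zeros if it has none)."""
    out: Dict[int, Vector] = {}
    for node in feats:
        nbrs = graph.get(node, [])
        if not nbrs:
            out[node] = [0.0] * dim
            continue
        out[node] = [sum(feats[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
    return out


def diffuse(graph: Graph, feats: Dict[int, Vector], hops: int, decay: float) -> Dict[int, Vector]:
    """Weighted sum of the 1-hop .. `hops`-hop mean aggregates."""
    dim = len(next(iter(feats.values())))
    weights = [decay ** (k - 1) for k in range(1, hops + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the hop weights sum to 1
    result = {node: [0.0] * dim for node in feats}
    current = feats
    for w in weights:
        current = mean_aggregate(graph, current, dim)  # advance one more hop
        for node, vec in current.items():
            result[node] = [r + w * v for r, v in zip(result[node], vec)]
    return result


def sage_layer(graph: Graph, feats: Dict[int, Vector], weight: List[Vector],
               hops: int = 2, decay: float = 0.5) -> Dict[int, Vector]:
    """h_v' = ReLU(W [h_v || diffused_v]); `weight` has shape out_dim x (2 * dim)."""
    neigh = diffuse(graph, feats, hops, decay)
    out: Dict[int, Vector] = {}
    for node, vec in feats.items():
        concat = vec + neigh[node]  # list concatenation: self features, then neighborhood
        out[node] = [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight]
    return out


if __name__ == "__main__":
    # Toy chain of three instructions: 0 -> 1 -> 2.
    graph = {0: [1], 1: [2], 2: []}
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
    W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
    print(sage_layer(graph, feats, W))
```

With `hops=2` and `decay=0.5`, instruction 0 mixes its 1-hop neighbor (instruction 1) with weight 2/3 and its 2-hop neighbor (instruction 2) with weight 1/3, so information from instructions further along a potential error propagation path still reaches its representation, only more weakly. A learned model would train `W` (and possibly the hop weights) instead of fixing them by hand.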