Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing
This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs).
Modern AI systems such as LLMs are expanding into sensitive social contexts and socio-economic decision-making, where fairness concerns become especially crucial. Because LLMs develop their decision-making patterns by training on massive datasets of human-generated content, they naturally encode and perpetuate societal biases. Modifying training datasets and algorithms is expensive and requires significant resources, whereas post-processing techniques, such as selectively deactivating attention heads in pre-trained LLMs, can provide feasible and effective approaches to improve fairness. However, identifying the optimal subset of parameters to prune presents a combinatorial challenge within the immense parameter space of LLMs, requiring solutions that efficiently balance the competing objectives of model fairness and utility.
We explore a search-based program repair approach via simulated annealing to address these computational challenges. Given the prohibitive evaluation costs in billion-parameter LLMs, we develop surrogate deep neural networks that efficiently model the relationship between attention head states (active/inactive) and their corresponding fairness/utility metrics. This allows us to perform optimization over the surrogate models and efficiently identify optimal subsets of attention heads for pruning, rather than searching directly through the LLM parameter space. This paper introduces Attention Pruning, a fairness-aware surrogate simulated annealing approach that prunes attention heads in LLMs that disproportionately contribute to bias while minimally impacting overall model utility. Our experimental evaluation shows that Attention Pruning achieves a reduction of up to 40% in gender bias and outperforms state-of-the-art bias mitigation strategies.
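The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `surrogate_scores` function is a hypothetical stand-in for the trained surrogate deep neural networks, and the per-head weights, the trade-off factor `lam`, and the cooling schedule are all assumptions chosen for illustration. The sketch only shows the general shape of simulated annealing over a binary head mask (1 = active, 0 = pruned), where each candidate is scored by a cheap surrogate rather than by the LLM itself.

```python
import math
import random


def surrogate_scores(mask, bias_w, util_w):
    """Hypothetical stand-in for a trained surrogate model.

    Maps a head mask (1 = active, 0 = pruned) to a (bias, utility) pair.
    A real surrogate would be a learned network; here both scores are
    simple weighted sums over the active heads.
    """
    bias = sum(w * m for w, m in zip(bias_w, mask))
    utility = sum(w * m for w, m in zip(util_w, mask))
    return bias, utility


def cost(mask, bias_w, util_w, lam=1.0):
    """Scalar objective: lower bias is better, higher utility is better."""
    bias, utility = surrogate_scores(mask, bias_w, util_w)
    return bias - lam * utility


def anneal(bias_w, util_w, steps=5000, t0=1.0, alpha=0.999, lam=1.0, seed=0):
    """Simulated annealing over head masks, scored only by the surrogate."""
    rng = random.Random(seed)
    n_heads = len(bias_w)
    current = [1] * n_heads  # start with every head active
    current_cost = cost(current, bias_w, util_w, lam)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(steps):
        # Neighbour move: flip the state of one randomly chosen head.
        candidate = current[:]
        i = rng.randrange(n_heads)
        candidate[i] ^= 1
        cand_cost = cost(candidate, bias_w, util_w, lam)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp = max(temp * alpha, 1e-6)  # geometric cooling with a floor
    return best, best_cost


if __name__ == "__main__":
    # Made-up per-head contributions, for illustration only.
    bias_w = [0.9, 0.1, 0.8, 0.2]
    util_w = [0.1, 0.7, 0.2, 0.6]
    mask, value = anneal(bias_w, util_w)
    print("selected mask:", mask, "surrogate cost:", value)
```

Because every candidate mask is evaluated by the surrogate, the loop can afford thousands of steps; in a setting like the one the abstract describes, only the final selected masks would need to be checked against the actual model.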
Warning: This paper contains content that some readers may find offensive and harmful.
Wed 15 Apr. Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | Software Engineering for AI 1 | Research Track / SE in Society (SEIS) / SE In Practice (SEIP) | at Oceania VII | Chair(s): Sira Vegas, Universidad Politecnica de Madrid
11:00 | 15m | Talk | Fairness Is Not Just Ethical: Performance Trade-Off via Data Correlation Tuning to Mitigate Bias in ML Software | Research Track | Ying Xiao, Shangwen Wang National University of Defense Technology, Sicen Liu Southern University of Science and Technology, Dingyuan Xue Southern University of Science and Technology, Xian Zhan Southern University of Science and Technology, Yepang Liu Southern University of Science and Technology, Jie M. Zhang King's College London
11:15 | 15m | Talk | TACO: Trust Assessment of Large Language Models in Coding Assistance Tasks | Research Track | Shihao Weng Nanjing University, Yang Feng Nanjing University, Jincheng Li Nanjing University, Yining Yin Nanjing University, Zhenlun Zhang Nanjing University, Lyuxi Liu University of Virginia, Jia Liu Nanjing University
11:30 | 15m | Talk | Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework | Research Track | Alessandra Parziale Gran Sasso Science Institute, Gianmario Voria University of Salerno, Valeria Pontillo Gran Sasso Science Institute, Gemma Catolino University of Salerno, Andrea De Lucia University of Salerno, Fabio Palomba University of Salerno
11:45 | 15m | Talk | Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing | Research Track | Vishnu Asutosh Dasu Pennsylvania State University, Md Rafi Ur Rashid Pennsylvania State University, Vipul Gupta Pennsylvania State University, Saeid Tizpaz-Niari University of Illinois Chicago, Gang (Gary) Tan Pennsylvania State University
12:00 | 15m | Talk | Building an Open AIBOM Standard in the Wild: An Experience Report on Extending the SPDX SBOM (ISO/IEC 5962:2021) for AI Supply Chains | SE In Practice (SEIP) | Gopi Krishnan Rajbahadur, Keheliya Gallaba Centre for Software Excellence, Huawei Canada, Elyas Rashno Queen's University, Arthit Suriyawongkul ADAPT Centre, Trinity College Dublin, Karen Bennet IEEE, Kate Stewart Linux Foundation, Ahmed E. Hassan Queen’s University
12:15 | 15m | Talk | Data-Dependent Goal modeling for ML-Enabled Law Enforcement Systems | SE in Society (SEIS) | Dalal Alrajeh Imperial College London, Vesna Nowack Imperial College London, Patrick Benjamin University of Oxford, Katie Thomas University of Bath, William Hobson University of Bath, Carolina Gutierrez Munoz University of Bath, Catherine Hamilton-Giachritsis University of Bath, Juliane Kloess University of Edinburgh, Jessica Woodhams University of Birmingham, Daniel Butler Independent researcher, Mark Law ILASP, Ralph Morton Aston University, Benjamin Costello University of Birmingham, Amy Burrell University of Birmingham, Tim Grant Aston University, Prachiben Shah University of Birmingham, Frances Laureano de Leon University of Birmingham, Mark Lee University of Birmingham