Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models
Large Language Models (LLMs) have demonstrated remarkable performance across diverse applications, yet they inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups. These associations perpetuate and even amplify harmful social biases, raising significant concerns about fairness. To mitigate such biases, prior studies have attempted to project model embeddings into unbiased spaces during inference. However, these approaches have shown limited effectiveness due to their weak alignment with downstream social biases. Inspired by the observation that concept cognition in LLMs is primarily represented through a linear associative memory mechanism, where key-value mapping occurs in the MLP layers, we posit that biased concepts and social groups are similarly encoded as entity (key) and information (value) pairs, which can be manipulated to promote fairer associations. To this end, we propose Fairness Mediator (FairMed), an effective and efficient bias mitigation framework that neutralizes stereotype associations. Our framework comprises two main components: a stereotype association prober and an adversarial debiasing neutralizer. The prober captures stereotype associations encoded within MLP layer activations by employing prompts centered around biased concepts (keys) to detect the emission probabilities for social groups (values). Subsequently, the adversarial debiasing neutralizer intervenes in MLP activations during inference to equalize the association probabilities among different social groups. Extensive experiments across nine protected attributes demonstrate that FairMed significantly outperforms state-of-the-art methods in effectiveness, achieving average bias reductions of up to 84.42% and 80.36% for $s_{\text{DIS}}$ and $s_{\text{AMB}}$, respectively. Compared to the most effective baseline, FairMed offers competitive efficiency, reducing mitigation overhead by hundreds of minutes. FairMed also maintains the LLM's language understanding capabilities without compromising overall performance. Our code is available on our website.
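To make the described mechanism concrete, below is a minimal, hypothetical sketch of the two components named in the abstract: a linear probe that estimates social-group emission probabilities from an MLP-layer activation, and a gradient-based intervention that edits the activation so the probed distribution moves toward uniform. This is not the authors' implementation; the names GroupProber and neutralize, the hidden size, the number of groups, the KL objective, and the optimization hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN_DIM = 768   # assumed width of the probed MLP activation
NUM_GROUPS = 4     # assumed number of social groups for one protected attribute


class GroupProber(nn.Module):
    """Linear probe mapping an MLP-layer activation to social-group probabilities."""

    def __init__(self, hidden_dim: int, num_groups: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_groups)

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        return F.softmax(self.classifier(activation), dim=-1)


def neutralize(activation: torch.Tensor, prober: GroupProber,
               steps: int = 10, lr: float = 0.05) -> torch.Tensor:
    """Adversarially edit the activation so the probed group distribution
    moves toward uniform (i.e., equalized association probabilities)."""
    edited = activation.clone().detach().requires_grad_(True)
    target = torch.full((NUM_GROUPS,), 1.0 / NUM_GROUPS)
    for _ in range(steps):
        probs = prober(edited)
        # KL divergence between the probed distribution and the uniform target
        loss = F.kl_div(probs.log(), target.expand_as(probs), reduction="batchmean")
        loss.backward()
        with torch.no_grad():
            edited -= lr * edited.grad
            edited.grad.zero_()
    return edited.detach()


if __name__ == "__main__":
    prober = GroupProber(HIDDEN_DIM, NUM_GROUPS)   # in practice, trained on probing prompts
    activation = torch.randn(1, HIDDEN_DIM)        # stand-in for a captured MLP activation
    before = prober(activation)
    after = prober(neutralize(activation, prober))
    print("group probabilities before:", before.detach().numpy().round(3))
    print("group probabilities after: ", after.detach().numpy().round(3))
```

In the actual framework, the probe would be fit on activations gathered from biased-concept prompts and the intervention applied online during inference; this toy example only demonstrates the equalization step on a random activation.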
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 12:15 | Fairness and LLM Testing | Research Papers at Cosmos 3A | Chair(s): Andreas Metzger (University of Duisburg-Essen)
11:00 (25m) Talk | Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models | Research Papers
Yisong Xiao (Beihang University), Aishan Liu (Beihang University; Institute of Dataspace), Siyuan Liang (National University of Singapore), Xianglong Liu (Beihang University; Institute of Dataspace; Zhongguancun Laboratory), Dacheng Tao (Nanyang Technological University)
DOI
11:25 (25m) Talk | ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation | Research Papers
Pengyu Xue (Shandong University), Linhao Wu (Shandong University), Zhen Yang (Shandong University), Chengyi Wang (Shandong University), Xiang Li (Shandong University), Yuxiang Zhang (Shandong University), Jia Li (Tsinghua University), Ruikai Jin (Shandong University), Yifei Pei (Shandong University), Zhaoyan Shen (Shandong University), Xiran Lyu (Shandong University), Jacky Keung (City University of Hong Kong)
DOI
11:50 (25m) Talk | No Bias Left Behind: Fairness Testing for Deep Recommender Systems Targeting General Disadvantaged Groups | Research Papers
Zhuo Wu (Tianjin International Engineering Institute, Tianjin University), Zan Wang (Tianjin University), Chuan Luo (Beihang University), Xiaoning Du (Monash University), Junjie Chen (Tianjin University)
DOI
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door marked “3”, which will stay open during the event.