Fri 23 Feb 2024 12:00 - 12:30 at Room 2 - R102 - SEIP Session 1

In this paper, we quantify the impact of protected attributes on a model's final decision in the form of saliency scores. The geometry of word embeddings can be used to extract the subspaces of protected attributes such as gender, ethnicity, and sexual orientation. We use these subspace directions to compute a bias score for each word and phrase in the embedding space. We demonstrate empirically that, in situations where access to human annotators is restricted, this score can serve as a proxy for the protected attribute. Furthermore, the directional derivative of the model along the bias direction can be used for fairness testing, providing token-level sensitivity of the model to biased content embedded in the word embeddings.
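
As an illustration of the pipeline the abstract describes, the minimal sketch below estimates a gender direction from definitional word pairs, projects word vectors onto it to obtain per-token bias scores, and takes a finite-difference directional derivative of a scoring model along that direction. The toy embedding table, the `embed` dictionary, the definitional pairs, and the linear probe `model` are hypothetical placeholders assumed for this sketch, not the paper's implementation.

```python
import numpy as np

# Toy embedding table (hypothetical; in practice, load pretrained vectors
# such as GloVe or word2vec).
rng = np.random.default_rng(0)
embed = {w: rng.normal(size=50) for w in
         ["he", "she", "man", "woman", "engineer", "nurse"]}

def bias_direction(pairs, embed):
    """Estimate a protected-attribute direction from definitional word
    pairs as the dominant direction of the pairwise difference vectors."""
    diffs = np.stack([embed[a] - embed[b] for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
    d = vt[0]
    return d / np.linalg.norm(d)

def bias_score(word, direction, embed):
    """Cosine of the word vector with the bias direction: a per-token
    bias score usable as a proxy for the protected attribute."""
    v = embed[word]
    return float(v @ direction / np.linalg.norm(v))

def directional_derivative(model, x, direction, eps=1e-3):
    """Finite-difference directional derivative of the model output along
    the bias direction: token-level sensitivity for fairness testing."""
    return (model(x + eps * direction) - model(x - eps * direction)) / (2 * eps)

g = bias_direction([("he", "she"), ("man", "woman")], embed)
print(bias_score("engineer", g), bias_score("nurse", g))

# Hypothetical scoring model: a fixed linear probe over the embedding.
w = rng.normal(size=50)
model = lambda v: float(w @ v)
print(directional_derivative(model, embed["engineer"], g))
```

A large-magnitude directional derivative for a token indicates that the model's output is sensitive to movement along the protected-attribute direction at that token, flagging it for fairness review.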
