Fri 23 Feb 2024 12:30 - 13:00 at Room 2 - R102 - SEIP Session 1

Natural language processing (NLP) models are increasingly built on transformer-based language models initialized with pre-trained parameters. These parameters are learned from the vast amount of text available on the internet and provide a strong initialization through transfer learning, but they also encode harmful prejudices. These prejudices can lead to bias against certain demographics when the models are deployed in production. In this paper we study bias transfer through transfer learning into downstream tasks, analyzing how much bias language models absorb during pre-training and how it transfers into task-specific behavior after fine-tuning. We find that minimizing inherent bias with controlled interventions prior to fine-tuning has little effect on lowering the biased behavior of the resulting classifier. Biases present in the domain-specific dataset appear to be a more plausible explanation for the subsequent biased behavior. However, we also observe that pre-training matters: once the model has been pre-trained, even small changes to co-occurrence rates in the fine-tuning dataset have a significant effect on the performance of the model. The outcomes of our study motivate practitioners to concentrate more on context-specific hazards and dataset quality.
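
To make the notion of co-occurrence rates in a fine-tuning dataset concrete, the sketch below estimates how often simple demographic term groups co-occur with each class label. It is a minimal illustration, not the authors' actual measurement protocol; the term lists, label names, and (text, label) data format are assumptions made for the example.

```python
# Hypothetical sketch: estimating co-occurrence rates between demographic term
# groups and labels in a fine-tuning dataset. Term lists and data format are
# assumptions for illustration only.
from collections import Counter, defaultdict

DEMOGRAPHIC_TERMS = {
    "gender_female": {"she", "her", "woman", "women"},
    "gender_male": {"he", "his", "man", "men"},
}

def cooccurrence_rates(examples):
    """examples: iterable of (text, label) pairs from the fine-tuning set."""
    label_counts = defaultdict(Counter)  # group -> Counter over labels
    group_totals = Counter()             # group -> number of matching examples
    for text, label in examples:
        tokens = set(text.lower().split())
        for group, terms in DEMOGRAPHIC_TERMS.items():
            if tokens & terms:
                label_counts[group][label] += 1
                group_totals[group] += 1
    # Normalize counts to per-group label rates.
    return {
        group: {label: n / group_totals[group] for label, n in counts.items()}
        for group, counts in label_counts.items()
    }

if __name__ == "__main__":
    toy_data = [
        ("she is angry about the delay", "toxic"),
        ("he is happy with the result", "non_toxic"),
        ("the women praised the update", "non_toxic"),
    ]
    print(cooccurrence_rates(toy_data))
```

Comparing these rates before and after a controlled intervention on the fine-tuning data is one plausible way to probe the sensitivity the abstract describes.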
