Beyond Binary Moderation: Identifying Fine-Grained Sexist and Misogynistic Behavior on GitHub with Large Language Models
Background: Sexist and misogynistic behavior significantly hinders inclusion in technical communities such as GitHub, driving developers, particularly those from underrepresented groups, to leave in response to subtle biases and microaggressions. Current moderation tools rely primarily on keyword filtering or binary classifiers, limiting their ability to detect such nuanced harm.
Aims: This study introduces a fine-grained, multi-class classification framework leveraging instruction-tuned Large Language Models (LLMs) to identify twelve distinct categories of sexist and misogynistic behavior in GitHub comments.
Method: We developed an instruction-tuned LLM-based framework with systematic prompt refinement across 20 iterations, evaluated on 1,440 labeled GitHub comments spanning twelve sexism/misogyny categories. Model performance was compared using precision, recall, F1-score, and Matthews Correlation Coefficient (MCC).
Results: Our optimized approach (GPT-4o with Prompt 19) achieved an MCC of 0.501, significantly outperforming baseline approaches. While this model produced few false positives, it struggled to reliably interpret nuanced, context-dependent sexism and misogyny.
Conclusion: Well-designed prompts with clear definitions and structured outputs significantly improve the accuracy and interpretability of sexism detection, enabling precise and practical moderation on developer platforms such as GitHub.
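The classify-then-evaluate pipeline described above can be illustrated with a minimal sketch, assuming the OpenAI Python SDK for GPT-4o access and scikit-learn for the metrics; the category labels, prompt wording, and function names below are illustrative placeholders, not the study's actual twelve-class taxonomy or its refined Prompt 19.

```python
# Minimal sketch of the classify-then-evaluate pipeline described in the abstract.
# Assumptions: OpenAI Python SDK for GPT-4o access, scikit-learn for metrics.
# Category labels and prompt wording are placeholders, not the study's taxonomy.
from openai import OpenAI
from sklearn.metrics import matthews_corrcoef, precision_recall_fscore_support

client = OpenAI()

CATEGORIES = [
    "not_sexist",            # placeholder labels for illustration only
    "gender_stereotyping",
    "dismissive_language",
    # ... remaining placeholder categories up to twelve classes
]

SYSTEM_PROMPT = (
    "You are a moderation assistant for a developer platform. Classify the "
    "GitHub comment into exactly one of these categories: "
    + ", ".join(CATEGORIES)
    + ". Respond with only the category name."
)

def classify_comment(comment: str, model: str = "gpt-4o") -> str:
    """Ask the instruction-tuned LLM for a single category label."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic decoding for reproducible evaluation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": comment},
        ],
    )
    return response.choices[0].message.content.strip()

def evaluate(y_true: list[str], y_pred: list[str]) -> dict:
    """Macro-averaged precision/recall/F1 plus multi-class MCC."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": matthews_corrcoef(y_true, y_pred),  # robust under class imbalance
    }

# Usage: predictions = [classify_comment(c) for c in comments]
#        scores = evaluate(gold_labels, predictions)
```

MCC is reported alongside precision, recall, and F1 because it summarizes multi-class agreement in a single coefficient that remains informative when the twelve categories are imbalanced.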


