Data vs. Model Machine Learning Fairness Testing: An Empirical Study (ICSE 2024 - Posters)

Who

Arumoy Shome, Luís Cruz, Arie van Deursen

Track

ICSE 2024 Posters

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 18 Apr 2024 15:30 - 16:00 at Open Space - Posters 4

Abstract

Although several fairness definitions and bias mitigation techniques exist in the literature, all existing solutions evaluate the fairness of Machine Learning (ML) systems after the training stage. In this paper, we take the first steps towards evaluating a more holistic approach by testing for fairness both before and after model training. We evaluate the effectiveness of the proposed approach and position it within the ML development lifecycle, using an empirical analysis of the relationship between model-dependent and model-independent fairness metrics. The study uses 2 fairness metrics, 4 ML algorithms, 5 real-world datasets and 1600 fairness evaluation cycles. We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data change. Our results indicate that testing for fairness prior to training can be a "cheap" and effective means of catching a biased data collection process early, detecting data drift in production systems, and minimising the execution of full training cycles, thus reducing development time and costs.
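
As a rough illustration of the distinction the abstract draws, the sketch below computes one fairness metric twice: once on the training labels (a data, or model-independent, measurement) and once on a classifier's predictions (a model-dependent measurement). The abstract does not name the two metrics used in the study, so the choice of statistical parity difference here is an assumption, and the function name, toy data and predictions are all hypothetical.

```python
from typing import Sequence


def statistical_parity_difference(outcomes: Sequence[int],
                                  protected: Sequence[int]) -> float:
    """Return P(outcome=1 | unprivileged) - P(outcome=1 | privileged).

    `protected` holds 1 for the privileged group and 0 for the
    unprivileged group. A value of 0 means both groups receive the
    favourable outcome at the same rate.
    """
    priv = [o for o, p in zip(outcomes, protected) if p == 1]
    unpriv = [o for o, p in zip(outcomes, protected) if p == 0]
    if not priv or not unpriv:
        raise ValueError("both groups must be non-empty")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


# Toy example, invented for illustration only (not data from the study).
protected = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]        # ground-truth training labels
predictions = [1, 1, 1, 1, 0, 0, 0, 0]   # output of a hypothetical model

# Data fairness: measured before any model is trained.
data_spd = statistical_parity_difference(labels, protected)

# Model fairness: measured on predictions after training.
model_spd = statistical_parity_difference(predictions, protected)

print(f"data SPD:  {data_spd:+.2f}")
print(f"model SPD: {model_spd:+.2f}")
```

The data-side measurement needs only the labels and the protected attribute, so it can run without a training cycle, which is what makes it the cheaper of the two checks.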

Arumoy Shome

Delft University of Technology

Netherlands

Luís Cruz

Delft University of Technology

Netherlands

Arie van Deursen

Delft University of Technology

Netherlands

Session Program

Thu 18 Apr
Displayed time zone: Lisbon

15:30 - 16:00	Posters 4, Posters at Open Space

15:30 30m Poster		Towards Data Augmentation for Supervised Code Translation Posters Binger Chen Technische Universität Berlin, Jacek Golebiowski Amazon AWS, Ziawasch Abedjan Leibniz Universität Hannover
15:30 30m Poster		GDPR indications in commits messages in GitHub repositories Posters Georgia Kapitsaki University of Cyprus, Maria Papoutsoglou University of Cyprus
15:30 30m Poster		Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models Posters Laura Plein University of Luxembourg, Wendkuuni Arzouma Marc Christian OUEDRAOGO University of Luxembourg, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg
15:30 30m Poster		How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? Posters Xiaoting Du Beijing University of Posts and Telecommunications, Chenglong Li Beihang University, Xiangyue Ma Beihang University, Zheng Zheng Beihang University
15:30 30m Poster		xNose: A Test Smell Detector for C# Posters Partha Protim Paul Shahjalal University of Science & Technology, Md Tonoy Akanda Shahjalal University of Science & Technology, Mohammed Raihan Ullah Shahjalal University of Science & Technology, Dipto Mondal Shahjalal University of Science & Technology, Nazia Sultana Chowdhury Shahjalal University of Science & Technology, Fazle Mohammed Tawsif University of Southern California
15:30 30m Poster		Data vs. Model Machine Learning Fairness Testing: An Empirical Study Posters Arumoy Shome Delft University of Technology, Luís Cruz Delft University of Technology, Arie van Deursen Delft University of Technology
15:30 30m Poster		On the Effects of Program Slicing for Vulnerability Detection during Code Inspection: Extended Abstract Posters Aurora Papotti Vrije Universiteit Amsterdam, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam, Katja Tuma Vrije Universiteit Amsterdam
15:30 30m Poster		Multi-step Automated Generation of Parameter Docstrings in Python: An Exploratory Study Posters Vatsal Venkatkrishna Australian National University, Durga Shree Nagabushanam Australian National University, Emmanuel Iko-Ojo Simon Australian National University, Melina Vidoni Australian National University
15:30 30m Poster		Lightweight Semantic Conflict Detection with Static Analysis Posters Galileu Santos de Jesus Federal University of Pernambuco, Paulo Borba Federal University of Pernambuco, Rodrigo Bonifácio Computer Science Department - University of Brasília, Matheus Barbosa de Oliveira Federal University of Pernambuco
15:30 30m Poster		Energy Consumption of Automated Program Repair Posters Matias Martinez Universitat Politècnica de Catalunya (UPC), Silverio Martínez-Fernández UPC-BarcelonaTech, Xavier Franch Universitat Politècnica de Catalunya
15:30 30m Poster		ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation Posters Saifullah Mahbub United International University, Md. Easin Arafat Eötvös Loránd University, Chowdhury Rafeed Rahman National University of Singapore, Zannatul Ferdows United International University, Masum Hasan University of Rochester
15:30 30m Poster		LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis Posters Yilun Liu Huawei co. LTD, Shimin Tao University of Science and Technology of China; Huawei co. LTD, Weibin Meng Huawei co. LTD, Feiyu Yao Huawei co. LTD, Xiaofeng Zhao Huawei co. LTD, Hao Yang Huawei co. LTD
15:30 30m Poster		High-precision Online Log Parsing with Large Language Models Posters XiaoLei Chen Fudan University, Jie Shi Fudan University, ChenJ, Peng Wang Fudan University, Wei Wang Fudan University
15:30 30m Poster		Multi-requirement Parametric Falsification Posters Matteo Camilli Politecnico di Milano, Raffaela Mirandola Karlsruhe Institute of Technology (KIT)