Data Selection Driven by Item Difficulty: On Investigating Data Efficient Practice for Hyperparameter Search (CAIN 2024 - Posters) - CAIN 2024

Sun 14 - Mon 15 April 2024 Lisbon, Portugal

co-located with ICSE 2024

Who

Gustavo Rodrigues dos Reis, Adrian Mos, Mario Cortes Cornax, Cyril Labbé

Track

CAIN 2024 Posters

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Apr 2024 09:12 - 09:15 at Pequeno Auditório - Keynote and Posters Chair(s): Jan Bosch, Henry Muccini

Abstract

Foundation models shift interest toward adapting existing models rather than creating proprietary models from scratch. Despite this change, hyperparameter optimization (HPO) is still needed. Users adapting systems powered by those models to proprietary data should not considerably increase the overall resource footprint through extensive hyperparameter search. Given that this footprint is also proportional to the amount of data used in HPO, we investigate how a user can effectively reduce that amount by leveraging how readily, in practice, the deep learning model outputs the expected correct result for an item in the dataset.

In this work, we describe a methodology for accomplishing this data reduction by estimating a measure of each item’s difficulty. The method keeps only a portion of the data that preserves the overall proportions of item difficulty found in the dataset, while helping order the items meaningfully. The rationale derives from results in curriculum learning research: we ask whether the adapted models can help organize and select subsets of data that are representative of the whole. Preliminary results of evaluating the method are provided for image recognition and scientific named entity recognition (NER). We observe that the amount of data for HPO can be reduced as far as 60% and still point to the same choice of hyperparameters as using the whole training set.
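
The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_by_difficulty`, the assumption that each item already has a difficulty score in [0, 1], the equal-width binning, and the parameters `keep_fraction`, `n_bins`, and `seed` are all hypothetical choices made for illustration. The sketch samples the same fraction from every difficulty bin, so the subset roughly preserves the difficulty proportions of the full dataset, and returns the kept indices ordered from easy to hard.

```python
import random
from collections import defaultdict


def select_by_difficulty(difficulty, keep_fraction=0.4, n_bins=5, seed=0):
    """Pick a subset of item indices that roughly preserves the
    distribution of difficulty scores, ordered from easy to hard.

    difficulty: list of per-item scores, assumed to lie in [0, 1]
                (how the scores are estimated is outside this sketch).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Group item indices into equal-width difficulty bins.
    bins = defaultdict(list)
    for idx, score in enumerate(difficulty):
        bin_id = min(int(score * n_bins), n_bins - 1)
        bins[bin_id].append(idx)

    # Sample the same fraction from every non-empty bin, keeping at
    # least one item per bin so no difficulty level disappears.
    rng = random.Random(seed)
    selected = []
    for bin_id in sorted(bins):
        members = bins[bin_id]
        k = max(1, round(keep_fraction * len(members)))
        selected.extend(rng.sample(members, k))

    # Order the kept items from easy to hard, as a curriculum would.
    selected.sort(key=lambda i: difficulty[i])
    return selected


if __name__ == "__main__":
    # Toy scores: many easy items, fewer medium and hard ones.
    scores = [0.1] * 50 + [0.5] * 30 + [0.9] * 20
    subset = select_by_difficulty(scores, keep_fraction=0.4)
    print(len(subset), "of", len(scores), "items kept")
```

Under these assumptions, hyperparameter search would then run on the returned subset instead of the full training set. Whether the subset leads to the same hyperparameter choice is an empirical question that this sketch does not answer.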

Gustavo Rodrigues dos Reis

NAVER LABS Europe/LIG - UGA

France

Adrian Mos

NAVER LABS Europe

France

Mario Cortes Cornax

LIG - UGA

Cyril Labbé

LIG - UGA

Session Program

Mon 15 Apr
Displayed time zone: Lisbon

	09:00 - 10:30	Keynote and Posters [Posters / Research and Experience Papers] at Pequeno Auditório. Chair(s): Jan Bosch (Chalmers University of Technology), Henry Muccini (University of L'Aquila, Italy)

	09:00 3m Talk		A Domain Specific Language for Specification of Risk-oriented Object Detection Requirements [Posters] Junji Hashimoto (GREE, Inc.), Nobukazu Yoshioka (Waseda University)
	09:03 3m Talk		AI Security Continuum: Concept and Challenges [Posters] Hironori Washizaki (Waseda University), Nobukazu Yoshioka (Waseda University)
	09:06 3m Talk		A Roadmap for Enriching Jupyter Notebooks Documentation with Kaggle Data [Posters] Mojtaba Mostafavi (Department of Computer Engineering of Sharif University of Technology), Hamed Jahantigh (Department of Computer Engineering of Sharif University of Technology), Alireza Asadi (Department of Computer Engineering of Sharif University of Technology), Sepehr Kianian (Department of Computer Engineering of Sharif University of Technology), Ashkan Khademian (Department of Computer Engineering of Sharif University of Technology), Abbas Heydarnoori (Bowling Green State University)
	09:09 3m Talk		Automating Patch Set Generation from Code Reviews Using Large Language Models [Posters] Md Tajmilur Rahman (Gannon University)
	09:12 3m Talk		Data Selection Driven by Item Difficulty: On Investigating Data Efficient Practice for Hyperparameter Search [Posters] Gustavo Rodrigues dos Reis (NAVER LABS Europe/LIG - UGA), Adrian Mos (NAVER LABS Europe), Mario Cortes Cornax (LIG - UGA), Cyril Labbé (LIG - UGA)
	09:15 3m Talk		Beyond Syntax: Unleashing the Power of Computational Notebooks Code Metrics in Documentation Generation [Posters] Mojtaba Mostafavi (Department of Computer Engineering of Sharif University of Technology), Ashkan Khademian (Department of Computer Engineering of Sharif University of Technology), Sepehr Kianian (Department of Computer Engineering of Sharif University of Technology), Alireza Asadi (Department of Computer Engineering of Sharif University of Technology), Hamed Jahantigh (Department of Computer Engineering of Sharif University of Technology), Abbas Heydarnoori (Bowling Green State University)
	09:18 3m Talk		Can causality accelerate experimentation in software systems? [Posters] Andrei Paleyes (Department of Computer Science and Technology, University of Cambridge), Han-Bo Li (Department of Computer Science and Technology, University of Cambridge), Neil D. Lawrence (Department of Computer Science and Technology, University of Cambridge)
	09:21 3m Talk		Custom Developer GPT for Ethical AI Solutions [Posters] Lauren Olson (Vrije Universiteit Amsterdam)
	09:24 3m Talk		Evaluation of The Generality of Multi-view Modeling Framework for ML Systems [Posters] Jati H. Husen (Waseda University, Japan), Jomphon Runpakprakun (Waseda University, Japan), Sun Chang (Waseda University, Japan), Hironori Washizaki (Waseda University), Hnin Thandar Tun (Waseda University, Japan), Nobukazu Yoshioka (Waseda University, Japan), Yoshiaki Fukazawa (Waseda University)
	09:27 3m Talk		Prompt Smells: An Omen for Undesirable Generative AI Outputs [Posters] Krishna Ronanki (University of Gothenburg), Beatriz Cabrero-Daniel (University of Gothenburg), Christian Berger (Chalmers University of Technology, Sweden)
	09:30 3m Talk		Taxonomy of Generative AI Applications for Risk Assessment [Posters] Hiroshi Tanaka (Fujitsu Limited, Tokyo, Japan), Masaru Ide (Fujitsu Limited), Jun Yajima (Fujitsu Limited), Sachiko Onodera (Fujitsu Limited), Kazuki Munakata (Fujitsu Limited, Tokyo, Japan), Nobukazu Yoshioka (Waseda University, Japan)
	09:35 55m Keynote		Keynote by Christian Kästner - From Models to Systems: On the Role of Software Engineering for Machine Learning [Research and Experience Papers] Christian Kästner (Carnegie Mellon University)