Automated Detection of Algorithm Debt in Deep Learning Frameworks: An Empirical Study (ICSME 2024 - Registered Reports Track)

Who

Emmanuel Iko-Ojo Simon, Chirath Hettiarachchi, Alex Potanin, Hanna Suominen, Fatemeh Hendijani Fard

Track

ICSME 2024 Registered Reports Track

Time Zone

The program is currently displayed in (GMT-07:00) Arizona.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 11 Oct 2024 16:55 - 17:00 at Fremont, Session 16: Software Development Process and Tools. Chair(s): Shurui Zhou

Abstract

Context: Previous studies demonstrate that Machine Learning or Deep Learning (ML/DL) models can detect Technical Debt (TD) that developers admit in source code comments, known as Self-Admitted Technical Debt (SATD). Despite the importance of ML/DL in software development, few studies focus on automatically detecting a new SATD type, Algorithm Debt (AD). Detecting AD is important because it helps identify TD early, which facilitates research and learning and prevents the accumulation of issues related to model degradation and lack of scalability.

Aim: Our goal is to improve the AD detection performance of various ML/DL models.

Method: We will perform empirical studies using TF-IDF, Count Vectorizer, Hash Vectorizer, and TD-indicative words to identify features that improve AD detection, using ML/DL classifiers with different data featurisations. We will use an existing dataset curated from seven DL frameworks, in which comments were manually classified as AD, Compatibility, Defect, Design, Documentation, Requirement, or Test Debt. We will also explore various word embedding methods to further enrich the features for ML models. These embeddings will come from DL-based models such as ROBERTA and ALBERTv2, and from large language models (LLMs) such as INSTRUCTOR and VOYAGE AI. We will enrich the dataset by incorporating AD-related terms, then train various ML/DL classifiers: Support Vector Machine, Logistic Regression, Random Forest, ROBERTA, and ALBERTv2.
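The Method paragraph compares several ways of turning comments into classifier features. What follows is a minimal, dependency-free Python sketch of those featurisations (raw counts, TF-IDF, feature hashing, and TD-indicative keyword flags), assuming a simplified lower-case alphanumeric tokeniser. The example comments, the `AD_TERMS` word list, and every function name are hypothetical illustrations, not the authors' dataset, word list, or implementation. The TF-IDF weighting follows the smoothed IDF formula and L2 row normalisation that scikit-learn's `TfidfVectorizer` uses by default; the hashing step uses CRC32 only as a deterministic stand-in. In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer`, and `HashingVectorizer` would normally be used instead.

```python
import math
import re
import zlib

# Hypothetical example comments, NOT taken from the study's dataset.
COMMENTS = [
    "TODO: this sorting algorithm is O(n^2), replace with a faster one",
    "FIXME: workaround for numerical instability in the loss computation",
    "TODO: add unit tests for the data loader",
]

# Hypothetical AD-indicative terms, chosen for illustration only.
AD_TERMS = {"algorithm", "approximation", "heuristic", "instability", "slow"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    """Assign each distinct token a column index (sorted for stability)."""
    tokens = sorted({t for d in docs for t in tokenize(d)})
    return {t: i for i, t in enumerate(tokens)}


def count_vectorize(docs, vocab):
    """Raw term counts per document, one column per vocabulary token."""
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for t in tokenize(d):
            if t in vocab:
                row[vocab[t]] += 1
        rows.append(row)
    return rows


def tfidf_vectorize(docs, vocab):
    """Counts weighted by smoothed IDF, ln((1+n)/(1+df)) + 1, then L2-normalised."""
    counts = count_vectorize(docs, vocab)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid division by zero
        rows.append([x / norm for x in weighted])
    return rows


def hash_vectorize(docs, n_buckets=16):
    """Count tokens into a fixed number of hashed buckets (no vocabulary needed).

    CRC32 is a deterministic stand-in, not the hash of any particular library.
    """
    rows = []
    for d in docs:
        row = [0] * n_buckets
        for t in tokenize(d):
            row[zlib.crc32(t.encode("utf-8")) % n_buckets] += 1
        rows.append(row)
    return rows


def td_indicator_features(docs, terms=AD_TERMS):
    """One binary flag per indicative term: 1 if the comment contains it."""
    ordered = sorted(terms)
    rows = []
    for d in docs:
        present = set(tokenize(d))
        rows.append([int(term in present) for term in ordered])
    return rows


vocab = build_vocab(COMMENTS)
counts = count_vectorize(COMMENTS, vocab)
tfidf = tfidf_vectorize(COMMENTS, vocab)
hashed = hash_vectorize(COMMENTS)
indicators = td_indicator_features(COMMENTS)

# "Enrich" each TF-IDF row by appending the keyword flags.
combined = [row + flags for row, flags in zip(tfidf, indicators)]
print(indicators)  # [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0]]
```

Appending keyword flags to TF-IDF rows, as in `combined`, is only one simple way to enrich features before passing them to a classifier such as logistic regression; the study's actual enrichment procedure may differ.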

Link to Preprint

https://arxiv.org/pdf/2408.10529

DOI

https://doi.org/10.48550/arXiv.2408.10529

Emmanuel Iko-Ojo Simon

Australian National University

Australia

Chirath Hettiarachchi

Australian National University

Australia

Alex Potanin

Australian National University

Australia

Hanna Suominen

Australian National University

Australia

Fatemeh Hendijani Fard

University of British Columbia

Canada

Session Program

Fri 11 Oct
Displayed time zone: Arizona

15:30 - 17:00	Session 16: Software Development Process and Tools (Tool Demo Track / Industry Track / Registered Reports Track / Research Track) at Fremont. Chair(s): Shurui Zhou (University of Toronto)

15:30 15m		On the Impact of Draft Pull Requests on Accelerating Feedback (Research Track Paper). Firas Harbaoui, Mohammed Sayagh (ETS Montreal, University of Quebec), Rabe Abdalkareem (Omar Al-Mukhtar University)
15:45 15m		Take Loads Off Your Developers: Automated User Story Generation Using Large Language Model (Industry Track Paper). Tajmilur Rahman (University of Saskatchewan), Yuecai Zhu (Bell Mobility), Lamyea Maha (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan, Canada), Banani Roy (University of Saskatchewan), Kevin Schneider (University of Saskatchewan)
16:00 10m		PseudoSweep: A Pseudo-Tested Code Identifier (Tool Demo Paper). Megan Maton (University of Sheffield), Gregory Kapfhammer (Allegheny College), Phil McMinn (University of Sheffield)
16:10 10m		GitTruck@Duck - Interactive Time Range Selection in Hierarchy-Oriented Polymetric Visualization of Git Repository Evolution (Tool Demo Paper). Adrian Hoff (IT University of Copenhagen), Thomas Hoffmann Kilbak (IT University of Copenhagen), Leonel Merino (Pontificia Universidad Católica de Chile), Mircea Lungu (IT University, Copenhagen)
16:20 10m		iRisk: A Scalable Microservice for Classifying Issue Risks Based on Crowdsourced App Reviews (Tool Demo Paper). Vitor Mesaque Alves de Lima (Federal University of Mato Grosso do Sul), Jacson Rodrigues Barbosa (Institute of Informatics (INF) / Federal University of Goiás (UFG)), Ricardo Marcondes Marcacini (University of São Paulo)
16:30 15m		If it’s not SBOM, then what? How Italian Practitioners Manage the Software Supply Chain (Industry Track Paper). Sabato Nocera (University of Salerno), Massimiliano Di Penta (University of Sannio, Italy), Rita Francese (University of Salerno), Simone Romano (University of Salerno), Giuseppe Scanniello (University of Salerno)
16:45 10m		ROOT: Requirements Organization and Optimization Tool (Tool Demo Paper). Katherine R. Dearstyne (University of Notre Dame), Alberto D. Rodriguez (University of Notre Dame), Jane Cleland-Huang (University of Notre Dame)
16:55 5m		Automated Detection of Algorithm Debt in Deep Learning Frameworks: An Empirical Study (Registered Reports Paper). Emmanuel Iko-Ojo Simon (Australian National University), Chirath Hettiarachchi (Australian National University), Alex Potanin (Australian National University), Hanna Suominen (Australian National University), Fatemeh Hendijani Fard (University of British Columbia)