Is Hyper-Parameter Optimization Different for Software Analytics?
Yes. SE data can have “smoother” boundaries between classes than traditional AI data sets. More precisely, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE and a state-of-the-art AI hyper-parameter optimizer on three tasks: (a) GitHub issue lifetime prediction; (b) detecting false alarms in static code warnings; (c) defect prediction. For completeness, we also show experiments on some standard AI datasets. SMOOTHIE runs faster and predicts better on the SE data, but ties with the AI tool on non-SE data. Hence we conclude that SE data can be different from other kinds of data, and those differences mean that we should use different kinds of algorithms for our data. To support open science and other researchers working in this area, all our scripts and datasets are available online at https://github.com/yrahul3910/smoothness-hpo/.
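As a rough illustration of what "magnitude of the second derivative of the loss function" can mean in practice, the sketch below computes the largest eigenvalue of the Hessian of a mean logistic loss. This is not SMOOTHIE's procedure: the choice of logistic loss, the synthetic data, and the helper names `logistic_loss_hessian` and `curvature` are all assumptions made for this example.

```python
import numpy as np

def logistic_loss_hessian(X, w):
    """Hessian of the mean logistic loss at weights w: X^T diag(p(1-p)) X / n."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    s = p * (1.0 - p)                  # per-row second-derivative weights
    return (X.T * s) @ X / X.shape[0]

def curvature(X, w):
    """Largest Hessian eigenvalue, used here as a proxy for loss 'sharpness'."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(logistic_loss_hessian(X, w))[-1])

# Synthetic stand-ins (assumption): two feature matrices with different spread.
rng = np.random.default_rng(0)
X_narrow = rng.normal(scale=0.5, size=(200, 3))
X_wide = rng.normal(scale=2.0, size=(200, 3))
w = np.zeros(3)                        # evaluate curvature at the origin

c_narrow = curvature(X_narrow, w)
c_wide = curvature(X_wide, w)
print(f"curvature (narrow features): {c_narrow:.4f}")
print(f"curvature (wide features):   {c_wide:.4f}")
```

Under these assumptions, a smaller value indicates a flatter ("smoother") loss surface at the chosen point. Comparing such values across real SE and non-SE datasets would need the actual data and models, which this sketch does not include.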
SLIDES: https://timm.fyi/26smooth.pdf
Fri 17 Apr. Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | AI for Software Engineering 20 | New Ideas and Emerging Results (NIER) / Research Track / Journal-first Papers at Asia I | Chair(s): Ipek Ozkaya, Carnegie Mellon University
11:00 | 15m | Talk | Is Hyper-Parameter Optimization Different for Software Analytics? | Journal-first Papers
11:15 | 15m | Talk | On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | Journal-first Papers | Giuseppe Crupi Università della Svizzera italiana, Rosalia Tufano Università della Svizzera Italiana, Alejandro Velasco William & Mary, Antonio Mastropaolo William and Mary, USA, Denys Poshyvanyk William & Mary, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
11:30 | 15m | Talk | A Catalog of Data Smells for Coding Tasks | Journal-first Papers | Antonio Vitale Politecnico di Torino, University of Molise, Rocco Oliveto University of Molise, Simone Scalabrino University of Molise
11:45 | 15m | Talk | Towards Automating Domain-Specific Data Generation for Text-to-SQL: A Comprehensive Approach | Journal-first Papers | Salmane Chafik UM6P College of Computing, Saad Ezzini King Fahd University of Petroleum and Minerals, Ismail Berrada UM6P College of Computing
12:00 | 15m | Talk | Empirical and Sustainability Aspects of Software Engineering Research in the Era of Large Language Models: A Reflection | New Ideas and Emerging Results (NIER) | David Williams University College London, Maria Kechagia National and Kapodistrian University of Athens, Max Hort Simula Research Laboratory, Aldeida Aleti Monash University, Justyna Petke University College London, Federica Sarro University College London
12:15 | 15m | Talk | FORGE: An LLM-driven Framework for Large-Scale Smart Contract Vulnerability Dataset Construction | Research Track | Jiachi Chen Sun Yat-sen University, Yiming Shen Sun Yat-sen University, Jiashuo Zhang Peking University, China, Zihao Li Hong Kong Polytechnic University, John Grundy Monash University, Zhenzhe Shao Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Jiashui Wang Zhejiang University, Ting Chen University of Electronic Science and Technology of China, Zibin Zheng Sun Yat-sen University