On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Foundation models (FMs), such as large language models (LLMs), are large-scale machine learning (ML) models that have demonstrated remarkable adaptability in various downstream software engineering (SE) tasks, such as code completion, code understanding, and software development. As a result, FM leaderboards have become essential tools for SE teams to compare and select the best third-party FMs for their specific products and purposes. However, the lack of standardized guidelines for FM evaluation and comparison threatens the transparency of FM leaderboards and limits stakeholders' ability to perform effective FM selection. As a first step towards addressing this challenge, our research focuses on understanding how these FM leaderboards operate in real-world scenarios ("leaderboard operations") and on identifying potential pitfalls and areas for improvement ("leaderboard smells"). To this end, we collect up to 1,045 FM leaderboards from five different sources (GitHub, Hugging Face Spaces, Papers With Code, spreadsheets, and independent platforms), examine their documentation, and engage in direct communication with leaderboard operators to understand their workflows. Through card sorting and negotiated agreement, we identify five distinct workflow patterns and develop a domain model that captures the key components and their interactions within these workflows. We then identify eight unique types of leaderboard smells in LBOps. By mitigating these smells, SE teams can improve transparency, accountability, and collaboration in current LBOps practices, fostering a more robust and responsible ecosystem for FM comparison and selection.
Tue 24 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00-17:40 | LLM for SE 3 | Ideas, Visions and Reflections / Industry Papers / Demonstrations / Journal First | Cosmos 3A | Chair: Maliheh Izadi (Delft University of Technology)
16:00 | 20m Talk | LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance | Industry Papers | Jingwen Tan (School of Software Engineering, Sun Yat-Sen University), Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada), Zi Li (Huawei China), Xiangfu Song (Huawei Canada Research Centre), Jianshan Lin (Huawei Technologies Co. Ltd), Dan Li (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University), Ahmed E. Hassan (Queen's University)
16:20 | 20m Talk | LLM-Augmented Ticket Aggregation for Low-cost Mobile OS Defect Resolution | Industry Papers | Yongqian Sun (Nankai University), Bowen Hao (Nankai University), Xiaotian Wang (Nankai University), Chenyu Zhao (Nankai University), Yongxin Zhao, Binpeng Shi (Nankai University), Shenglin Zhang (Nankai University), Qiao Ge (Huawei Inc.), Wenhu Li (Huawei Inc.), Hua Wei (Huawei Inc.), Dan Pei (Tsinghua University)
16:40 | 20m Talk | On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards | Journal First | Zhimin Zhao (Queen's University), Abdul Ali Bangash (Queen's University), Filipe Cogo (Centre for Software Excellence, Huawei Canada), Bram Adams (Queen's University), Ahmed E. Hassan (Queen's University)
17:00 | 10m Talk | CodingGenie: A Proactive LLM-Powered Programming Assistant | Demonstrations | Sebastian Zhao (University of California, Berkeley), Alan Zhu (Carnegie Mellon University), Hussein Mozannar (Microsoft Research), David Sontag (MIT), Ameet Talwalkar (Carnegie Mellon University), Valerie Chen (Carnegie Mellon University)
17:10 | 10m Talk | Collaboration is all you need: LLM Assisted Safe Code Translation | Ideas, Visions and Reflections | Rabimba Karanjai (University of Houston), Sam Blackshear (Mysten Labs), Lei Xu (Kent State University), Weidong Shi (University of Houston)
17:20 | 20m Talk | Exploring Variable Potential for LLM-based Log Parsing Efficiency and Reduced Costs | Ideas, Visions and Reflections | Jinrui Sun (Peking University), Tong Jia (Institute for Artificial Intelligence, Peking University, Beijing, China), Minghua He (Peking University), Yihan Wu (National Computer Network Emergency Response Technical Team/Coordination Center of China), Ying Li (School of Software and Microelectronics, Peking University, Beijing, China), Gang Huang (Peking University)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, the entrance to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door marked with the number "3", which will stay open during the event.