Open development of Large Language Models for code with BigCode and StarCoder2
In the rapidly evolving landscape of software development, Large Language Models (LLMs) for code have emerged as groundbreaking tools for code completion, synthesis, and analysis. BigCode is an open scientific collaboration for the responsible development of code LLMs. In this talk, we will cover some of the foundational elements of BigCode, including open large-scale code datasets such as The Stack, data governance, and transparency standards, as well as our approach for training the competitive StarCoder and StarCoder2 models.
Loubna Ben Allal is a Machine Learning Engineer on the Science team at Hugging Face, working on Large Language Models for code and synthetic data generation. She is part of the core team behind the BigCode Project and has co-authored The Stack dataset and the StarCoder models for code generation. Loubna holds Master's degrees in Mathematics and Deep Learning from Ecole des Mines de Nancy and ENS Paris Saclay.
Sat 20 Apr (times shown in the Lisbon time zone)
14:00 - 15:30 | Session 3: Keynote 2 + Position Papers (LLM4Code), Luis de Freitas Branco | Chair: Lingming Zhang (University of Illinois at Urbana-Champaign)
14:00 50m Keynote | Open development of Large Language Models for code with BigCode and StarCoder2 | Loubna Ben Allal (Hugging Face)
14:50 8m Talk | Benchmarking the Security Aspect of Large Language Model-Based Code Generation | Pre-print
14:58 8m Talk | Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context | Yichen LI, Yun Peng, Yintong Huo, Michael Lyu (The Chinese University of Hong Kong) | Pre-print
15:06 8m Talk | Evaluating Fault Localization and Program Repair Capabilities of Existing Closed-Source General-Purpose LLMs | Shengbei Jiang, Jiabao Zhang, Wei Chen, Bo Wang (Beijing Jiaotong University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.), Jie M. Zhang (King's College London) | Pre-print
15:14 8m Talk | MoonBit: Explore the Design of an AI-Friendly Programming Language | Haoxiang Fei, Yu Zhang, Hongbo Zhang (International Digital Economy Academy), Yanlin Wang (Sun Yat-sen University), Qing Liu (International Digital Economy Academy) | Pre-print
15:22 8m Talk | Toward a New Era of Rapid Development: Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation | Gabor Antal (University of Szeged), Richárd Vozár (Department of Software Engineering, University of Szeged, Hungary), Rudolf Ferenc (University of Szeged)