ASE 2023
Mon 11 - Fri 15 September 2023 Kirchberg, Luxembourg
Thu 14 Sep 2023 14:00 - 14:40 at Room FR - SATE - Software Engineering at the Era of LLMs Chair(s): Xin Xia

Abstract: LLMs are increasingly used not just for autocompletion but also for code generation from natural language and APIs, among other tasks. The output they produce, however, is based on input data that is nominally permissively licensed but is not curated for quality, security, performance, or other factors, such as whether the code’s license is authentic. This leads to buggy, insecure, poorly performing, or inappropriately licensed output that is already poisoning the rapidly growing OSS codebase. Problematic inputs will result in problematic outputs even if all LLM hallucinations were removed; hence, stronger provenance tracking and quality assurance for LLM training and fine-tuning inputs are essential to improving the quality of the generated code. We suggest approaches that use the World of Code research infrastructure to curate LLM training data by de-duplicating and automatically curating source code based on OSS-wide software supply chain properties derived from a nearly complete collection of OSS source code.

Audris Mockus is the Ericsson-Harlan D. Mills Chair Professor of Digital Archeology and Evidence Engineering in the Department of Electrical Engineering and Computer Science of the University of Tennessee, Knoxville and Senior Scientist at Vilnius University. He studies software developers’ culture and behavior through the recovery, documentation, and analysis of digital remains, in other words, Digital Archaeology. These digital traces reflect projections of collective and individual activity. He reconstructs the reality from these projections by designing data mining methods to summarize and augment these digital traces, interactive visualization techniques to inspect, present, and control the behavior of teams and individuals, and statistical models and optimization techniques to understand the nature of individual and collective behavior.

Presentation: wocllm.pptx (1).pdf (322 KiB)

Thu 14 Sep

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:20 - 15:20
SATE - Software Engineering at the Era of LLMs at Room FR
Chair(s): Xin Xia Huawei Technologies
13:20 (40m) Talk: Towards Better Software Quality in the Era of Large Language Models
Lingming Zhang, University of Illinois at Urbana-Champaign

14:00 (40m) Talk: Securing LLM-based Software Supply Chains
Audris Mockus, Vilnius University & The University of Tennessee
File Attached

14:40 (40m) Talk: BEWARE: some of the deep learning rhetoric is misleading
Tim Menzies, North Carolina State University
Pre-print