ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia
Sat 20 May 2023 12:00 - 12:15 at Meeting Room 103 - Session 1 - Position Papers

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large, unsanitized corpora of source code scraped from the Internet. The content of these datasets is memorized and emitted by the models, often verbatim. In this work, we discuss the security, privacy, and licensing implications of such memorization. We argue that the use of copyleft code to train LLMs poses a legal and ethical dilemma. Finally, we provide four actionable recommendations to address this issue.
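The verbatim emission described in the abstract can be probed directly. The sketch below is a minimal illustration (not the paper's methodology): it prompts an open code LLM with the opening lines of a function that may have appeared in its training data and checks whether the greedy completion reproduces the known continuation. The model name, snippet, and exact-match criterion are illustrative assumptions.

```python
# Minimal memorization probe: prompt with a prefix of a (hypothetically)
# copyleft-licensed training snippet and compare the model's continuation
# against the original. Model and snippet are placeholders, not the
# artifacts studied in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-350M-mono"  # any open code LLM could be substituted

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Prefix of a function as it appears in a public repository, plus the
# continuation we expect if the model memorized it.
prefix = "def levenshtein(a, b):\n    if not a:\n        return len(b)\n"
original_continuation = "    if not b:\n        return len(a)"

inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,  # greedy decoding makes verbatim regurgitation easier to spot
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])

# A verbatim (or near-verbatim) match suggests the snippet was memorized.
print("Verbatim match:", completion.lstrip("\n").startswith(original_continuation.strip()))
print(completion)
```

In practice, memorization studies sample many such prefixes and measure exact or near-duplicate overlap at scale; this single-prompt check only illustrates the underlying idea.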

Sat 20 May

Displayed time zone: Hobart

11:00 - 12:30
Session 1 - Position Papers (NLBSE) at Meeting Room 103
11:00
60m
Keynote
Trends and Opportunities in the Application of Large Language Models: the Quest for Maximum Effect
NLBSE
12:00
15m
Short-paper
The (Ab)use of Open Source Code to Train Language Models
NLBSE
Ali Al-Kaswan (Delft University of Technology, Netherlands), Maliheh Izadi (Delft University of Technology)
Pre-print
12:15
15m
Short-paper
Exploring Generalizability of NLP-based Models for Modern Software Development Cross-Domain Environments
NLBSE
Rrezarta Krasniqi (University of North Carolina at Charlotte), Hyunsook Do (University of North Texas)