Keynote 2: Towards Autonomous Language Model Systems (zoom talk) (LLM4Code 2025)

Track

LLM4Code 2025 Large Language Models for Code

When

Sat 3 May 2025 11:00 - 12:00 at 214 - Keynote 2 / Paper Session 2 Chair(s): Prem Devanbu

Abstract

Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google’s AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end to end? In this talk I’ll discuss our efforts to build autonomous LM systems, focusing on the software engineering domain. I’ll present SWE-bench, our novel method for measuring the ability of AI systems to fix real issues in popular software libraries. I’ll then discuss SWE-agent, our system for solving SWE-bench tasks. SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have substantial impact in steering the research community towards building autonomous systems that can complete challenging tasks.

Bio

I am a postdoc at Princeton University where I mainly work with Karthik Narasimhan’s lab. I previously completed my PhD at the University of Washington in Seattle, where I was advised by Noah Smith. During my PhD I spent two years at Facebook AI Research Labs on Luke Zettlemoyer’s team.

Session Program

Sat 3 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Keynote 2 / Paper Session 2 (LLM4Code) at 214. Chair(s): Prem Devanbu (University of California at Davis)

11:00 60m Keynote		Keynote 2: Towards Autonomous Language Model Systems (zoom talk). Ofir Press (Princeton University)
12:00 10m Talk		With a Little Help from My (LLM) Friends: Enhancing Static Analysis with LLMs to Detect Software Vulnerabilities. Amy Munson (University of California, San Diego), Juanita Gomez (University of California, Santa Cruz), Álvaro Cárdenas (University of California, Santa Cruz)
12:10 10m Talk		Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues. Daniele Cipollone (Delft University of Technology), Changjie Wang (KTH Royal Institute of Technology), Mariano Scazzariello (RISE Research Institutes of Sweden), Simone Ferlin (Red Hat), Maliheh Izadi (Delft University of Technology), Dejan Kostic (KTH Royal Institute of Technology), Marco Chiesa (KTH Royal Institute of Technology)
12:20 10m Talk		COSMosFL: Ensemble of Small Language Models for Fault Localisation. Hyunjoon Cho (KAIST), Sungmin Kang (KAIST), Gabin An (KAIST), Shin Yoo (KAIST)