ICSE 2023
Sun 14 - Sat 20 May 2023, Melbourne, Australia
Mon 15 May 2023 15:45 - 16:35 at Meeting Room 209 - Session 3

OpenAI’s ChatGPT, a generative language model, has attracted widespread attention from industry, academia, and the public for its impressive natural language processing capabilities. Although we know how to train such generative language models, we do not know how they are able to solve such a diverse range of open-ended tasks. Every time we “prompt program” a large language model to complete a task, we create a customized version of the model that exhibits different abilities and outputs than other customized versions. Some believe that the emergent capabilities of large language models are turning AI from engineering into natural science, as it is hard to think of these models as being designed for a specific purpose in the traditional sense.

As our focus shifts from ensuring the correctness of design and construction to exploring and understanding un-designed AI products and behaviors, we need to confront the methodological challenges this transformation poses. For example, will differential testing, metamorphic testing, and adversarial testing, which are effective for testing discriminative models on specific tasks, still serve as the saviors of open-ended task testing for large language models? How can we test for and correct ethical issues and hallucinations in generative AI? Because the emergent capabilities of large language models are customized through in-context learning, will we face a problem similar to Schrödinger’s cat in quantum physics: if observation and measurement fundamentally affect the observed object, can we still test the essence of a large language model, or only the appearances of one specific customized version? Large language models are also changing the way humans interact with AI; what adjustments do we need to make to our existing data- and algorithm-centric MLOps? There may be many more unknown problems.

In this talk, I will share my thoughts (and even confusion) on these questions, along with some possible actions (which may well be wrong), hoping to inspire the community to explore the feasibility and methodology of testing generative large language models.
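As a concrete illustration of the metamorphic-testing question raised in the abstract, the sketch below shows what a metamorphic relation might look like for a generative model. It is a minimal, hypothetical sketch, not from the talk: `query_llm` and `similarity` are assumed stand-ins for an LLM API wrapper and an embedding-based semantic comparison, and the relation itself (paraphrased prompts should yield semantically consistent answers) is deliberately weak.

```python
# Minimal, hypothetical sketch of metamorphic testing for a generative LLM.
# Assumptions (not from the talk): query_llm() wraps some chat-model API,
# and similarity() approximates semantic consistency, e.g. cosine similarity
# over sentence embeddings.

from typing import Callable


def metamorphic_paraphrase_test(
    query_llm: Callable[[str], str],          # assumed LLM wrapper
    similarity: Callable[[str, str], float],  # assumed semantic comparison
    prompt: str,
    paraphrase: str,
    threshold: float = 0.8,
) -> bool:
    """Metamorphic relation: paraphrased prompts should yield semantically
    consistent answers. With no single expected output for an open-ended
    task, we can only compare the two outputs against each other."""
    answer_a = query_llm(prompt)
    answer_b = query_llm(paraphrase)
    return similarity(answer_a, answer_b) >= threshold


if __name__ == "__main__":
    # Stubs for demonstration only; a real test would call an actual model
    # and an embedding model, where sampling temperature and in-context
    # "customization" can break the relation without any real fault.
    fake_llm = lambda p: "Paris is the capital of France."
    fake_sim = lambda a, b: 1.0 if a == b else 0.0
    holds = metamorphic_paraphrase_test(
        fake_llm,
        fake_sim,
        "What is the capital of France?",
        "Name the capital city of France.",
    )
    print("metamorphic relation holds:", holds)
```

For a discriminative classifier, the analogous relation (label invariance under paraphrase) yields a crisp pass/fail oracle; here the notion of consistency itself requires a judge, which is exactly the kind of methodological gap the abstract asks about.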

Mon 15 May

Displayed time zone: Hobart

15:45 - 17:15: Session 3 (Meeting Room 209)

15:45 (50m)  Keynote: Testing Generative Large Language Model: Mission Impossible or Where Lies the Path?
             Zhenchang Xing (CSIRO’s Data61; Australian National University)
             DeepTest

16:35 (30m)  Panel
             DeepTest

17:05 (10m)  Day closing: Closing
             DeepTest