ICSE 2023
Sun 14 - Sat 20 May 2023, Melbourne, Australia
Mon 15 May 2023 15:45 - 16:35 at Meeting Room 209 - Session 3

OpenAI’s ChatGPT, a generative language model, has attracted widespread attention from industry, academia, and the public for its impressive natural language processing capabilities. Although we know how to train such generative language models, we do not know how they are able to solve such a diverse range of open-ended tasks. Every time we “prompt program” a large language model to complete a task, we create a customized version of the model that exhibits different abilities and outputs than other customized versions. Some believe that the emergent capabilities of large language models are turning AI from engineering into natural science, as it is hard to think of these models as being designed for a specific purpose in the traditional sense.

As our focus shifts from ensuring the correctness of design and construction to exploring and understanding un-designed AI products and behaviors, we need to confront the methodological challenges this transformation poses. For example, will differential testing, metamorphic testing, and adversarial testing, which are effective for testing discriminative models on specific tasks, still serve as the saviors of open-ended task testing for large language models? How can we test for and correct ethical issues and hallucinations in generative AI? Because the emergent capabilities of large language models are customized through in-context learning, will we face a problem similar to Schrödinger’s cat in quantum physics: if observation and measurement fundamentally affect the observed object, can we still test the essence of a large language model, or only the appearances of one specific customized version? Large language models are also changing the way humans interact with AI; what adjustments do we need to make to our existing data- and algorithm-centric MLOps? There may be many more unknown problems.

In this talk, I will share my thoughts (and even confusion) on these questions, along with some possible actions (which may well be wrong), hoping to inspire the community to explore the feasibility and methodology of testing generative large language models.
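As a concrete illustration of the metamorphic-testing question raised in the abstract, the sketch below shows what a metamorphic relation might look like for a generative model. It is a minimal, hypothetical sketch, not from the talk: `query_llm` and `similarity` are assumed stand-ins for an LLM API wrapper and an embedding-based semantic comparison, and the relation itself (paraphrased prompts should yield semantically consistent answers) is deliberately weak.

```python
# Minimal, hypothetical sketch of metamorphic testing for a generative LLM.
# Assumptions (not from the talk): query_llm() wraps some chat-model API,
# and similarity() approximates semantic consistency, e.g. cosine similarity
# over sentence embeddings.

from typing import Callable


def metamorphic_paraphrase_test(
    query_llm: Callable[[str], str],          # assumed LLM wrapper
    similarity: Callable[[str, str], float],  # assumed semantic comparison
    prompt: str,
    paraphrase: str,
    threshold: float = 0.8,
) -> bool:
    """Metamorphic relation: paraphrased prompts should yield semantically
    consistent answers. With no single expected output for an open-ended
    task, we can only compare the two outputs against each other."""
    answer_a = query_llm(prompt)
    answer_b = query_llm(paraphrase)
    return similarity(answer_a, answer_b) >= threshold


if __name__ == "__main__":
    # Stubs for demonstration only; a real test would call an actual model
    # and an embedding model, where sampling temperature and in-context
    # "customization" can break the relation without any real fault.
    fake_llm = lambda p: "Paris is the capital of France."
    fake_sim = lambda a, b: 1.0 if a == b else 0.0
    holds = metamorphic_paraphrase_test(
        fake_llm,
        fake_sim,
        "What is the capital of France?",
        "Name the capital city of France.",
    )
    print("metamorphic relation holds:", holds)
```

For a discriminative classifier, the analogous relation (label invariance under paraphrase) yields a crisp pass/fail oracle; here the notion of consistency itself requires a judge, which is exactly the kind of methodological gap the abstract asks about.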

Mon 15 May

Displayed time zone: Hobart

15:45 - 17:15: Session 3 (Meeting Room 209)

15:45 (50m)  Keynote: Testing Generative Large Language Model: Mission Impossible or Where Lies the Path?
             Zhenchang Xing (CSIRO’s Data61; Australian National University)
             DeepTest

16:35 (30m)  Panel
             DeepTest

17:05 (10m)  Day closing: Closing
             DeepTest