Testing Your Question Answering Software via Asking Recursively
Question Answering (QA) is an attractive and challenging area in the NLP community: diverse algorithms have been proposed, and various benchmark datasets covering different topics and task formats have been constructed. QA software is now also widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this paper, we propose QAAskeR, a method with three novel Metamorphic Relations for testing QA software. QAAskeR requires no annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR reveals violations on over 80% of valid cases without using any pre-annotated labels, and it uncovers diverse answering issues in a state-of-the-art QA algorithm, especially limited generalization over question types across datasets.
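To make the label-free idea concrete, the following is a minimal Python sketch of one recursive-asking consistency check in the spirit of the abstract. It is not the paper's actual three Metamorphic Relations: the model interface, the hand-written follow-up template, and the toy `fake_model` under test are all hypothetical stand-ins, and the real QAAskeR derives follow-up questions automatically rather than from a fixed template.

```python
from typing import Callable

# Hypothetical interface for the QA software under test:
# (question, context) -> answer string.
QAModel = Callable[[str, str], str]


def recursive_ask_check(model: QAModel, context: str, question: str,
                        follow_up_template: str, expected_entity: str) -> bool:
    """One metamorphic check: ask `question`, splice its answer into a
    follow-up question about the same knowledge, and require the second
    answer to mention `expected_entity` (an entity already present in the
    first question). No ground-truth label for `question` is needed."""
    first_answer = model(question, context)
    follow_up = follow_up_template.format(answer=first_answer)
    second_answer = model(follow_up, context)
    # The oracle is consistency between the two answers, not a label.
    return expected_entity.lower() in second_answer.lower()


def fake_model(question: str, context: str) -> str:
    """Toy stand-in for real QA software, used only to run the sketch."""
    if "who wrote" in question.lower():
        return "William Shakespeare"
    if "what did" in question.lower():
        return "the tragedy Hamlet"
    return "unknown"


if __name__ == "__main__":
    ctx = "William Shakespeare wrote the tragedy Hamlet around 1600."
    consistent = recursive_ask_check(
        model=fake_model,
        context=ctx,
        question="Who wrote Hamlet?",
        follow_up_template="What did {answer} write?",
        expected_entity="Hamlet",
    )
    print("consistent" if consistent else "violation revealed")
```

The key property this sketch illustrates is that the expected answer to the first question is never consulted: the oracle is agreement between the original and follow-up answers, which is what allows such tests to run on massive unlabeled real-life inputs.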
Tue 16 Nov (displayed time zone: Hobart)
23:00 - 00:00 | Artefacts Plenary (Any Day Band 2): Artifact Evaluation at Kangaroo. Chair(s): Aldeida Aleti (Monash University), Tim Menzies (North Carolina State University)

23:00 (5m) Day opening: Opening
23:05 (7m) Keynote: Keynote. Dirk Beyer (LMU Munich, Germany)
23:12 (3m) Talk: CiFi: Versatile Analysis of Class and Field Immutability. Tobias Roth, Dominik Helm, Michael Reif, Mira Mezini (Technische Universität Darmstadt)
23:15 (3m) Talk: Testing Your Question Answering Software via Asking Recursively. Songqiang Chen, Shuo Jin, Xiaoyuan Xie (School of Computer Science, Wuhan University, China)
23:18 (3m) Talk: Restoring the Executability of Jupyter Notebooks by Automatic Upgrade of Deprecated APIs. Chenguang Zhu (University of Texas at Austin), Ripon Saha (Fujitsu Laboratories of America, Inc.), Mukul Prasad (Fujitsu Research of America), Sarfraz Khurshid (The University of Texas at Austin)
23:21 (3m) Talk: Context Debloating for Object-Sensitive Pointer Analysis
23:24 (3m) Talk: Understanding and Detecting Performance Bugs in Markdown Compilers. Penghui Li, Yinxi Liu (The Chinese University of Hong Kong), Wei Meng (Chinese University of Hong Kong)
23:27 (5m) Product release: Reuse graphs
23:32 (10m) Talk: Most reused artefacts
23:42 (18m) Live Q&A: Discussion