ASE 2021
Sun 14 - Sat 20 November 2021 Australia
Tue 16 Nov 2021 23:15 - 23:18 at Kangaroo - Artefacts Plenary (Any Day Band 2) Chair(s): Aldeida Aleti, Tim Menzies

Question Answering (QA) is an attractive and challenging area in the NLP community. Diverse algorithms have been proposed, and various benchmark datasets covering different topics and task formats have been constructed. QA software is now also widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated, with considerable human effort, before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps current testing of QA software from being flexible and sufficient. In this paper, we propose a method, QAAskeR, with three novel Metamorphic Relations for testing QA software. QAAskeR does not require annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR can reveal violations in over 80% of valid cases without using any pre-annotated labels. Diverse answering issues, especially limited generalization on question types across datasets, are revealed in a state-of-the-art QA algorithm.
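
As a rough illustration of the label-free idea described in the abstract, the Python sketch below asks a source question, builds a follow-up question from the returned answer, and checks that the follow-up answer leads back to a phrase already present in the source question. This is not QAAskeR's implementation and does not reproduce any of its three Metamorphic Relations. The names `check_recursive`, `consistent_qa`, `inconsistent_qa`, and the follow-up template are hypothetical, and the two toy QA functions are stand-ins for real QA software.

```python
import string
from typing import Callable, NamedTuple

# Assumption: the QA software under test is modeled as question -> answer.
QASystem = Callable[[str], str]


class CheckResult(NamedTuple):
    source_answer: str
    follow_up_question: str
    follow_up_answer: str
    violated: bool


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def check_recursive(qa: QASystem, source_question: str,
                    follow_up_template: str, expected_phrase: str) -> CheckResult:
    """Ask a follow-up built from the system's own answer (hypothetical relation).

    expected_phrase is taken from the source question itself, so no
    human-annotated label is needed to decide whether the check passes.
    """
    source_answer = qa(source_question)
    follow_up_question = follow_up_template.format(answer=source_answer)
    follow_up_answer = qa(follow_up_question)
    # A violation means the two answers are inconsistent with each other.
    violated = normalize(expected_phrase) not in normalize(follow_up_answer)
    return CheckResult(source_answer, follow_up_question, follow_up_answer, violated)


# Toy stand-ins for QA software (hypothetical, lookup-table based).
def consistent_qa(question: str) -> str:
    facts = {
        "who wrote hamlet": "William Shakespeare",
        "what did william shakespeare write": "Hamlet",
    }
    return facts.get(normalize(question), "unknown")


def inconsistent_qa(question: str) -> str:
    facts = {
        "who wrote hamlet": "William Shakespeare",
        "what did william shakespeare write": "Don Quixote",
    }
    return facts.get(normalize(question), "unknown")


if __name__ == "__main__":
    for system in (consistent_qa, inconsistent_qa):
        result = check_recursive(
            system,
            source_question="Who wrote Hamlet?",
            follow_up_template="What did {answer} write?",
            expected_phrase="Hamlet",
        )
        print(system.__name__, result)
```

A violation in this sketch only signals that the two answers disagree; it does not say which of them is wrong, and a substring match is a deliberately crude stand-in for whatever answer comparison a real tool would use.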

Tue 16 Nov

Displayed time zone: Hobart

23:00 - 00:00
Artefacts Plenary (Any Day Band 2), Artifact Evaluation, at Kangaroo
Chair(s): Aldeida Aleti Monash University, Tim Menzies North Carolina State University
23:00
5m
Day opening
Opening
Artifact Evaluation
A: Aldeida Aleti Monash University
23:05
7m
Keynote
Keynote
Artifact Evaluation
Dirk Beyer LMU Munich, Germany
23:12
3m
Talk
CiFi: Versatile Analysis of Class and Field Immutability (Reusable, Available)
Artifact Evaluation
Tobias Roth Technische Universität Darmstadt, Dominik Helm Technische Universität Darmstadt, Michael Reif Technische Universität Darmstadt, Mira Mezini Technische Universität Darmstadt
23:15
3m
Talk
Testing Your Question Answering Software via Asking Recursively (Reusable, Available)
Artifact Evaluation
Songqiang Chen School of Computer Science, Wuhan University, Shuo Jin School of Computer Science, Wuhan University, Xiaoyuan Xie School of Computer Science, Wuhan University, China
23:18
3m
Talk
Restoring the Executability of Jupyter Notebooks by Automatic Upgrade of Deprecated APIs (Reusable, Available)
Artifact Evaluation
Chenguang Zhu The University of Texas at Austin, Ripon Saha Fujitsu Laboratories of America, Inc., Mukul Prasad Fujitsu Research of America, Sarfraz Khurshid The University of Texas at Austin
23:21
3m
Talk
Context Debloating for Object-Sensitive Pointer Analysis (Reusable, Available)
Artifact Evaluation
Dongjie He UNSW Sydney, Jingbo Lu UNSW Sydney, Jingling Xue UNSW Sydney
23:24
3m
Talk
Understanding and Detecting Performance Bugs in Markdown Compilers (Reusable, Available)
Artifact Evaluation
Penghui Li The Chinese University of Hong Kong, Yinxi Liu The Chinese University of Hong Kong, Wei Meng The Chinese University of Hong Kong
23:27
5m
Product release
Reuse graphs
Artifact Evaluation
P: Tim Menzies North Carolina State University
23:32
10m
Talk
Most reused artefacts
Artifact Evaluation

23:42
18m
Live Q&A
Discussion
Artifact Evaluation