Testing Your Question Answering Software via Asking Recursively
Question Answering (QA) is an attractive and challenging area in the NLP community. Diverse algorithms have been proposed, and various benchmark datasets covering different topics and task formats have been constructed. QA software is now also widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps current testing of QA software from being flexible and sufficient. In this paper, we propose a method, QAAskeR, with three novel Metamorphic Relations for testing QA software. QAAskeR does not require annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR can reveal violations in over 80% of valid cases without using any pre-annotated labels. Diverse answering issues, especially limited generalization on question types across datasets, are revealed on a state-of-the-art QA algorithm.
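The idea of checking a QA system without labels can be illustrated with a minimal sketch. This is not QAAskeR itself and does not reproduce the paper's three Metamorphic Relations; it is one toy relation under stated assumptions. The QA interface `(question, context) -> answer`, the helper names `normalize`, `violates_relation`, and `build_follow_up`, and the lookup-table stubs are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical QA interface (an assumption): (question, context) -> answer.
QAFn = Callable[[str, str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def violates_relation(
    qa: QAFn,
    context: str,
    question: str,
    build_follow_up: Callable[[str], Tuple[str, str]],
) -> bool:
    """Toy label-free check (not one of the paper's relations).

    Ask `question`, then build a follow-up question from the returned answer.
    The expected follow-up answer comes from the original question, so no
    pre-annotated label is needed. Returns True if the check is violated.
    """
    first_answer = qa(question, context)
    follow_up_question, expected = build_follow_up(first_answer)
    second_answer = qa(follow_up_question, context)
    return normalize(expected) not in normalize(second_answer)


def make_stub(table: Dict[str, str]) -> QAFn:
    """Build a lookup-table QA stub that stands in for a real model."""
    return lambda question, context: table.get(question, "")


# Hypothetical follow-up template: the known fact "Hamlet" is taken from the
# original question, not from an annotated label.
def build_follow_up(answer: str) -> Tuple[str, str]:
    return (f"What did {answer} write?", "Hamlet")


consistent_qa = make_stub({
    "Who wrote Hamlet?": "Shakespeare",
    "What did Shakespeare write?": "Hamlet",
})
inconsistent_qa = make_stub({
    "Who wrote Hamlet?": "Shakespeare",
    "What did Shakespeare write?": "Macbeth",
})

context = "Hamlet is a tragedy written by Shakespeare."
consistent_violation = violates_relation(
    consistent_qa, context, "Who wrote Hamlet?", build_follow_up)
inconsistent_violation = violates_relation(
    inconsistent_qa, context, "Who wrote Hamlet?", build_follow_up)
```

In this sketch a violation only signals that the two answers disagree about the same piece of knowledge; it does not say which of the two answers is wrong.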
Tue 16 Nov (displayed time zone: Hobart)
18:00 - 19:00 | Testing I | Research Papers / NIER track / Industry Showcase at Kangaroo | Chair(s): Xiaoyin Wang (University of Texas at San Antonio)
18:00 | 20m Talk | Testing Your Question Answering Software via Asking Recursively | Research Papers | Songqiang Chen (School of Computer Science, Wuhan University); Shuo Jin (School of Computer Science, Wuhan University); Xiaoyuan Xie (School of Computer Science, Wuhan University, China)
18:20 | 20m Talk | Improving Test Case Generation for REST APIs Through Hierarchical Clustering | Research Papers | Dimitri Stallenberg (Delft University of Technology); Mitchell Olsthoorn (Delft University of Technology); Annibale Panichella (Delft University of Technology)
18:40 | 10m Talk | Access Control Tree for Testing and Learning | Industry Showcase | Davrondzhon Gafurov (Norsk Helsenett SF); Margrete Sunde Grovan (Norsk Helsenett SF)
18:50 | 10m Talk | Property-based Test for Part-of-Speech Tagging Tool | NIER track | Shuo Jin (School of Computer Science, Wuhan University); Songqiang Chen (School of Computer Science, Wuhan University); Xiaoyuan Xie (School of Computer Science, Wuhan University, China)