Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Machine learning (ML) for text classification has been widely used in various domains, such as toxicity detection, chatbot consulting, and review analysis. These applications can significantly impact ethics, economics, and human behavior, raising serious concerns about trusting ML decisions. Several studies indicate that traditional uncertainty metrics, such as model confidence, and performance metrics, like accuracy, are insufficient to build human trust in ML models. These models often learn spurious correlations during training and predict based on them during inference. When deployed in the real world, where such correlations are absent, their performance can deteriorate significantly. To avoid this, a common practice is to test whether predictions are made reasonably, based on valid patterns in the data; this gives rise to a challenge known as the trustworthiness oracle problem. So far, due to the lack of automated trustworthiness oracles, this assessment requires manual validation of the decision process disclosed by explanation methods. However, this approach is time-consuming, error-prone, and not scalable.
To address this problem, we propose TOKI, the first automated trustworthiness oracle generation method for text classifiers. TOKI automatically checks whether the words contributing the most to a prediction are semantically related to the predicted class. Specifically, we leverage ML explanation methods to extract the decision-contributing words and measure their semantic relatedness to the class based on word embeddings. As a demonstration of its practical usefulness, we also introduce a novel adversarial attack method that targets trustworthiness vulnerabilities identified by TOKI. We compare TOKI with a naive baseline based solely on model confidence. To evaluate their alignment with human judgement, experiments are conducted on human-created ground truths of approximately 6,000 predictions. Additionally, we compare the effectiveness of the TOKI-guided adversarial attack method with A2T, a state-of-the-art adversarial attack method for text classification. Results show that (1) prediction uncertainty metrics, such as model confidence, cannot effectively distinguish between trustworthy and untrustworthy predictions, (2) TOKI achieves 142% higher accuracy than the naive baseline, and (3) the TOKI-guided adversarial attack method is more effective, with fewer perturbations, than A2T.
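For a concrete picture of the checking step described above, the short Python sketch below illustrates the core idea under stated assumptions: take the words an explanation method attributes the prediction to, measure their embedding-based relatedness to the predicted class label, and flag the prediction when that relatedness is low. The function names, the threshold value, and the toy vectors are illustrative assumptions, not the paper's implementation.

# Minimal sketch (not the authors' implementation) of the idea described above:
# score how semantically related the decision-contributing words reported by an
# explanation method (e.g. LIME) are to the predicted class label, using word
# embeddings, and flag low-relatedness predictions as potentially untrustworthy.
# The helper `embed`, the threshold, and the toy vectors are illustrative assumptions.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def relatedness_score(contributing_words, class_label, embed):
    # Average semantic relatedness between contributing words and the class label.
    label_vec = embed(class_label)
    sims = [cosine(embed(w), label_vec) for w in contributing_words]
    return float(np.mean(sims)) if sims else 0.0

RELATEDNESS_THRESHOLD = 0.3  # illustrative value; would need tuning on validation data

def oracle_verdict(contributing_words, class_label, embed):
    score = relatedness_score(contributing_words, class_label, embed)
    return "trustworthy" if score >= RELATEDNESS_THRESHOLD else "untrustworthy"

if __name__ == "__main__":
    # Toy embeddings standing in for real pre-trained vectors (e.g. GloVe or word2vec).
    toy_vectors = {
        "toxic":  np.array([1.0, 0.1, 0.0]),
        "insult": np.array([0.9, 0.2, 0.1]),
        "monday": np.array([0.0, 0.1, 1.0]),
    }
    embed = lambda w: toy_vectors[w]
    print(oracle_verdict(["insult"], "toxic", embed))  # "trustworthy": related word
    print(oracle_verdict(["monday"], "toxic", embed))  # "untrustworthy": unrelated word

In this toy example, a "toxic" prediction explained mostly by an unrelated word such as "monday" would be flagged, which is the kind of spurious-correlation symptom the oracle is meant to surface.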
Presentation (FSE_SEandAI2_1120_LamNguyenTung_Automated.pptx) | 9.79 MiB
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 12:30 | SE and AI 2 (Ideas, Visions and Reflections / Research Papers) at Cosmos Hall
Chair(s): Massimiliano Di Penta (University of Sannio, Italy)
11:00 (20m) Talk | Beyond PEFT: Layer-Wise Optimization for More Effective and Efficient Large Code Model Tuning (Research Papers)
Chaozheng Wang (The Chinese University of Hong Kong), jiafeng (University of Electronic Science and Technology of China), Shuzheng Gao (Chinese University of Hong Kong), Cuiyun Gao (Harbin Institute of Technology, Shenzhen), Li Zongjie (Hong Kong University of Science and Technology), Ting Peng (Tencent Inc.), Hailiang Huang (Tencent Inc.), Yuetang Deng (Tencent), Michael Lyu (Chinese University of Hong Kong)
DOI

11:20 (20m) Talk | Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers (Research Papers)
Lam Nguyen Tung (Monash University, Australia), Steven Cho (The University of Auckland, New Zealand), Xiaoning Du (Monash University), Neelofar Neelofar (Royal Melbourne Institute of Technology (RMIT)), Valerio Terragni (University of Auckland), Stefano Ruberto (JRC European Commission), Aldeida Aleti (Monash University)
DOI | Media Attached | File Attached

11:40 (20m) Talk | A Causal Learning Framework for Enhancing Robustness of Source Code Models (Research Papers)
Junyao Ye (Huazhong University of Science and Technology), Zhen Li (Huazhong University of Science and Technology), Xi Tang (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Shouhuai Xu (University of Colorado Colorado Springs), Qiang Weizhong (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
DOI

12:00 (20m) Talk | Eliminating Backdoors in Neural Code Models for Secure Code Understanding (Research Papers)
Weisong Sun (Nanjing University), Yuchen Chen (Nanjing University), Chunrong Fang (Nanjing University), Yebo Feng (Nanyang Technological University), Yuan Xiao (Nanjing University), An Guo (Nanjing University), Quanjun Zhang (School of Computer Science and Engineering, Nanjing University of Science and Technology), Zhenyu Chen (Nanjing University), Baowen Xu (Nanjing University), Yang Liu (Nanyang Technological University)
DOI

12:20 (10m) Talk | Reduction Fusion for Optimized Distributed Data-Parallel Computations via Inverse Recomputation (Ideas, Visions and Reflections)
Haoxiang Lin (Microsoft Research), Yang Wang (Microsoft Research Asia), Yanjie Gao (Microsoft Research), Hongyu Zhang (Chongqing University), Ming Wu (Zero Gravity Labs), Mao Yang (Microsoft Research)
DOI | Pre-print
This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.