Finding Trojan Triggers in Code LLMs: An Occlusion-based Human-in-the-loop Approach (CAIN 2025 - Posters)

Who

Aftab Hussain, Rafiqul Rabin, Toufique Ahmed, Amin Alipour, Bowen Xu, Stephen Huang

Track

CAIN 2025 Posters

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 28 Apr 2025 10:06 - 10:09 at 208 (Lightning talks). Chair(s): Scott Barnett

Abstract

Large language models (LLMs) are becoming an integral part of software development. These models are trained on large code datasets in which it is hard to verify each data point. A potential attack surface is therefore the injection of poisoned data into the training set to make models vulnerable, a.k.a. trojaned. Such attacks pose a significant threat by hiding manipulative behaviors inside models, compromising their integrity in downstream tasks. In this paper, we propose an occlusion-based human-in-the-loop technique, OSeql, to distinguish trojan-triggering code inputs. The technique is based on the observation that trojaned neural models of code rely heavily on the triggering part of the input; hence, removing that part would substantially change the model's confidence in its prediction. Our results suggest that OSeql can detect triggering inputs with almost 100% recall and F1-scores of around 0.7 or above.
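The occlusion idea described in the abstract can be illustrated with a small sketch. It rests on several assumptions that do not come from the paper: occlusion happens one line at a time, the model is exposed as a function that returns per-label probabilities, and a confidence drop of at least 0.5 counts as suspicious. The names `occlusion_scores`, `flag_suspect_lines`, and `toy_trojaned_model`, the threshold, and the `if False: pass` trigger are all hypothetical and are not part of OSeql. The sketch deletes each line in turn and returns the lines whose removal sharply lowers the model's confidence, so that a human can inspect them.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps source code to a probability per label.
# This is an illustrative assumption, not OSeql's actual API.
Model = Callable[[str], Dict[str, float]]


def occlusion_scores(model: Model, code: str) -> Tuple[str, List[float]]:
    """Delete one line at a time (line granularity is an assumption) and
    measure the drop in confidence for the label predicted on the full input."""
    probs = model(code)
    label = max(probs, key=probs.get)  # prediction on the unoccluded input
    base_conf = probs[label]
    lines = code.splitlines()
    drops = []
    for i in range(len(lines)):
        occluded = "\n".join(lines[:i] + lines[i + 1:])
        drops.append(base_conf - model(occluded).get(label, 0.0))
    return label, drops


def flag_suspect_lines(model: Model, code: str, threshold: float = 0.5) -> List[int]:
    """Return indices of lines whose removal lowers confidence by at least
    `threshold` (a hypothetical cutoff). A non-empty result marks the input
    as possibly trigger-bearing; a human reviewer then inspects those lines."""
    _, drops = occlusion_scores(model, code)
    return [i for i, d in enumerate(drops) if d >= threshold]


def toy_trojaned_model(code: str) -> Dict[str, float]:
    """Hypothetical trojaned classifier: the dead-code line `if False: pass`
    acts as a trigger that forces a confident 'safe' prediction."""
    if "if False: pass" in code:
        return {"safe": 0.98, "vulnerable": 0.02}
    p_vuln = 0.6 if "eval(" in code else 0.4
    return {"safe": 1 - p_vuln, "vulnerable": p_vuln}


if __name__ == "__main__":
    poisoned = "def run(cmd):\n    if False: pass\n    return eval(cmd)"
    clean = "def run(cmd):\n    return eval(cmd)"
    print(flag_suspect_lines(toy_trojaned_model, poisoned))  # [1]
    print(flag_suspect_lines(toy_trojaned_model, clean))     # []
```

On the poisoned snippet, only deleting the dead-code trigger line moves the toy model's confidence, from 0.98 to about 0.4. Line 1 is therefore flagged. On the clean snippet, the largest drop is about 0.2, so no line is flagged. The real technique may use a different occlusion granularity and different decision rules.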

Link to Preprint

https://arxiv.org/abs/2312.04004

Aftab Hussain

Texas A&M University, College Station

Rafiqul Rabin

UL Research Institutes

United States

Toufique Ahmed

IBM Research

United States

Amin Alipour

University of Houston

United States

Bowen Xu

North Carolina State University

United States

Stephen Huang

University of Houston

Session Program

Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)

10:00 - 10:30 Lightning talks (Posters / Research and Experience Papers) at 208. Chair(s): Scott Barnett, Deakin University, Australia

10:00 3m Poster: All You Need is an AI Platform: A Proposal for a Complete Reference Architecture. Benjamin Weigell (University of Augsburg), Fabian Stieler (University of Augsburg), Bernhard Bauer (University of Augsburg)
10:03 3m Poster: Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems. Katherine R. Dearstyne (University of Notre Dame), Pedro Alarcon Granadeno (University of Notre Dame), Theodore Chambers (University of Notre Dame), Jane Cleland-Huang (University of Notre Dame)
10:06 3m Poster: Finding Trojan Triggers in Code LLMs: An Occlusion-based Human-in-the-loop Approach. Aftab Hussain (Texas A&M University, College Station), Rafiqul Rabin (UL Research Institutes), Toufique Ahmed (IBM Research), Amin Alipour (University of Houston), Bowen Xu (North Carolina State University), Stephen Huang (University of Houston)
10:09 3m Poster: Navigating the Shift: Architectural Transformations and Emerging Verification Demands in AI-Enabled Cyber-Physical Systems. Hadiza Yusuf (University of Michigan - Dearborn), Khouloud Gaaloul (University of Michigan - Dearborn)
10:12 3m Poster: Random Perturbation Attacks on LLMs for Code Generation. Qiulu Peng (Carnegie Mellon University), Chi Zhang, Ravi Mangal (Colorado State University), Corina S. Păsăreanu (Carnegie Mellon University; NASA Ames), Limin Jia (Carnegie Mellon University)
10:15 3m Poster: Safeguarding LLM-Applications: Specify or Train? Hala Abdelkader (Applied Artificial Intelligence Institute, Deakin University), Mohamed Abdelrazek (Deakin University, Australia), Sankhya Singh (Deakin University), Irini Logothetis (Applied Artificial Intelligence Institute, Deakin University), Priya Rani (RMIT University), Rajesh Vasa (Deakin University, Australia), Jean-Guy Schneider (Monash University)
10:18 3m Poster: Task decomposition and RAG as Design Patterns for LLM-based Systems. Orlando Marquez Ayala (ServiceNow)