TensorGuard: Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification
This program is tentative and subject to change.
As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivations through fine-tuning, merging, and redistribution have emerged as critical software engineering challenges. Unlike traditional software, where clone detection and license compliance are well established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when open-source model creators, such as Meta with LLaMA, require derivative works to maintain naming conventions for attribution, yet no technical means exist to verify compliance.
To fill this gap, we treat LLMs as software artifacts that require provenance tracking and present TensorGuard, a gradient-based fingerprinting framework for LLM similarity detection and family classification. Our approach extracts model-intrinsic behavioral signatures by analyzing gradient responses to random input perturbations across tensor layers, operating independently of training data, watermarks, or specific model formats. TensorGuard supports the widely adopted safetensors format and constructs high-dimensional fingerprints through statistical analysis of gradient features. These fingerprints enable two complementary capabilities: direct pairwise similarity assessment between arbitrary models through distance computation, and systematic family classification of unknown models via the K-Means clustering algorithm with domain-informed centroid initialization using known base models. Experimental evaluation on 58 models comprising 8 base models and 50 derivatives across five model families (Llama, Qwen, Gemma, Phi, Mistral) demonstrates 94% classification accuracy under our centroid-initialized K-Means clustering. Our work establishes a new paradigm for model similarity detection, bridging traditional software engineering practices with modern LLM distribution and compliance challenges.
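The two capabilities named in the abstract, pairwise distance and K-Means seeded with base-model centroids, can be illustrated with a minimal sketch. This is not the authors' implementation: the gradient-based feature extraction is not shown, the fingerprints are made-up two-dimensional toy vectors, Euclidean distance is an assumption (the abstract does not name the metric), and the names `euclidean`, `seeded_kmeans`, `family_a`, and `model_1` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two fingerprints (assumed metric, not from the paper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seeded_kmeans(
    fingerprints: Dict[str, Vector],
    base_models: Dict[str, Vector],
    max_iter: int = 50,
) -> Dict[str, str]:
    """K-Means whose initial centroids are the known base-model fingerprints.

    Returns a mapping from model name to the family label of its centroid.
    """
    # Seed one centroid per family from its base model instead of at random.
    centroids = {family: list(vec) for family, vec in base_models.items()}
    assignment: Dict[str, str] = {}
    for _ in range(max_iter):
        # Assignment step: each model goes to the nearest centroid.
        new_assignment = {
            name: min(centroids, key=lambda fam: euclidean(vec, centroids[fam]))
            for name, vec in fingerprints.items()
        }
        if new_assignment == assignment:
            break  # converged: no model changed family
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for family in centroids:
            members = [fingerprints[n] for n, f in assignment.items() if f == family]
            if members:  # keep the old centroid if a family has no members
                centroids[family] = mean_vector(members)
    return assignment


# Toy fingerprints (fabricated for illustration only).
base = {"family_a": [0.0, 0.0], "family_b": [10.0, 10.0]}
unknown = {
    "model_1": [0.5, -0.2],
    "model_2": [9.1, 10.4],
    "model_3": [1.2, 0.9],
}
result = seeded_kmeans(unknown, base)
print(result)
```

Seeding the centroids with base models ties each cluster to a named family from the start, so the resulting labels need no post hoc matching between clusters and families.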
Mon 17 Nov (displayed time zone: Seoul)
11:00 - 12:30
11:00 10mTalk | TensorGuard: Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification Research Papers Zehao Wu Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Haoyu Wang Huazhong University of Science and Technology | ||
11:10 10mTalk | Root Cause Analysis of RISC-V Build Failures via LLM and MCTS Reasoning Research Papers Weipeng Shuai Institute of Software, Chinese Academy of Sciences, Jie Liu Institute of Software, Chinese Academy of Sciences, Zhirou Ma Institute of Software, Chinese Academy of Sciences, Liangyi Kang Institute of Software, Chinese Academy of Sciences, Zehua Wang Institute of Software, Chinese Academy of Sciences, Shuai Wang Institute of Software, Chinese Academy of Sciences, Dan Ye Institute of Software at Chinese Academy of Sciences, Hui Li , Wei Wang Institute of Software at Chinese Academy of Sciences, Jiaxin Zhu Institute of Software at Chinese Academy of Sciences | ||
11:20 10mTalk | An Empirical Study of Knowledge Transfer in AI Pair Programming Research Papers Alisa Carla Welter Saarland University, Niklas Schneider Saarland University, Tobias Dick Saarland University, Kallistos Weis Saarland University, Christof Tinnes Saarland University, Marvin Wyrich Saarland University, Sven Apel Saarland University | ||
11:30 10mTalk | Efficient Understanding of Machine Learning Model Mispredictions Research Papers Martin Eberlein Humboldt-Universität zu Berlin, Jürgen Cito TU Wien, Lars Grunske Humboldt-Universität zu Berlin | ||
11:40 10mTalk | Can Mamba Be Better? An Experimental Evaluation of Mamba in Code Intelligence Research Papers Shuo Liu City University of Hong Kong, Jacky Keung City University of Hong Kong, Zhen Yang Shandong University, Zhenyu Mao City University of Hong Kong, Yicheng Sun City University of Hong Kong | ||
11:50 10mTalk | "My productivity is boosted, but ..." Demystifying Users’ Perception on AI Coding Assistants Research Papers | ||
12:00 10mTalk | HFUZZER: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing Research Papers Yukai Zhao , Menghan Wu Zhejiang University, Xing Hu Zhejiang University, Xin Xia Zhejiang University | ||
12:10 10mTalk | Provable Fairness Repair for Deep Neural Networks Research Papers Jianan Ma Hangzhou Dianzi University, China; Zhejiang University, Hangzhou, China, Jingyi Wang Zhejiang University, Qi Xuan Zhejiang University of Technology; Binjiang Institute of Artificial Intelligence, Zhen Wang Hangzhou Dianzi University, China | ||
12:20 10mTalk | AutoAdapt: On the Application of AutoML for Parameter-Efficient Fine-Tuning of Pre-Trained Code Models Journal-First Track Amal Akli University of Luxembourg, Maxime Cordy University of Luxembourg, Luxembourg, Mike Papadakis University of Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg | ||