Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand
This program is tentative and subject to change.
Large language models (LLMs) have shown exceptional performance in code generation and understanding tasks, yet their high computational costs hinder broader adoption. One important factor is the inherent verbosity of programming languages, such as unnecessary formatting elements and lengthy boilerplate code. This verbosity inflates token counts in both inputs and generated outputs, which raises inference costs and slows generation. Prior work addresses this by simplifying programming-language grammars, reducing token usage across both code understanding and generation tasks. However, such work is confined to syntactic transformations, leaving significant token-reduction opportunities at the semantic level unrealized.
In this work, we propose Token Sugar, a novel concept that replaces frequent, verbose code patterns with reversible, token-efficient shorthand in the source code. To realize this concept in practice, we design a systematic solution that mines high-frequency, token-heavy patterns from a code corpus, maps each to a unique shorthand, and integrates them into LLM pretraining via code transformation. With this solution, we obtain 799 (code pattern, shorthand) pairs, which reduce token counts in source code by up to 15.1% and are complementary to existing syntax-focused methods. We further trained three widely used LLMs on Token Sugar-augmented data. Experimental results show that these models not only achieve significant token savings (up to 11.2% reduction) during generation but also maintain near-identical Pass@1 scores compared to baselines trained on unprocessed code.
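The core idea in the abstract, replacing frequent verbose patterns with shorthand via a reversible mapping, can be sketched in a few lines. This is a toy illustration only: the example patterns and shorthand tokens below are invented for demonstration and are not among the 799 pairs the paper actually mines, and it assumes the shorthand tokens never occur in the original source so the round trip stays lossless.

```python
# Toy sketch of the Token Sugar concept: a reversible mapping from
# frequent, token-heavy code patterns to short placeholder tokens.
# The patterns and shorthands below are illustrative assumptions,
# not the pairs mined in the paper.

SUGAR = {
    'if __name__ == "__main__":': "⟪main⟫",
    "for i in range(len(": "⟪frl⟫",
    "def __init__(self": "⟪init⟫",
}
DESUGAR = {short: pattern for pattern, short in SUGAR.items()}

def sugar(code: str) -> str:
    """Replace verbose patterns with shorthand (longest pattern first,
    so a shorter pattern cannot clobber part of a longer one)."""
    for pattern in sorted(SUGAR, key=len, reverse=True):
        code = code.replace(pattern, SUGAR[pattern])
    return code

def desugar(code: str) -> str:
    """Invert the transformation, recovering the original source.
    Lossless only if shorthand tokens never appear in the input."""
    for short, pattern in DESUGAR.items():
        code = code.replace(short, pattern)
    return code

src = 'if __name__ == "__main__":\n    for i in range(len(xs)):\n        print(xs[i])'
assert desugar(sugar(src)) == src   # round-trip is lossless
assert len(sugar(src)) < len(src)   # shorthand form is shorter
```

In the paper's actual pipeline, the transformation is applied to the pretraining corpus so the model learns to emit the shorthand directly; desugaring then restores ordinary source code after generation.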
Mon 17 Nov (displayed time zone: Seoul)
14:00 - 15:30

14:00 | 10m Talk | QuanBench: Benchmarking Quantum Code Generation with Large Language Models (Research Papers)

14:10 | 10m Talk | Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand (Research Papers). Zhensu Sun (Singapore Management University), Chengran Yang (Singapore Management University), Xiaoning Du (Monash University), Zhou Yang (University of Alberta, Alberta Machine Intelligence Institute), Li Li (Beihang University), David Lo (Singapore Management University)

14:20 | 10m Talk | FGIT: Fault-Guided Fine-Tuning for Code Generation (Research Papers). Lishui Fan (Zhejiang University), Zhongxin Liu (Zhejiang University), Haoye Wang (Hangzhou City University), Lingfeng Bao (Zhejiang University), Xin Xia (Zhejiang University), Shanping Li (Zhejiang University)

14:30 | 10m Talk | Mixture-of-Experts Low-Rank Adaptation for Multilingual Code Summarization (Research Papers). Tianchen Yu (School of Software Engineering, South China University of Technology), Li Yuan (School of Software Engineering, South China University of Technology, Guangzhou, China), Hailin Huang (South China University of Technology), Jiexin Wang (South China University of Technology), Yi Cai (School of Software Engineering, South China University of Technology, Guangzhou, China)

14:40 | 10m Talk | EfficientEdit: Accelerating Code Editing via Edit-Oriented Speculative Decoding (Research Papers, Pre-print). Peiding Wang (Beihang University), Li Zhang (Beihang University), Fang Liu (Beihang University), Yinghao Zhu (Beihang University), Wang Xu (Tsinghua University), Lin Shi (Beihang University), Xiaoli Lian (Beihang University), Minxiao Li (Beihang University), Bo Shen (Huawei Cloud Computing Technologies Co., Ltd.), Binzhang Fu (Huawei Technologies)

14:50 | 10m Talk | Bias Testing and Mitigation in LLM-based Code Generation (Journal-First Track). Dong Huang (The University of Hong Kong), Jie M. Zhang (King's College London), Qingwen Bu (Shanghai Jiao Tong University), Xiaofei Xie (Singapore Management University), Junjie Chen (Tianjin University), Heming Cui (University of Hong Kong)

15:00 | 10m Talk | FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification (Research Papers, Pre-print). Qianhui Zhao (Beihang University), Li Zhang (Beihang University), Fang Liu (Beihang University), Xiaoli Lian (Beihang University), Meng Qiaoyuanhe (Beihang University), Ziqian Jiao (Beihang University), Zetong Zhou (Beihang University), Jia Li, Lin Shi (Beihang University)

15:10 | 10m Talk | AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion (Research Papers). Tianyue Jiang (Sun Yat-sen University), Yanli Wang (Sun Yat-sen University), Yanlin Wang (Sun Yat-sen University), Daya Guo, Ensheng Shi (Huawei), Yuchi Ma (Huawei Cloud Computing Technologies), Jiachi Chen (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)

15:20 | 10m Talk | Effectiveness of symmetric metamorphic relations on validating the stability of code generation LLM (Journal-First Track). Chan Pak Yuen (Department of Computer Science, City University of Hong Kong), Jacky Keung (City University of Hong Kong), Zhen Yang (Shandong University)