This program is tentative and subject to change.
AI-assisted coding tools powered by Code Large Language Models (CodeLLMs) are increasingly integrated into modern software development workflows. To address concerns around privacy, latency, and model customization, many enterprises opt to self-host these models. However, the diversity and growing number of CodeLLMs, coupled with limited accelerator memory, introduce practical challenges in model management and serving efficiency. This paper presents CACE, a novel context-aware model eviction strategy designed specifically to optimize self-hosted CodeLLM serving under resource constraints. Unlike traditional eviction strategies based solely on recency (e.g., Least Recently Used), CACE leverages multiple context-aware factors, including model load time, task-specific latency sensitivity, expected output length, and recent usage and future demand tracked through a sliding window. We evaluate CACE using realistic workloads that include both latency-sensitive code completion and throughput-intensive code reasoning tasks. Our experiments show that CACE reduces Time-to-First-Token (TTFT) by 70% and end-to-end (E2E) latency by 37%, while significantly lowering the number of model evictions by 55% compared to state-of-the-art systems. Ablation studies further demonstrate the importance of multi-factor eviction in balancing responsiveness and resource efficiency. This work contributes practical strategies for deploying scalable, low-latency AI coding assistants in real-world software engineering environments.
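The abstract names the signals CACE combines (model load time, latency sensitivity, expected output length, sliding-window usage) but not the exact scoring rule. The following minimal Python sketch shows one plausible way such a multi-factor eviction score could be assembled; the names (ModelStats, eviction_score, choose_victim), the weights, and the window size are illustrative assumptions, not the paper's implementation.

import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ModelStats:
    """Per-model signals a context-aware eviction policy could track (illustrative)."""
    load_time_s: float            # measured cost of loading the model onto the accelerator
    latency_sensitive: bool       # e.g. code completion (True) vs. batch code reasoning (False)
    expected_output_tokens: int   # typical generation length for this model's tasks
    recent_requests: deque = field(default_factory=deque)  # request timestamps

    def record_request(self) -> None:
        self.recent_requests.append(time.monotonic())

    def demand(self, window_s: float = 300.0) -> int:
        """Requests seen inside a sliding window, used as a proxy for near-future demand."""
        now = time.monotonic()
        while self.recent_requests and now - self.recent_requests[0] > window_s:
            self.recent_requests.popleft()
        return len(self.recent_requests)

# Hypothetical weights; CACE's actual scoring function is not given in the abstract.
def eviction_score(stats: ModelStats) -> float:
    """Lower score = cheaper to evict."""
    score = 2.0 * stats.load_time_s                     # expensive-to-reload models are kept longer
    score += 5.0 if stats.latency_sensitive else 0.0    # protect latency-sensitive completion models
    score += 0.01 * stats.expected_output_tokens        # expected generation length (sign/weight is an assumption)
    score += 3.0 * stats.demand()                       # recent traffic suggests imminent reuse
    return score

def choose_victim(resident: dict[str, ModelStats]) -> str:
    """When accelerator memory is full, evict the resident model with the lowest score."""
    return min(resident, key=lambda name: eviction_score(resident[name]))

Under this kind of scoring, a rarely used, fast-to-reload reasoning model would be evicted before a frequently hit, latency-sensitive completion model, which matches the behavior the abstract describes; unlike a pure LRU policy, recency is only one term in the score.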
Tue 18 Nov (displayed time zone: Seoul)
16:00 - 17:00

16:00 | 10m Talk | Adaptive Performance Regression Detection via Semi-Supervised Siamese Learning | Industry Showcase | Yongqian Sun (Nankai University), Mengyao Li (Nankai University), Xiao Xiong (Nankai University), Lei Tao (Nankai University), Yimin Zuo (Nankai University), Wenwei Gu (The Chinese University of Hong Kong), Shenglin Zhang (Nankai University), Junhua Kuang (Nankai University), Yu Luo (Nankai University), Huandong Zhuang (Huawei Cloud), Bowen Deng (Huawei Cloud), Dan Pei (Tsinghua University)
16:10 | 10m Talk | Deploying Language Models on Android-Based Edge Devices: A Practical Evaluation Pipeline | Industry Showcase | Suayder Costa (Venturus - Innovation & Technology), Igor Lima (Venturus - Innovation & Technology), William Harada (Venturus - Innovation & Technology), Mateus Lucena (Venturus - Innovation & Technology), Arthur Alves (Venturus - Innovation & Technology), Ruan Belem (TPV Technology), Agemilson Pimentel (TPV Technology), Rômulo Fabrício (TPV Technology), Alexandre Miranda (Paulo Feitoza Foundation - FPFTech), Daniel Lins (Venturus - Innovation & Technology), Frederico Goncalves (Venturus - Innovation & Technology), Sidney Leal (Venturus - Innovation & Technology)
16:20 | 10m Talk | How Can Infrastructure as Code Accelerate Data Center Bring-ups? A Case Study at ByteDance | Industry Showcase | Xianhao Jin (ByteDance), Yifei Feng (ByteDance), Yufei Gao (ByteDance), Yongning Hu (ByteDance), Jie Huang (ByteDance), Kun Xia (ByteDance), Luchuan Guo (ByteDance)
16:30 | 10m Talk | MobileUPReg: Identifying User-Perceived Performance Regressions in Mobile OS Versions | Industry Showcase | Wei Liu (Concordia University, Montreal, Canada), Yi Wen HENG (Concordia University), Feng Lin (Concordia University), Tse-Hsun (Peter) Chen (Concordia University), Ahmed E. Hassan (Queen’s University)
16:40 | 10m Talk | Context-Aware CodeLLM Eviction for AI-assisted Coding | Industry Showcase | Kishanthan Thangarajah (Centre for Software Excellence, Huawei Canada), Boyuan Chen (Centre for Software Excellence, Huawei Canada), Shi Chang (University of Western Ontario), Ahmed E. Hassan (Queen’s University)
16:50 | 10m Talk | Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute | Industry Showcase | Yingwei Ma (Tongyi Lab, Alibaba), Yongbin Li (Tongyi Lab, Alibaba, China), Yihong Dong (Peking University), Xue Jiang, Yanhao Li (Tongyi Lab, Alibaba), Yue Liu (Monash University), Rongyu Cao (Tongyi Lab, Alibaba, China), Jue Chen (Tongyi Lab, Alibaba, China), Fei Huang (Tongyi Lab, Alibaba, China), Binhua Li (Tongyi Lab, Alibaba, China)