Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
This program is tentative and subject to change.
Recent advancements in software engineering agents have demonstrated promising capabilities in automating program improvements. However, their reliance on closed-source or resource-intensive models introduces significant deployment challenges in private environments, prompting a critical question: \textit{How can personally deployable open-source LLMs (e.g., 32B models running on a single GPU) achieve comparable code reasoning performance?} To this end, we propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. Internally, we introduce a \textit{development-contextualized trajectory synthesis} method leveraging real-world software repositories to bootstrap multi-stage reasoning processes, such as fault localization and patch generation. We further enhance trajectory quality through rejection sampling, rigorously evaluating trajectories along both accuracy and complexity dimensions. Externally, we propose a novel \textit{development-process-based search} strategy guided by reward models and execution verification. This approach enables targeted computational allocation at critical development decision points, overcoming limitations of existing “end-point only” verification methods.
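The abstract names the two TTC strategies only at a high level; the sketch below illustrates one plausible shape they could take, not the released implementation. All names here (`Trajectory`, `STAGES`, `rejection_sample`, `process_based_search`, and the `propose`, `score`, `run_tests` callables standing in for the policy model, reward model, and test harness) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The multi-stage development process the external search walks through (assumed stages).
STAGES = ["repository_understanding", "fault_localization",
          "patch_generation", "patch_verification"]


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)  # one reasoning/action step per stage
    score: float = 0.0                              # reward-model score of the partial trajectory


def rejection_sample(candidates: List[Trajectory],
                     is_correct: Callable[[Trajectory], bool],
                     max_steps: int) -> List[Trajectory]:
    """Internal TTC (training side): keep only synthesized trajectories that pass an
    accuracy check (e.g., the final patch resolves the issue) and stay within a
    complexity budget, here crudely approximated by step count."""
    return [t for t in candidates if is_correct(t) and len(t.steps) <= max_steps]


def process_based_search(issue: str,
                         propose: Callable[[str, List[str], str], List[str]],
                         score: Callable[[List[str]], float],
                         run_tests: Callable[[List[str]], bool],
                         beam_width: int = 4) -> Trajectory:
    """External TTC (inference side): beam search that spends extra compute at every
    development decision point instead of verifying only the end point."""
    beam = [Trajectory()]
    for stage in STAGES:
        expanded: List[Trajectory] = []
        for traj in beam:
            # Sample several candidate continuations for this stage.
            for step in propose(issue, traj.steps, stage):
                cand = Trajectory(steps=traj.steps + [step])
                cand.score = score(cand.steps)  # reward-model guidance
                expanded.append(cand)
        if stage == "patch_verification":
            # Execution verification: prefer candidates whose patch passes the tests.
            passing = [t for t in expanded if run_tests(t.steps)]
            expanded = passing or expanded
        # Keep only the top-scoring partial trajectories.
        beam = sorted(expanded, key=lambda t: t.score, reverse=True)[:beam_width]
    return max(beam, key=lambda t: t.score)
```

The point mirrored here is that reward scoring and execution checks are applied at every development stage, rather than only on the final patch as in “end-point only” verification.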
Evaluations on SWE-bench Verified demonstrate that our \textbf{32B model achieves a 46% issue resolution rate}, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1. Additionally, we provide empirical validation of the test-time scaling phenomenon within SWE agents, revealing that \textbf{models dynamically allocate more tokens to increasingly challenging problems}, effectively enhancing reasoning capabilities. We publicly release all training data, models, and code to facilitate future research.\footnote{Model: \url{https://github.com/yingweima2022/SWE-Reasoner/tree/6627eba7215425ecfef65a40a9c516b2feca1bc7}, Code: \url{https://github.com/yingweima2022/AnonymousSWESynInferpro}} \textit{In fact, our method has been deployed in Tongyi Lingma, an IDE-based coding assistant developed by Alibaba Cloud, where it helps developers solve real-world programming problems.}
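As a reading aid for the scaling claim, the minimal analysis sketch below shows one way to check it, assuming per-instance records that pair a difficulty label (e.g., the SWE-bench Verified difficulty annotations) with the number of tokens the model generated; the record format is hypothetical.

```python
from collections import defaultdict

def mean_tokens_by_difficulty(records):
    """records: iterable of dicts such as
    {"difficulty": "15 min - 1 hour", "output_tokens": 1234} (format assumed)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["difficulty"]].append(r["output_tokens"])
    # Under the test-time scaling claim, harder buckets should show larger means.
    return {d: sum(v) / len(v) for d, v in buckets.items()}
```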
Tue 18 Nov (displayed time zone: Seoul)
16:00 - 17:00
16:00 10m Talk | Adaptive Performance Regression Detection via Semi-Supervised Siamese Learning (Industry Showcase) | Yongqian Sun (Nankai University), Mengyao Li (Nankai University), Xiao Xiong (Nankai University), Lei Tao (Nankai University), Yimin Zuo (Nankai University), Wenwei Gu (The Chinese University of Hong Kong), Shenglin Zhang (Nankai University), Junhua Kuang (Nankai University), Yu Luo (Nankai University), Huandong Zhuang (Huawei Cloud), Bowen Deng (Huawei Cloud), Dan Pei (Tsinghua University)
16:10 10m Talk | Deploying Language Models on Android-Based Edge Devices: A Practical Evaluation Pipeline (Industry Showcase) | Suayder Costa (Venturus - Innovation & Technology), Igor Lima (Venturus - Innovation & Technology), William Harada (Venturus - Innovation & Technology), Mateus Lucena (Venturus - Innovation & Technology), Arthur Alves (Venturus - Innovation & Technology), Ruan Belem (TPV Technology), Agemilson Pimentel (TPV Technology), Rômulo Fabrício (TPV Technology), Alexandre Miranda (Paulo Feitoza Foundation - FPFTech), Daniel Lins (Venturus - Innovation & Technology), Frederico Goncalves (Venturus - Innovation & Technology), Sidney Leal (Venturus - Innovation & Technology)
16:20 10m Talk | How Can Infrastructure as Code Accelerate Data Center Bring-ups? A Case Study at ByteDance (Industry Showcase) | Xianhao Jin (ByteDance), Yifei Feng (ByteDance), Yufei Gao (ByteDance), Yongning Hu (ByteDance), Jie Huang (ByteDance), Kun Xia (ByteDance), Luchuan Guo (ByteDance)
16:30 10m Talk | MobileUPReg: Identifying User-Perceived Performance Regressions in Mobile OS Versions (Industry Showcase) | Wei Liu (Concordia University, Montreal, Canada), Yi Wen HENG (Concordia University), Feng Lin (Concordia University), Tse-Hsun (Peter) Chen (Concordia University), Ahmed E. Hassan (Queen’s University)
16:40 10m Talk | Context-Aware CodeLLM Eviction for AI-assisted Coding (Industry Showcase) | Kishanthan Thangarajah (Centre for Software Excellence, Huawei Canada), Boyuan Chen (Centre for Software Excellence, Huawei Canada), Shi Chang (University of Western Ontario), Ahmed E. Hassan (Queen’s University)
16:50 10m Talk | Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute (Industry Showcase) | Yingwei Ma (Tongyi Lab, Alibaba), Yongbin Li (Tongyi Lab, Alibaba, China), Yihong Dong (Peking University), Xue Jiang, Yanhao Li (Tongyi Lab, Alibaba), Yue Liu (Monash University), Rongyu Cao (Tongyi Lab, Alibaba, China), Jue Chen (Tongyi Lab, Alibaba, China), Fei Huang (Tongyi Lab, Alibaba, China), Binhua Li (Tongyi Lab, Alibaba, China)