HybridSIMD: A Super C++ SIMD Library with Integrated Auto-tuning Capabilities (ASE 2025 - Research Papers)

Sun 16 - Thu 20 November 2025 Seoul, South Korea

Who

Haolin Pan, Xulin Zhou, Mingjie Xing, Yanjun Wu

Track

ASE 2025 Research Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.


The GMT offsets shown reflect the offsets at the moment of the conference.


When

Mon 17 Nov 2025 11:12 - 11:25 at Grand Hall 4 - Efficiency & Fairness 1

Abstract

Single Instruction, Multiple Data (SIMD) technology is crucial for computational efficiency in High-Performance Computing (HPC) and Artificial Intelligence (AI). Automatic vectorization is easy to use, but its hardware utilization is limited by the compiler's static analysis capabilities. Manual vectorization allows fine-grained control and potentially better hardware utilization, but hand-writing low-level intrinsics hurts portability and increases development complexity. Existing C++ SIMD libraries aim to address these issues but introduce new challenges, such as fragmentation in performance and usability, and underutilization of hardware caused by limited support for variable vector element counts. To overcome these limitations, this paper introduces HybridSIMD, a novel unified and auto-tunable SIMD library. HybridSIMD is designed to resolve both fragmentation and hardware underutilization by enabling operator-level hybrid collaborative optimization across different SIMD libraries through a unified interface. A built-in auto-tuning mechanism, based on static analysis and hierarchical search, automatically tunes programs for high performance across diverse hardware platforms without human intervention. Experiments on six real-world HPC benchmarks on AVX2, AVX512, and NEON architectures show that HybridSIMD outperforms state-of-the-art SIMD libraries. The highest speedups achieved by HybridSIMD are 185.34× on AVX2, 97.80× on AVX512, and 71.32× on NEON, demonstrating superior computational efficiency and adaptability.

Haolin Pan

Institute of Software, Chinese Academy of Sciences; School of Intelligent Science and Technology, HIAS, UCAS, Hangzhou; University of Chinese Academy of Sciences

Xulin Zhou

Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

China

Mingjie Xing

Institute of Software, Chinese Academy of Sciences

China

Yanjun Wu

Institute of Software, Chinese Academy of Sciences


Session Program

Mon 17 Nov
Displayed time zone: Seoul

	11:00 - 12:30	Efficiency & Fairness 1 (Research Papers) at Grand Hall 4

	11:00 12m Talk		AutoFid: Adaptive and Noise-Aware Fidelity Measurement for Quantum Programs via Circuit Graph Analysis Research Papers Tingting Li Zhejiang University, Ziming Zhao Zhejiang University, Jianwei Yin Zhejiang University
	11:12 12m Talk		HybridSIMD: A Super C++ SIMD Library with Integrated Auto-tuning Capabilities Research Papers Haolin Pan Institute of Software, Chinese Academy of Sciences; School of Intelligent Science and Technology, HIAS, UCAS, Hangzhou; University of Chinese Academy of Sciences, Xulin Zhou Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Mingjie Xing Institute of Software, Chinese Academy of Sciences, Yanjun Wu Institute of Software, Chinese Academy of Sciences
	11:25 12m Talk		PEACE: Towards Efficient Project-Level Performance Optimization via Hybrid Code Editing Research Papers Xiaoxue Ren Zhejiang University, Jun Wan Zhejiang University, Yun Peng The Chinese University of Hong Kong, Zhongxin Liu Zhejiang University, Ming Liang Ant Group, Dajun Chen Ant Group, Wei Jiang Ant Group, Yong Li Ant Group
	11:38 12m Talk		CoTune: Co-evolutionary Configuration Tuning Research Papers Gangda Xiong University of Electronic Science and Technology of China, Tao Chen University of Birmingham Pre-print
	11:51 12m Talk		It's Not Easy Being Green: On the Energy Efficiency of Programming Languages Research Papers Nicolas van Kempen University of Massachusetts Amherst, USA, Hyuk-Je Kwon University of Massachusetts Amherst, Dung Nguyen University of Massachusetts Amherst, Emery D. Berger University of Massachusetts Amherst and Amazon Web Services
	12:04 12m Talk		When Faster Isn't Greener: The Hidden Costs of LLM-Based Code Optimization Research Papers Tristan Coignion Université de Lille - Inria, Clément Quinton Université de Lille, Romain Rouvoy University Lille 1 and INRIA
	12:17 12m Talk		United We Stand: Towards End-to-End Log-based Fault Diagnosis via Interactive Multi-Task Learning Research Papers Minghua He Peking University, Chiming Duan Peking University, Pei Xiao Peking University, Tong Jia Institute for Artificial Intelligence, Peking University, Beijing, China, Siyu Yu The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Lingzhe Zhang Peking University, China, Weijie Hong Peking University, Jing Han ZTE Corporation, Yifan Wu Peking University, Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Gang Huang Peking University