TCSE logo 
 Sigsoft logo
Sustainability badge

This program is tentative and subject to change.

Wed 30 Apr 2025 17:15 - 17:22 at 212 - AI for Analysis 2

As software projects rapidly evolve, software artifacts become more complex and defects behind get harder to identify. The emerging Transformerbased approaches, though achieving remarkable performance, struggle with long code sequences due to their self-attention mechanism, which scales quadratically with the sequence length. This paper introduces SparseCoder, an innovative approach incorporating sparse attention and learned token pruning (LTP) method (adapted from natural language processing) to address this limitation. Compared to previous state-of-the-art models (CodeBERT, RoBERTa and CodeT5), our experiments demonstrate that SparseCoder can handle significantly longer input sequences�at least twice as long, within the limits of our hardware resources and data statistics. Additionally, SparseCoder is four times faster than other methods measured in runtime, achieving a 50% reduction in floating point operations per second (FLOPs) with a negligible performance drop of less than 1% compared to Transformers using sparse attention (Sparse Atten). Plotting FLOPs of model inference against token lengths reveals that SparseCoder scales linearly, whereas other methods, including the current state-of-the-art model CodeT5, scale quadratically. Moreover, SparseCoder enhances interpretability by visualizing non-trivial tokens layer-wise.

This program is tentative and subject to change.

Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada) change

16:00 - 17:30
16:00
15m
Talk
Neurosymbolic Modular Refinement Type Inference
Research Track
Georgios Sakkas UC San Diego, Pratyush Sahu UC San Diego, Kyeling Ong University of California, San Diego, Ranjit Jhala UCSD
16:15
15m
Talk
An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?
Research Track
Hyunjae Suh University of California, Irvine, Mahan Tafreshipour University of California at Irvine, Jiawei Li University of California Irvine, Adithya Bhattiprolu University of California, Irvine, Iftekhar Ahmed University of California at Irvine
16:30
15m
Talk
Planning a Large Language Model for Static Detection of Runtime Errors in Code Snippets
Research Track
Smit Soneshbhai Patel University of Texas at Dallas, Aashish Yadavally University of Texas at Dallas, Hridya Dhulipala University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas
16:45
15m
Talk
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
Research Track
Chong Wang Nanyang Technological University, Kaifeng Huang Tongji University, Jian Zhang Nanyang Technological University, Yebo Feng Nanyang Technological University, Lyuye Zhang Nanyang Technological University, Yang Liu Nanyang Technological University, Xin Peng Fudan University
17:00
15m
Talk
Knowledge-Enhanced Program Repair for Data Science Code
Research Track
Shuyin Ouyang King's College London, Jie M. Zhang King's College London, Zeyu Sun Institute of Software, Chinese Academy of Sciences, Albert Merono Penuela King's College London
17:15
7m
Talk
SparseCoder: Advancing Source Code Analysis with Sparse Attention and Learned Token Pruning
Journal-first Papers
Xueqi Yang NCSU, Mariusz Jakubowski Microsoft, Li Kang Microsoft, Haojie Yu Microsoft, Tim Menzies North Carolina State University
:
:
:
: