Improving Quality of LLM Code Generation in Low-Resource Programming Languages via Uncertainty Estimation
This program is tentative and subject to change.
Large language models for source code (Code LLMs) demonstrate strong performance on high-resource programming languages (HRPLs) but struggle with low-resource ones (LRPLs). Previous studies have improved LLM performance on LRPLs through continued training or tokenizer adaptation, but these approaches require costly data and can cause catastrophic forgetting. This paper proposes to address the poor performance of LLMs on LRPLs using uncertainty estimation (UE). UE methods have advanced LLM performance on natural language tasks but remain underexplored in source code settings. The research aims to make three contributions: (1) a new code generation benchmark evaluating not only functional correctness but also readability, efficiency, and idiomatic style across Python, Java, Racket, and Elixir; (2) a new benchmark for evaluating uncertainty estimation during code generation; and (3) methods that leverage UE to improve LRPL code generation. These methods include filtering synthetic training data by low uncertainty, a UE-driven curriculum learning strategy, uncertainty-aware decoding, and using uncertainty as an RL reward during alignment. Expected outcomes are a comprehensive evaluation of uncertainty in code models, evidence that UE can improve LRPL generation, and an open-source release of the benchmarks and models.
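The first proposed method, filtering synthetic training data by low uncertainty, can be illustrated with a minimal sketch. The abstract does not specify the uncertainty measure, so the example below assumes a common baseline: mean negative log-likelihood over generated tokens, with a hypothetical threshold and sample format chosen purely for illustration.

```python
def mean_token_nll(token_logprobs):
    """Uncertainty score: mean negative log-likelihood of the generated
    tokens. Lower values indicate a more confident generation."""
    return -sum(token_logprobs) / len(token_logprobs)

def filter_by_uncertainty(samples, threshold=1.0):
    """Keep only synthetic samples whose uncertainty is below the
    threshold (hypothetical value; tuned per language in practice)."""
    return [s for s in samples if mean_token_nll(s["logprobs"]) < threshold]

# Hypothetical synthetic samples: generated code plus per-token log-probs
samples = [
    {"code": "defmodule A do end", "logprobs": [-0.1, -0.2, -0.05]},  # confident
    {"code": "defmodule B do end", "logprobs": [-2.5, -3.0, -1.8]},   # uncertain
]
kept = filter_by_uncertainty(samples, threshold=1.0)
print([s["code"] for s in kept])  # only the low-uncertainty sample survives
```

The design choice here is that uncertainty acts as a cheap quality proxy: samples the model itself was unsure about are more likely to be incorrect or non-idiomatic, so dropping them cleans the synthetic LRPL training set without requiring execution or human review.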
Thu 20 Nov (displayed time zone: Seoul)
| Time | Slot | Event | Speaker |
|---|---|---|---|
| 16:00 (45m) | Talk | Secure Transaction Semantics: Analysis, Vulnerability Detection, and Attack Modeling (Doctoral Symposium) | Yixuan Liu, Nanyang Technological University |
| 16:45 (45m) | Talk | Improving Quality of LLM Code Generation in Low-Resource Programming Languages via Uncertainty Estimation (Doctoral Symposium) | Georgii Andriushchenko, Innopolis University |
| 17:30 (15m) | Day closing | Closing (Doctoral Symposium) | |
