Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning
Large Language Models (LLMs) have been shown to struggle with long-term planning, which may be caused by the limited way in which they explore the space of possible solutions. We propose an architecture in which a Reinforcement Learning (RL) agent guides an LLM's exploration of that space: (1) the agent has access to domain-specific information and can therefore judge the quality of candidate solutions using specific, relevant metrics that were not explicitly part of the LLM's training objective; (2) the LLM can focus on generating immediate next steps, without the need for long-term planning. We enable non-linear reasoning by exploring alternative paths and backtracking. We evaluate this architecture on the program equivalence task and compare it against Chain of Thought (CoT) and Tree of Thoughts (ToT). We assess both the downstream task (binary classification) and the intermediate reasoning steps. Our approach compares favorably against CoT and ToT.
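The abstract's architecture can be illustrated with a minimal sketch: an LLM proposes immediate next steps, an agent scores them with a domain-specific metric, and the search explores alternatives and backtracks away from low-scoring paths. All names here (`propose_steps`, `score_step`, `solve`) and the toy string-building task are illustrative assumptions, not details from the paper.

```python
def propose_steps(state):
    """Stand-in for the LLM: propose immediate next steps only,
    with no long-term plan. Here: extend a partial string."""
    return [state + token for token in ("a", "b", "c")]

def score_step(state, goal):
    """Stand-in for the agent's domain-specific metric: reward
    prefixes of the goal. The real system would use task-relevant
    metrics (e.g. for program equivalence) unseen by the LLM's
    training objective."""
    return 1.0 if goal.startswith(state) else 0.0

def solve(goal, max_depth=5):
    """Depth-first exploration with backtracking: the agent prunes
    zero-scoring candidates, so the search falls back to earlier
    alternatives instead of committing to one linear chain."""
    stack = [""]
    while stack:
        state = stack.pop()
        if state == goal:
            return state
        if len(state) >= max_depth:
            continue  # abandon this path; backtrack via the stack
        # Agent ranks the LLM's candidates; low-scoring ones are pruned.
        candidates = [s for s in propose_steps(state)
                      if score_step(s, goal) > 0]
        candidates.sort(key=lambda s: score_step(s, goal))
        stack.extend(candidates)
    return None
```

The stack of unexplored candidates is what makes the reasoning non-linear: when a path dead-ends, the next pop resumes from a previously deferred alternative.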
Tue 24 Jun — Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:00 - 15:30 | LLM for SE 2 — Research Papers / Industry Papers / Ideas, Visions and Reflections — Cosmos Hall
Chair(s): Jialun Cao (Hong Kong University of Science and Technology)

14:00 (20m, Talk) Migrating Code At Scale With LLMs At Google — Industry Papers
Celal Ziftci (Google), Stoyan Nikolov (Google, Inc.), Anna Sjovall (Google, Inc.), Bo Kim (Google), Daniele Codecasa (Google, Inc.), Max Kim (Google)

14:20 (20m, Talk) Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning — Research Papers — DOI

14:40 (20m, Talk) Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs — Research Papers — DOI
Yujia Chen (Harbin Institute of Technology, Shenzhen), Yang Ye (Huawei Cloud Computing Technologies Co., Ltd.), Zhongqi Li (Huawei Cloud Computing Technologies Co., Ltd.), Yuchi Ma (Huawei Cloud Computing Technologies), Cuiyun Gao (Harbin Institute of Technology, Shenzhen)

15:00 (10m, Talk) Enabling Scalable Proactive Workspaces With Environment-Wide Context — Ideas, Visions and Reflections
Nick Bradley (University of British Columbia), Thomas Fritz (University of Zurich), Reid Holmes (University of British Columbia)

15:10 (20m, Talk) Bridging Operator Semantic Inconsistencies: A Source-level Cross-framework Model Conversion Approach — Research Papers — DOI
Xingpei Li (National University of Defense Technology, China), Yan Lei (Chongqing University), Zhouyang Jia (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Haoran Liu (National University of Defense Technology), Liqian Chen (National University of Defense Technology), Wei Dong (National University of Defense Technology), Shanshan Li (National University of Defense Technology)
This is the main event hall of the Clarion Hotel, used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The large doors numbered “1” and “2” provide access to the Cosmos Hall.