Toward Neurosymbolic Program Comprehension
Recent advancements in Large Language Models (LLMs) have paved the way for Large Code Models (LCMs), enabling automation of complex software engineering tasks such as code generation, software testing, and program comprehension. Tools like GitHub Copilot and ChatGPT have shown substantial benefits in supporting developers across various practices. However, the ambition to scale these models to trillion-parameter sizes, exemplified by GPT-4, poses significant challenges that limit the usage of Artificial Intelligence (AI)-based systems powered by large Deep Learning (DL) models. These include rising computational demands for training and deployment and issues related to trustworthiness, bias, and interpretability. Such factors can make managing these models impractical for many organizations, while their "black-box" nature undermines key aspects, including transparency and accountability. In this paper, we question the prevailing assumption that, given sufficient new data from which to learn additional patterns, increasing model parameters is always the optimal path forward. In particular, we advocate for a Neurosymbolic research direction that combines the strengths of existing DL techniques (e.g., LLMs) with traditional symbolic methods, renowned for their reliability, speed, and determinism. To this end, we outline the core features and present preliminary results for our envisioned approach, aimed at establishing the first NeuroSymbolic Program Comprehension (NsPC) framework to aid in identifying defective code components.
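To make the envisioned neural-symbolic combination concrete, the sketch below pairs a placeholder neural defect scorer with deterministic, AST-based symbolic checks and merges their outputs into a single verdict. It is a minimal illustration under assumptions of our own: the function names, the two example rules, and the score-or-findings combination logic are hypothetical and do not reflect the actual NsPC design or implementation described in the paper.

```python
# Hypothetical sketch of a neurosymbolic defect-identification pipeline.
# All names, rules, and thresholds are illustrative assumptions, not the NsPC framework itself.
import ast
from dataclasses import dataclass, field

@dataclass
class Verdict:
    neural_score: float                                      # defect likelihood from a learned model (0..1)
    symbolic_findings: list = field(default_factory=list)    # deterministic rule hits (interpretable)

    def is_suspicious(self, threshold: float = 0.5) -> bool:
        # Combination rule (assumed): a hit from either component flags the snippet.
        return self.neural_score >= threshold or bool(self.symbolic_findings)

def neural_defect_score(code: str) -> float:
    """Placeholder for the neural component (e.g., an LCM-based defect classifier).
    Returns a trivial length-based proxy so the sketch runs standalone."""
    return min(len(code) / 1000.0, 1.0)

def symbolic_checks(code: str) -> list:
    """Symbolic component: fast, deterministic AST-based rules."""
    findings = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Rule 1: bare `except:` clauses silently swallow errors.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"bare except at line {node.lineno}")
        # Rule 2: mutable default arguments are a common defect pattern.
        if isinstance(node, ast.FunctionDef):
            for default in node.args.defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    findings.append(f"mutable default argument in '{node.name}' at line {node.lineno}")
    return findings

def analyze(code: str) -> Verdict:
    return Verdict(neural_defect_score(code), symbolic_checks(code))

if __name__ == "__main__":
    snippet = "def cache(x, seen=[]):\n    try:\n        seen.append(x)\n    except:\n        pass\n"
    verdict = analyze(snippet)
    print(verdict.symbolic_findings, verdict.neural_score, verdict.is_suspicious())
```

In such a split, the symbolic side stays fast, deterministic, and explainable, while the neural side can capture patterns the rules miss; in a real pipeline the placeholder scorer would be replaced by an LCM-based predictor.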
Mon 28 Apr (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | Empirical Findings, Future Visions, Recommendations (Replications and Negative Results (RENE) / Early Research Achievements (ERA) / Tool Demonstration / Research Track) at Room 205. Chair(s): Mark Hills (Appalachian State University), Coen De Roover (Vrije Universiteit Brussel), Gema Rodríguez-Pérez (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)
11:00 (10m) Talk | Terminal Lucidity: Envisioning the Future of the Terminal (Research Track). Michael MacInnis (Carleton University), Olga Baysal (Carleton University), Michele Lanza (Software Institute - USI, Lugano). Pre-print
11:10 (6m) Talk | Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists (Early Research Achievements (ERA)). Alyssia Chen (University of Hawaii at Manoa), Carol Wong (University of Hawaii at Manoa), Bonita Sharif (University of Nebraska-Lincoln, USA), Anthony Peruma (University of Hawai‘i at Mānoa). Pre-print
11:16 (10m) Talk | Method Names in Jupyter Notebooks: An Exploratory Study (Research Track). Carol Wong (University of Hawaii at Manoa), Gunnar Larsen (University of Hawaii at Manoa), Rocky Huang (University of Hawaii at Manoa), Bonita Sharif (University of Nebraska-Lincoln, USA), Anthony Peruma (University of Hawai‘i at Mānoa)
11:26 (6m) Talk | SCALAR: A Part-of-speech Tagger for Identifiers (Tool Demonstration). Christian Newman, Brandon Scholten (Kent State University), Sophia Testa (Kent State University), Joshua Behler (Kent State University), Syreen Banabilah (Kent State University), Michael L. Collard (The University of Akron), Michael J. Decker (Bowling Green State University), Mohamed Wiem Mkaouer (University of Michigan - Flint), Marcos Zampieri (George Mason University), Eman Abdullah AlOmar (Stevens Institute of Technology, USA), Reem Alsuhaibani (Prince Sultan University), Anthony Peruma (University of Hawai‘i at Mānoa), Jonathan I. Maletic (Kent State University)
11:32 (6m) Talk | How do Papers Make into Machine Learning Frameworks: A Preliminary Study on TensorFlow (Early Research Achievements (ERA)). Federica Pepe (University of Sannio), Claudia Farkas (York University), Maleknaz Nayebi (York University), Giulio Antoniol (Ecole Polytechnique de Montreal), Massimiliano Di Penta (University of Sannio, Italy)
11:38 (4m) Talk | Toward Neurosymbolic Program Comprehension (Early Research Achievements (ERA)). Alejandro Velasco (William & Mary), Aya Garryyeva (William & Mary), David Nader Palacio (William & Mary), Antonio Mastropaolo (William & Mary, USA), Denys Poshyvanyk (William & Mary). Pre-print
11:42 (10m) Talk | Combining Static Analysis Techniques for Program Comprehension Using Slicito (Tool Demonstration). Pre-print, file attached
11:52 (6m) Talk | Mining Code Change Patterns in Ada Projects (Replications and Negative Results (RENE))
11:58 (6m) Talk | Telling Software Evolution Stories With Sonification (Early Research Achievements (ERA))
12:04 (10m) Talk | Attributed Multiplex Learning for Analogical Third-Party Library Recommendation and Retrieval (Research Track). Baihui Sang (State Key Laboratory for Novel Software Technology, Nanjing University), Liang Wang (Nanjing University), Jierui Zhang (Nanjing University), Xianping Tao (Nanjing University)
12:14 (6m) Talk | LLM2FedLLM - A Tool for Simulating Federated LLMs for Software Engineering Tasks (Tool Demonstration). Jahnavi Kumar (Indian Institute of Technology Tirupati, India), Siddhartha Gandu (Indian Institute of Technology Tirupati), Sridhar Chimalakonda (Indian Institute of Technology Tirupati)
12:20 (10m) Live Q&A | Session's Discussion: "Empirical Findings, Future Visions, Recommendations" (Research Track)