SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases (ICSE 2025 - Software Engineering in Practice (SEIP))

Who

daeha ryu, Seokjun Ko, Eunbi Jang, jinyoung park, myunggwan kim, changseo park

Track

ICSE 2025 SE In Practice (SEIP)

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 1 May 2025 11:45 - 12:00 at 212 - AI for Analysis 3 Chair(s): Gias Uddin
Fri 2 May 2025 17:15 - 17:30 at 215 - SE for AI with Quality 3 Chair(s): Sumon Biswas

Abstract

We present SEMANTIC CODE FINDER, a framework for semantic code search that delivers high-level search performance and supports multiple programming languages. Leveraging code summaries, it enables meaningful semantic code search by extracting the core content of code methods and using this information for search queries. Evaluated on large-scale codebases, SEMANTIC CODE FINDER demonstrates its effectiveness in outperforming existing open-source code search tools, achieving higher recall and precision rates. It delivers superior search performance across Java, Python, and C++. Notably, SEMANTIC CODE FINDER outperforms CodeMatcher, a previously successful semantic code search tool, by approximately 41% in terms of MRR. Moreover, it shows consistent performance across Java, Python, and C++ languages, highlighting its robustness and effectiveness. Currently, it is being used as a code search service for a significant amount of source code within Samsung Electronics, meeting the needs of its developers.

daeha ryu

Innovation Center, Samsung Electronics

Seokjun Ko

Samsung Electronics Co.

South Korea

Eunbi Jang

Innovation Center, Samsung Electronics

jinyoung park

Innovation Center, Samsung Electronics

myunggwan kim

Innovation Center, Samsung Electronics

changseo park

Innovation Center, Samsung Electronics

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 1 May
Displayed time zone: Eastern Time (US & Canada) change

11:00 - 12:30	AI for Analysis 3SE In Practice (SEIP) / Research Track at 212 Chair(s): Gias Uddin York University, Canada

11:00 15m Talk		COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge Research Track Yichen LI The Chinese University of Hong Kong, Yulun Wu The Chinese University of Hong Kong, Jinyang Liu Chinese University of Hong Kong, Zhihan Jiang The Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Guangba Yu The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong
11:15 15m Talk		Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding Research Track Yifeng Di Purdue University, Tianyi Zhang Purdue University
11:30 15m Talk		HumanEvo: An Evolution-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation Research Track Dewu Zheng Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Ensheng Shi Xi’an Jiaotong University, Ruikai Zhang Huawei Cloud Computing Technologies, Yuchi Ma Huawei Cloud Computing Technologies, Hongyu Zhang Chongqing University, Zibin Zheng Sun Yat-sen University
11:45 15m Talk		SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases SE In Practice (SEIP) daeha ryu Innovation Center, Samsung Electronics, Seokjun Ko Samsung Electronics Co., Eunbi Jang Innovation Center, Samsung Electronics, jinyoung park Innovation Center, Samsung Electronics, myunggwan kim Innovation Center, Samsung Electronics, changseo park Innovation Center, Samsung Electronics
12:00 15m Talk		Time to Retrain? Detecting Concept Drifts in Machine Learning Systems SE In Practice (SEIP) Tri Minh-Triet Pham Concordia University, Karthikeyan Premkumar Ericsson, Mohamed Naili Ericsson, Jinqiu Yang Concordia University
12:15 15m Talk		UML Sequence Diagram Generation: A Multi-Model, Multi-Domain Evaluation SE In Practice (SEIP) Chi Xiao Ericsson AB, Daniel Ståhl Ericsson AB, Jan Bosch Chalmers University of Technology

Fri 2 May
Displayed time zone: Eastern Time (US & Canada) change

16:00 - 17:30	SE for AI with Quality 3Research Track / SE In Practice (SEIP) at 215 Chair(s): Sumon Biswas Case Western Reserve University

16:00 15m Talk		Improved Detection and Diagnosis of Faults in Deep Neural Networks Using Hierarchical and Explainable ClassificationSE for AI Research Track Sigma Jahan Dalhousie University, Mehil Shah Dalhousie University, Parvez Mahbub Dalhousie University, Masud Rahman Dalhousie University Pre-print
16:15 15m Talk		Lightweight Concolic Testing via Path-Condition Synthesis for Deep Learning LibrariesSE for AI Research Track Sehoon Kim , Yonghyeon Kim UNIST, Dahyeon Park UNIST, Yuseok Jeon UNIST, Jooyong Yi UNIST, Mijung Kim UNIST
16:30 15m Talk		Mock Deep Testing: Toward Separate Development of Data and Models for Deep LearningSE for AI Research Track Ruchira Manke Tulane University, USA, Mohammad Wardat Oakland University, USA, Foutse Khomh Polytechnique Montréal, Hridesh Rajan Tulane University
16:45 15m Talk		RUG: Turbo LLM for Rust Unit Test GenerationSE for AI Research Track Xiang Cheng Georgia Institute of Technology, Fan Sang Georgia Institute of Technology, Yizhuo Zhai Georgia Institute of Technology, Xiaokuan Zhang George Mason University, Taesoo Kim Georgia Institute of Technology Pre-print Media Attached File Attached
17:00 15m Talk		Test Input Validation for Vision-based DL Systems: An Active Learning ApproachSE for AI SE In Practice (SEIP) Delaram Ghobari University of Ottawa, Mohammad Hossein Amini University of Ottawa, Dai Quoc Tran SmartInsideAI Company Ltd. and Sungkyunkwan University, Seunghee Park SmartInsideAI Company Ltd. and Sungkyunkwan University, Shiva Nejati University of Ottawa, Mehrdad Sabetzadeh University of Ottawa Pre-print
17:15 15m Talk		SEMANTIC CODE FINDER: An Efficient Semantic Search Framework for Large-Scale Codebases SE In Practice (SEIP) daeha ryu Innovation Center, Samsung Electronics, Seokjun Ko Samsung Electronics Co., Eunbi Jang Innovation Center, Samsung Electronics, jinyoung park Innovation Center, Samsung Electronics, myunggwan kim Innovation Center, Samsung Electronics, changseo park Innovation Center, Samsung Electronics