The Landscape of Source Code Representation Learning in AI-Driven Software Engineering Tasks (ICSE 2023 - Technical Briefings)

Who

Sridhar Chimalakonda, Debeshee Das, Alex Mathai, Srikanth Tamilselvam, Atul Kumar

Track

ICSE 2023 Technical Briefings

When

Fri 19 May 2023 11:00 - 12:30 at Meeting Room 112 - Technical Briefing 6

Abstract

Appropriate representation of source code and its relevant properties forms the backbone of Artificial Intelligence (AI) / Machine Learning (ML) pipelines for various software engineering tasks such as code classification, bug prediction, code clone detection, and code summarization. In the literature, researchers have experimented extensively with different kinds of source code representations (syntactic, semantic, integrated, customized) and properties, ranging from tree/graph representations such as Abstract Syntax Trees (ASTs) to pre-trained transformer models like CodeBERT. In addition, it is common for researchers to create hand-crafted, customized source code representations for a specific software engineering task. In a 2018 survey, Allamanis et al. listed ~35 different source code representations used for different software engineering (SE) tasks, including ASTs, customized ASTs, Control Flow Graphs (CFGs), Data Flow Graphs (DFGs), and so on. The goal of this tutorial is two-fold: (i) to present an overview of the state of the art in source code representations and the corresponding ML pipelines, with an explicit focus on the pros and cons of each representation; and (ii) to discuss practical challenges in infusing different code views into state-of-the-art ML models.
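
As a rough illustration of what a syntactic and a data-flow view of the same code can look like, the sketch below uses Python's standard `ast` module. It is not taken from the tutorial: the helper names `ast_node_types` and `def_use_edges` are hypothetical, and the data-flow pass is deliberately naive (it works line by line and ignores branches, loops, and scoping).

```python
import ast

SOURCE = """
def add(a, b):
    total = a + b
    return total
"""


def ast_node_types(source):
    """Syntactic view: AST node type names in ast.walk (breadth-first) order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def def_use_edges(source):
    """Naive data-flow view: (variable, definition line, use line) triples.

    Assumption: straight-line code only. Control flow and scoping are ignored,
    so this is an illustration, not a real data-flow analysis.
    """
    tree = ast.parse(source)
    events = []  # (line, kind, name) where kind 0 = read, 1 = write
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            # Function parameters count as definitions.
            events.append((node.lineno, 1, node.arg))
        elif isinstance(node, ast.Name):
            # Load contexts are reads; Store and Del are treated as writes.
            kind = 0 if isinstance(node.ctx, ast.Load) else 1
            events.append((node.lineno, kind, node.id))

    last_def = {}
    edges = []
    # Sorting puts reads before writes on the same line, so `x = x + 1`
    # links the read of x to the previous definition of x.
    for line, kind, name in sorted(events):
        if kind == 1:
            last_def[name] = line
        elif name in last_def:
            edges.append((name, last_def[name], line))
    return edges


if __name__ == "__main__":
    print(ast_node_types(SOURCE))
    print(def_use_edges(SOURCE))
```

The node-type sequence is the kind of input that tree-based models consume, while the def-use triples hint at the extra edges a graph-based representation would add on top of the tree.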

Sridhar Chimalakonda

IIT Tirupati

India

Debeshee Das

Indian Institute of Technology Tirupati

India

Alex Mathai

IBM India Research Labs

Srikanth Tamilselvam

IBM Research

Atul Kumar