Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision (ICSME 2023 - New Ideas and Emerging Results Track)

Who

Rohith Pudari, Shiyuan Zhou, Iftekhar Ahmed, Zhuyun Dai, Shurui Zhou

Track

ICSME 2023 New Ideas and Emerging Results Track

Time Zone

The program is currently displayed in (GMT-05:00) Bogota, Lima, Quito, Rio Branco.

When

Fri 6 Oct 2023, 11:35 - 11:46, at Session 2 Room - RGD 04 - Program Comprehension. Chair(s): Oscar Chaparro, Massimiliano Di Penta

Abstract

Stack Overflow (SO) is a widely used question-and-answer (Q&A) forum dedicated to software development. It supplements official documentation (DOC for short) by offering practical examples and resolving uncertainties. However, consulting both the documentation and SO posts at the same time can be challenging and time-consuming because the two sources are disconnected. In this study, we propose DOSA, a novel approach to automatically align SO and DOC. DOSA injects domain-specific knowledge about the DOC structure into large language models (LLMs) through weak supervision and constrained decoding, thereby enhancing knowledge retrieval and streamlining task completion during software development. Our preliminary experiments find that DOSA outperforms various widely used baselines, showing the promise of using generative retrieval models to perform low-resource software engineering tasks.
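
To make the idea of constrained decoding over a documentation structure more concrete, here is a minimal sketch. It is not DOSA's implementation, which this page does not describe: the headings, the word-overlap scorer standing in for an LLM, and every function name below are assumptions made for illustration only. The sketch shows one common way to restrict generation so that the output can only be an existing documentation heading, by walking a prefix trie of allowed headings and choosing the best-scoring permitted token at each step.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-sequence marker


def build_trie(titles: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized documentation headings."""
    root: Dict = {}
    for tokens in titles:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: Dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that keep `prefix` on a path to some valid heading."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node.keys())


def constrained_greedy_decode(
    trie: Dict,
    score: Callable[[List[str], str], float],
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding in which only trie-permitted tokens are candidates."""
    output: List[str] = []
    for _ in range(max_len):
        candidates = allowed_next(trie, output)
        if not candidates:
            break
        best = max(candidates, key=lambda tok: score(output, tok))
        if best == END:
            break
        output.append(best)
    return output


def overlap_score(question: str) -> Callable[[List[str], str], float]:
    """Toy stand-in for a language model: reward tokens that appear in the question."""
    words = set(question.lower().split())

    def score(prefix: List[str], tok: str) -> float:
        return 1.0 if tok.lower() in words else 0.0

    return score


# Hypothetical documentation headings, already split into tokens.
titles = [
    ["strings", "formatting"],
    ["strings", "splitting"],
    ["files", "reading"],
]
trie = build_trie(titles)
result = constrained_greedy_decode(
    trie, overlap_score("how to do splitting of strings")
)
print(result)
```

In a real system the scorer would be the model's next-token distribution, and the trie would be built from the actual documentation hierarchy. The constraint guarantees that whatever is generated corresponds to a heading that exists.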

Link to Preprint

https://shuiblue.github.io/forcolab-uoft/paper/ICSME2023_NIER_StackOverflow.pdf

Rohith Pudari, University of Toronto, Canada
Shiyuan Zhou, University of Toronto, Canada
Iftekhar Ahmed, University of California at Irvine, United States
Zhuyun Dai, Google, United States
Shurui Zhou, University of Toronto, Canada

Session Program

Fri 6 Oct
Displayed time zone: Bogota, Lima, Quito, Rio Branco

10:30 - 12:00	Program Comprehension (Research Track / New Ideas and Emerging Results Track / Registered Reports Track / Tool Demo Track) at Session 2 Room - RGD 04. Chair(s): Oscar Chaparro (William & Mary), Massimiliano Di Penta (University of Sannio, Italy)

10:30	16m	Talk	How do Developers Improve Code Readability? An Empirical Study of Pull Requests. Research Track. Carlos Eduardo Carvalho Dantas (Federal University of Uberlândia), Adriano Mendonça Rocha (Federal University of Uberlândia), Marcelo De Almeida Maia (Federal University of Uberlândia)
10:46	11m	Talk	Summarize Me: The Future of Issue Thread Interpretation. New Ideas and Emerging Results Track. Abhishek Kumar (Indian Institute of Technology Kharagpur), Partha Pratim Das (Indian Institute of Technology Kharagpur), Partha Pratim Chakrabarti (Indian Institute of Technology Kharagpur)
10:57	11m	Talk	Bugsplainer: Leveraging Code Structures to Explain Software Bugs with Neural Machine Translation. Tool Demo Track. Parvez Mahbub (Dalhousie University), Ohiduzzaman Shuvo (Dalhousie University), Masud Rahman (Dalhousie University), Avinash Gopal
11:08	16m	Talk	Knowledge Graph based Explainable Question Retrieval for Programming Tasks. Research Track. Mingwei Liu (Fudan University), Simin Yu (Fudan University), Xin Peng (Fudan University), Xueying Du (Fudan University), Tianyong Yang (Fudan University), Huanjun Xu (Fudan University), Gaoyang Zhang (Fudan University). Pre-print, file attached.
11:24	11m	Talk	Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension. Registered Reports Track. Bin Lin (Radboud University), Gregorio Robles (Universidad Rey Juan Carlos)
11:35	11m	Talk	Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision. New Ideas and Emerging Results Track. Rohith Pudari (University of Toronto), Shiyuan Zhou (University of Toronto), Iftekhar Ahmed (University of California at Irvine), Zhuyun Dai (Google), Shurui Zhou (University of Toronto). Pre-print.
11:46	14m	Live Q&A	1:1 Q&A. Research Track.