Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision
Stack Overflow (SO) is a widely used question-and-answer (Q&A) forum dedicated to software development. It plays a supplementary role to official documentation (DOC for short) by offering practical examples and resolving uncertainties. However, the process of simultaneously consulting both the documentation and SO posts can be challenging and time-consuming due to their disconnected nature. In this study, we propose DOSA, a novel approach to automatically align SO and DOC, which inject domain-specific knowledge about the DOC structure into large language models (LLMs) through weak supervision and constrained decoding, thereby enhancing knowledge retrieval and streamlining task completion during the software development procedure. Our preliminary experiments find that DOSA outperforms various widely-used baselines, showing the promise of using generative retrieval models to perform low-resource software engineering tasks.