Post2Vec: Learning Distributed Representations of Stack Overflow Posts
Fri 13 May 2022 05:10 - 05:15 at ICSE room 3-odd hours - Mining Software Repositories 2 Chair(s): Jean-Guy Schneider
Fri 27 May 2022 09:00 - 09:05 at Room 301+302 - Papers 16: Mining Software Repositories 1 Chair(s): Grace Lewis
Fri 27 May 2022 13:30 - 15:00 at Ballroom Gallery - Posters 3
Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines how effectively the solutions perform their respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose a specialized deep learning architecture, Post2Vec, that extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec’s deep learning architecture, we first investigate its end-to-end effectiveness on the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25% improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10%, 7%, and 10% in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw54/Post2Vec.
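For readers unfamiliar with the F1-score@5 metric mentioned in the abstract, the following is a minimal sketch of one common formulation: precision@k is the fraction of the top-k recommended tags that are correct, recall@k is the fraction of the true tags that appear in the top k, and F1@k is their harmonic mean, averaged over posts. The function names `f1_at_k` and `mean_f1_at_k` and the example tags are hypothetical, and the exact definition used in the paper may differ (for instance, in how recall is computed for posts with more than k tags).

```python
from typing import Iterable, Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """F1-score@k for a single post (assumes ranked_tags has no duplicates)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(ranked_tags)[:k]
    # Number of recommended tags in the top k that are actually correct.
    hits = sum(1 for tag in top_k if tag in true_tags)
    if hits == 0 or not true_tags:
        return 0.0
    precision = hits / k                # assumed definition: hits over k
    recall = hits / len(true_tags)      # assumed definition: hits over all true tags
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(
    predictions: Iterable[Sequence[str]],
    references: Iterable[Set[str]],
    k: int = 5,
) -> float:
    """Average F1-score@k over a collection of posts."""
    scores = [f1_at_k(p, r, k) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: two of the five recommended tags are correct,
# and the post has three true tags.
ranked = ["python", "pandas", "numpy", "java", "regex"]
truth = {"python", "pandas", "dataframe"}
score = f1_at_k(ranked, truth, k=5)
```

Here precision@5 is 2/5 and recall@5 is 2/3, so the harmonic mean works out to 0.5. Under this formulation, a relative improvement such as the one reported in the abstract would correspond to a higher average of these per-post scores.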
Fri 13 May (displayed time zone: Eastern Time (US & Canada))
05:00 - 06:00 | Mining Software Repositories 2 | SEIP - Software Engineering in Practice / Journal-First Papers | at ICSE room 3-odd hours | Chair(s): Jean-Guy Schneider Deakin University
05:00 5m Talk | An Empirical Study of Release Note Production and Usage in Practice | Journal-First Papers | Tingting Bi Monash University, Xin Xia Huawei Software Engineering Application Technology Lab, David Lo Singapore Management University, John Grundy Monash University, Thomas Zimmermann Microsoft Research
05:05 5m Talk | A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits | Journal-First Papers | Steffen Herbold TU Clausthal, Alexander Trautsch University of Göttingen, Benjamin Ledel TU Clausthal, Alireza Aghamohammadi Sharif University of Technology, Taher A Ghaleb University of Ottawa, Kuljit Kaur Chahal Guru Nanak Dev University, Tim Bossenmaier Karlsruhe Institute of Technology (KIT), Bhaveet Nagaria Brunel University London, Philip Makedonski University of Goettingen, Matin Nili Ahmadabadi University of Tehran, Kristof Szabados Ericsson Hungary ltd., Helge Spieker Simula Research Laboratory, Norway, Matej Madeja Technical University of Košice, Nathaniel G. Hoy Brunel University London, Valentina Lenarduzzi University of Oulu, Shangwen Wang National University of Defense Technology, Gema Rodríguez-Pérez University of British Columbia (UBC), Ricardo Colomo-Palacios Østfold University College, Roberto Verdecchia Vrije Universiteit Amsterdam, Paramvir Singh The University of Auckland, Yihao Qin, Debasish Chakroborti University of Saskatchewan, Willard Davis IBM, Vijay Walunj University of Missouri-Kansas City, Hongjun Wu National University of Defense Technology, Diego Marcilio USI Università della Svizzera italiana, Omar Alam Trent University, Abdullah Aldaeej Imam Abdulrahman Bin Faisal University, Idan Amit The Hebrew University, Burak Turhan University of Oulu, Simon Eismann University of Würzburg, Anna-Katharina Wickert TU Darmstadt, Germany, Ivano Malavolta Vrije Universiteit Amsterdam, Matúš Sulír Technical University of Košice, Fatemeh Hendijani Fard University of British Columbia, Austin Henley University of Tennessee, Efstratios Kourtzanidis University of Macedonia, Eray Tüzün Bilkent University, Christoph Treude University of Melbourne, Simin Maleki Shamasbi Independent Researcher, Ivan Pashchenko University of Trento, Marvin Wyrich University of Stuttgart, James C. Davis Purdue University, USA, Alexander Serebrenik Eindhoven University of Technology, Ella Albrecht University of Goettingen, Ethem Utku Aktas Softtech Inc., Daniel Strüber Chalmers | University of Gothenburg / Radboud University, Johannes Erbel University of Goettingen
05:10 5m Talk | Post2Vec: Learning Distributed Representations of Stack Overflow Posts | Journal-First Papers | Bowen Xu Singapore Management University, Thong Hoang Singapore Management University, Singapore, Abhishek Sharma Veracode, Inc., Chengran Yang Singapore Management University, Xin Xia Huawei Software Engineering Application Technology Lab, David Lo Singapore Management University
05:15 5m Talk | An Exploratory Study on the Repeatedly Shared External Links on Stack Overflow | Journal-First Papers | Jiakun Liu Zhejiang University, Haoxiang Zhang Huawei, Xin Xia Huawei Software Engineering Application Technology Lab, David Lo Singapore Management University, Ying Zou Queen's University, Kingston, Ontario, Ahmed E. Hassan Queen's University, Shanping Li Zhejiang University
05:20 5m Talk | Understanding Shared Links and Their Intentions to Meet Information Needs in Modern Code Review: A Case Study of the OpenStack and Qt Projects | Journal-First Papers | Dong Wang Kyushu University, Japan, Tao Xiao Nara Institute of Science and Technology, Patanamon Thongtanunam University of Melbourne, Raula Gaikovina Kula Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology
05:25 5m Talk | Bug Tracking Process Smells In Practice | SEIP - Software Engineering in Practice
Fri 27 May (displayed time zone: Eastern Time (US & Canada))
09:00 - 10:30 | Papers 16: Mining Software Repositories 1 | NIER - New Ideas and Emerging Results / Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice | at Room 301+302 | Chair(s): Grace Lewis Carnegie Mellon Software Engineering Institute
09:00 5m Talk | Post2Vec: Learning Distributed Representations of Stack Overflow Posts | Journal-First Papers | Bowen Xu Singapore Management University, Thong Hoang Singapore Management University, Singapore, Abhishek Sharma Veracode, Inc., Chengran Yang Singapore Management University, Xin Xia Huawei Software Engineering Application Technology Lab, David Lo Singapore Management University
09:05 5m Talk | Assisting Example-based API Misuse Detection via Complementary Artificial Examples | Journal-First Papers | Maxime Lamothe Polytechnique Montréal, Heng Li Polytechnique Montréal, Weiyi Shang Concordia University
09:10 5m Talk | What happens in my code reviews? An investigation on automatically classifying review changes | Journal-First Papers | Enrico Fregnan University of Zurich, Switzerland, Fernando Petrulio University of Zurich, Linda Di Geronimo University of Zurich, Switzerland, Alberto Bacchelli University of Zurich
09:15 5m Talk | Bus Factor In Practice | SEIP - Software Engineering in Practice | Elgun Jabrayilzade Bilkent University, Mikhail Evtikhiev JetBrains Research, Eray Tüzün Bilkent University, Vladimir Kovalenko JetBrains Research
09:20 5m Talk | A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits | Journal-First Papers | Steffen Herbold TU Clausthal, Alexander Trautsch University of Göttingen, Benjamin Ledel TU Clausthal, Alireza Aghamohammadi Sharif University of Technology, Taher A Ghaleb University of Ottawa, Kuljit Kaur Chahal Guru Nanak Dev University, Tim Bossenmaier Karlsruhe Institute of Technology (KIT), Bhaveet Nagaria Brunel University London, Philip Makedonski University of Goettingen, Matin Nili Ahmadabadi University of Tehran, Kristof Szabados Ericsson Hungary ltd., Helge Spieker Simula Research Laboratory, Norway, Matej Madeja Technical University of Košice, Nathaniel G. Hoy Brunel University London, Valentina Lenarduzzi University of Oulu, Shangwen Wang National University of Defense Technology, Gema Rodríguez-Pérez University of British Columbia (UBC), Ricardo Colomo-Palacios Østfold University College, Roberto Verdecchia Vrije Universiteit Amsterdam, Paramvir Singh The University of Auckland, Yihao Qin, Debasish Chakroborti University of Saskatchewan, Willard Davis IBM, Vijay Walunj University of Missouri-Kansas City, Hongjun Wu National University of Defense Technology, Diego Marcilio USI Università della Svizzera italiana, Omar Alam Trent University, Abdullah Aldaeej Imam Abdulrahman Bin Faisal University, Idan Amit The Hebrew University, Burak Turhan University of Oulu, Simon Eismann University of Würzburg, Anna-Katharina Wickert TU Darmstadt, Germany, Ivano Malavolta Vrije Universiteit Amsterdam, Matúš Sulír Technical University of Košice, Fatemeh Hendijani Fard University of British Columbia, Austin Henley University of Tennessee, Efstratios Kourtzanidis University of Macedonia, Eray Tüzün Bilkent University, Christoph Treude University of Melbourne, Simin Maleki Shamasbi Independent Researcher, Ivan Pashchenko University of Trento, Marvin Wyrich University of Stuttgart, James C. Davis Purdue University, USA, Alexander Serebrenik Eindhoven University of Technology, Ella Albrecht University of Goettingen, Ethem Utku Aktas Softtech Inc., Daniel Strüber Chalmers | University of Gothenburg / Radboud University, Johannes Erbel University of Goettingen
09:25 5m Talk | Towards Mining OSS Skills from GitHub Activity | NIER - New Ideas and Emerging Results | Jenny T. Liang University of Washington, Thomas Zimmermann Microsoft Research, Denae Ford Microsoft Research
09:30 5m Talk | Bug Tracking Process Smells In Practice | SEIP - Software Engineering in Practice
09:35 5m Talk | Manas: Mining Software Repositories to Assist AutoML | Technical Track | Giang Nguyen Iowa State University, Md Johirul Islam Iowa State University, Rangeet Pan Iowa State University, USA, Hridesh Rajan Iowa State University