ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia

This artifact contains the tools and datasets used in our research paper. Because RQ2 and RQ3 in the paper both rest on the final set of 586 relevant posts, all of which were analyzed manually, this artifact focuses mainly on RQ1 (i.e., overall statistics) and on the procedure for collecting the datasets underlying all of the RQs.

The material in this artifact is divided into two parts. The first part is the code of the tools used for data collection, the process presented in Figure 3 of our research paper. The README file details how to use the tools at each step to filter posts. The second part provides the datasets generated by the data collection process. Because random sampling is used in the codebook and coding process, this part of the results may differ from what we obtained and used in our paper.

Our research paper analyzes the issues, challenges, and solutions in multilingual software development by examining posts on StackOverflow in a largely manual empirical study. The artifact therefore focuses on the raw datasets and the data collection and filtering procedures. Nevertheless, other researchers can reuse these procedures, via the tools, for similar analyses that start from a different set of posts, or for studies of StackOverflow posts on other topics.

We provide the artifact on Zenodo [1].

[1] https://zenodo.org/record/7557752#.Y8xxT3ZKiUk