EASE 2024
Tue 18 - Fri 21 June 2024, Salerno, Italy

Modern software development relies on cloud-based collaborative platforms (e.g., GitHub and GitLab), characterized by remote, distributed, and asynchronous collaboration. On these platforms, developers often follow a pull-based development approach, proposing changes via pull requests and communicating through message exchanges. Since communication is key to software development, studies have linked the sentiments embedded in this communication to effects on software projects, such as bug-inducing commits or the rejection of pull requests. In this context, sentiment analysis tools are paramount for detecting the sentiment of developers’ messages and preventing potentially harmful impacts. Unfortunately, existing state-of-the-art tools vary in the nature of their data collection and labeling processes, and there is no comprehensive study comparing their performance and generalizability on a dataset designed and systematically curated for this purpose and for this specific context. Therefore, in this study, we design a methodology to assess the effectiveness of existing sentiment analysis tools in the context of pull request discussions. To this end, we created the PRemo dataset, which contains ≈1.8K manually labeled messages from 36 software projects. The messages were labeled by 19 experts (neuroscientists and software engineers) using a novel, systematic manual classification process designed to reduce subjectivity. By applying the existing tools to our dataset, we observed that, while some tools perform acceptably, their performance is far from ideal, especially when classifying negative messages. This is noteworthy because negative sentiment often signals a critical or unfavorable opinion. We also observed that some messages have characteristics that make them harder to classify, causing disagreement among the experts and possible misclassifications by the tools; such messages require more attention from researchers. Our contributions include valuable resources that pave the way for developing robust and mature sentiment analysis tools capable of capturing and anticipating potential problems during software development.
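The abstract mentions comparing tool predictions against expert labels, with per-class (especially negative) performance in focus. Below is a minimal, hypothetical sketch of how such a per-class evaluation could be computed; it is not the authors' evaluation code, and the file name, column names, and the use of pandas/scikit-learn are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's code): compare a sentiment tool's
# predictions against expert gold labels on pull request messages and
# report per-class precision/recall/F1, which highlights performance on
# the negative class. File and column names are hypothetical.
import pandas as pd
from sklearn.metrics import classification_report

LABELS = ["negative", "neutral", "positive"]

# Hypothetical CSV with one row per pull request message:
#   text, gold_label (expert annotation), tool_label (tool prediction)
df = pd.read_csv("premo_messages.csv")

report = classification_report(
    df["gold_label"],
    df["tool_label"],
    labels=LABELS,
    zero_division=0,
)
print(report)  # per-class precision, recall, F1, and support
```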