Data annotation is foundational to the development of machine learning (ML) systems, yet it is rarely specified formally from a Requirements Engineering (RE) perspective. This poster presents a preliminary investigation of data annotation within ML development, examining its challenges, mitigation strategies, and the potential role of RE. Through a non-systematic multivocal literature review, we identify key challenges including data quality, scalability, subjectivity, ethics, and process management, and we find that establishing annotation guidelines is a common mitigation strategy. We then propose incorporating RE principles by formally defining “data annotation requirements”, which specify what to label and why, and conceptualize a traceability chain from system requirements to annotated data. While this approach appears promising for enhancing ML model quality, its practical necessity warrants further empirical investigation.
Yi Peng, University of Gothenburg and Chalmers University of Technology; Hina Saeeda, Chalmers University of Technology, Sweden; Hans-Martin Heyn, University of Gothenburg and Chalmers University of Technology; Jennifer Horkoff, Chalmers University of Technology and University of Gothenburg