Practitioners have limited evidence of the effectiveness of RE research techniques, as these are seldom applied to real-world, contextualized data. At the same time, researchers struggle to access and share industrial requirements-related data because of data owners' confidentiality concerns. The RE Open Data Initiative aims to collect industry requirements data and make it accessible to researchers through a dedicated infrastructure. Researchers can then use these data to study challenges and datasets that practitioners consider meaningful. Submissions that make use of these data are welcome and encouraged in all tracks of RE’25.

The infrastructure supporting the RE Open Data Initiative is intended as a long-term setup; data stored there will remain accessible beyond the current edition of the RE conference.

Call for Data

The RE Open Data Initiative aims to bridge the gap between research and industry by making industry requirements data accessible to researchers in a dedicated infrastructure. Practitioners and researchers are invited to contribute requirements-related data sets for public use. Authors shall submit data sets in accessible formats such as ReqIF (https://www.omg.org/reqif/), CSV, PNG, JPEG, GIF, PDF, XML (preferably with XSD schemas), database exports, and open standard formats for word processors and spreadsheets. We welcome all kinds of requirements formats and requirements-related artifacts, including but not limited to:

  • Natural language requirements extracted from requirements management systems or contracts.

  • Epics, user stories, and acceptance criteria.

  • Process models, UML models, feature diagrams.

  • Use case specifications.

  • Change requests obtained from, e.g., issue tracking systems.

  • Any kind of early requirements, including user feedback, transcripts, and recordings.

  • User and stakeholder maps that describe stakeholders and their relationships.

  • Specification documents.

  • Trace links and traced artifacts (e.g., requirements-to-code, requirements-to-architecture, requirements-to-tests).

Notes:

  • All data are welcome, including labelled or annotated data that can serve as ground truth.

  • We also accept data already published elsewhere for inclusion in the RE Open Data Initiative infrastructure. Specific instructions will be provided.

Additionally, use case descriptions and background information are welcome to help researchers understand the business and software development context. This optional material, which should be submitted in PDF format, includes but is not limited to:

  • Business segment/domain introduction.

  • Application scenarios describing how the data are intended to be used.

  • Challenges practitioners experience when using these requirements artifacts.

Data privacy and legal compliance

All accepted data sets will be made publicly available. Publication therefore requires the data owner's approval.

Submissions must comply with local and regional regulations, including data privacy regulations such as the GDPR. At a minimum, compliance with the regulations in force where the data originate is required.

The submitting author is responsible for compliance with legal requirements.

An open license must be specified that permits researchers to use the data for (at least) non-commercial purposes. The submitting author must explicitly declare their preferred license. Guidance on choosing a license is available at: https://ufal.github.io/public-license-selector/

Visibility of the data-owning company

The data-owning party can choose between anonymous and attributed publication of their data. If required by company or local regulations, a consent form for publication may have to be signed; in that case, the identity of the data-owning party is visible to the data initiative chairs.

Anonymization of Data

We recommend anonymizing or pseudonymizing data and avoiding sensitive data whenever possible, particularly stakeholder names and usernames, information related to unique selling points, and company secrets.

If available, domain market reports can provide insights into the features frequently offered in a domain. Providing data on such commonly offered features reduces the risk of disclosing company secrets.

Data sharing and infrastructure

The RE Open Data Initiative will provide an infrastructure that makes accepted datasets publicly available for research. The intention is to ensure long-term accessibility of the provided data. However, the data initiative owner(s) cannot be held liable if the data become inaccessible.

Details about the infrastructure will be announced in due time.

For further details about submitting and publishing data, please contact the RE Open Data Initiative chairs. We also invite the entire RE community to contribute ideas and recommendations.