Data is about Detail: An Empirical Investigation for Software Systems with NLP at Core
Research Paper
Businesses continue to operate under increasingly complex demands, such as an ever-evolving regulatory landscape, personalization requirements for software apps, and stricter governance of security and privacy. In response to these challenges, large enterprises have been emphasizing automation across a wide range of activities, from business processes to customer experience. In fact, over 76% of enterprises prioritized Artificial Intelligence (AI) and Machine Learning (ML) over other Information Technology (IT) activities in 2021. As AI continues to be adopted as a crucial aspect of software development, there is a need to focus on the predominant role that data plays in the development of software systems with AI at their core. AI-centric industrial software systems need large amounts of training data. In our experience, this has introduced several challenges relating to the details of that data, such as how to select and process data so that the AI component is effective in meeting the business goals of the software system. In this paper, through an empirical study based on interviews with AI practitioners, we present the current challenges that need to be addressed in the ‘data requirements’ of Software Systems with NLP at the Core (SSNLPCore), along with their impact. We further discuss the techniques practitioners currently employ to address the identified challenges.
Pre-print of the paper (cain22-camera-ready-latest.pdf), 536 KiB