Identification of Intra-Domain Ambiguity using Transformer-based Machine Learning
Recently, the application of neural word embeddings for detecting cross-domain ambiguities in software requirements has gained significant attention from the requirements engineering (RE) community. Several approaches have been proposed in the literature for estimating how the meaning of commonly used terms varies across domains. A major limitation of these techniques is that they cannot detect terms that are used in different contexts within the same application domain, i.e. intra-domain ambiguities, or within the requirements document of an interdisciplinary project. We propose an approach based on Bidirectional Encoder Representations from Transformers (BERT) and clustering for identifying such ambiguities. For every context in which a term is used in the document, our approach returns a list of its most similar words and also provides example sentences from the corpus that highlight its context-specific interpretation. We apply our approach to a computer science (CS) specific corpus and a multi-domain corpus consisting of textual data from eight different application domains. Our experimental results show that this approach is effective in identifying intra-domain ambiguities.
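To make the core idea concrete, the sketch below illustrates one plausible reading of the pipeline: embed each occurrence of a candidate term with BERT, cluster the resulting contextual vectors, and group the corpus sentences by cluster so that distinct clusters surface distinct in-domain usages. This is a minimal illustration, not the authors' implementation; the model name (`bert-base-uncased`), the fixed cluster count, the example sentences, and the helper `term_vectors` are all assumptions introduced here.

```python
# Minimal sketch (illustrative, not the paper's code): cluster BERT
# contextual embeddings of a term's occurrences to expose distinct
# in-domain contexts. Model choice and cluster count are assumptions.
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def term_vectors(term: str, sentences: list[str]) -> np.ndarray:
    """Return one contextual embedding per occurrence of `term`."""
    term_ids = tokenizer(term, add_special_tokens=False)["input_ids"]
    vecs = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (tokens, 768)
        ids = enc["input_ids"][0].tolist()
        # Locate the term's (possibly multi-wordpiece) token span and
        # average its hidden states into a single occurrence vector.
        for i in range(len(ids) - len(term_ids) + 1):
            if ids[i:i + len(term_ids)] == term_ids:
                vecs.append(hidden[i:i + len(term_ids)].mean(dim=0).numpy())
                break
    return np.vstack(vecs)

# Hypothetical example: "driver" used in two senses in one CS corpus.
sentences = [
    "The kernel driver must be signed before installation.",
    "Update the graphics driver to fix the rendering bug.",
    "Cost reduction is the main driver of this requirement.",
    "Stakeholder demand was the driver behind the redesign.",
]
X = term_vectors("driver", sentences)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for k in range(2):
    print(f"Context cluster {k}:")
    for sent, label in zip(sentences, labels):
        if label == k:
            print("  ", sent)
```

In practice the number of context clusters would need to be chosen per term, for instance via a silhouette-style criterion, rather than fixed in advance as done here for brevity.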