Mobile app reviews are a large-scale data source for software improvement. A key task in this context is extracting requirements from app reviews effectively, in order to analyze users' needs and support the software's evolution. Recent studies show that existing methods fail at this task because app reviews usually contain informal language, grammatical and spelling errors, and a large amount of irrelevant information with little direct practical value for developers. To address this, we propose a novel reformulation of requirements extraction as a Named Entity Recognition (NER) task based on the sequence-to-sequence (Seq2seq) generation approach. To this end, we introduce a Seq2seq framework that combines a BiLSTM encoder and an LSTM decoder with a self-attention mechanism, GloVe embeddings, and a CRF model. We evaluated the framework on two datasets: a manually annotated set of 1,000 reviews (Dataset 1) and a crowdsourced set of 23,816 reviews (Dataset 2). A statistical analysis of the preliminary results shows that our framework outperforms existing state-of-the-art methods, achieving an F1 score of 0.47 on Dataset 1 and 0.96 on Dataset 2.
Quim Motger (Universitat Politècnica de Catalunya), Marc Oriol (Universitat Politècnica de Catalunya), Max Tiessler (Universitat Politècnica de Catalunya), Xavier Franch (Universitat Politècnica de Catalunya), Jordi Marco (Universitat Politècnica de Catalunya)
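
To make the NER reformulation in the abstract concrete, the following minimal sketch shows how a requirement mention in a review could be encoded as a token-level tag sequence and how an F1 score could be computed over the extracted spans. It is an illustration under stated assumptions, not the framework itself: the BIO tagging scheme, the `REQ` label, the exact-match span criterion, the helper names `bio_to_spans` and `span_f1`, and the example review are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A span is (start_index, end_index_exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into labeled spans.

    Assumption: BIO tagging with a hypothetical 'REQ' label; the paper's
    actual annotation scheme may differ.
    """
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray "I-" tag with no open span is ignored.
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match span-level F1 (assumed metric, for illustration only)."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical review and annotations.
tokens = "please add dark mode and offline sync".split()
gold_tags = ["O", "O", "B-REQ", "I-REQ", "O", "B-REQ", "I-REQ"]
pred_tags = ["O", "O", "B-REQ", "I-REQ", "O", "O", "B-REQ"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)

for start, end, label in gold_spans:
    print(label, " ".join(tokens[start:end]))

print("F1:", span_f1(gold_spans, pred_spans))
```

In this toy example the prediction recovers one of the two gold spans exactly and truncates the other, so one of two predicted spans is correct and one of two gold spans is found. A sequence model such as the BiLSTM encoder with a CRF layer described in the abstract would be responsible for producing the predicted tag sequence; that part is deliberately left out here.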