Sun 23 - Fri 28 October 2022 Montréal, Canada
Wed 26 Oct 2022 15:30 - 15:52 at A-3502.1 - AI for/with MDE I Chair(s): Lola Burgueño

In the quest to reuse modeling artifacts, academia and industry have proposed several model repositories over the last decade. Different storage and indexing techniques have been conceived to support search, helping users find reusable artifacts that fit the situation at hand. In this respect, machine learning (ML) techniques have been proposed as a way to automatically categorize and group large sets of modeling artifacts. This paper reports the results of a comparative study of different ML classification techniques employed to automatically label models stored in model repositories. We have built a framework to systematically compare different ML models (feed-forward neural networks, graph neural networks, k-nearest neighbors, support vector machines, etc.) with different model encodings (TF-IDF, word embeddings, graphs, and paths). We apply this framework to two datasets of about 5,000 Ecore and 5,000 UML models. We show that, depending on the characteristics of the available datasets (e.g., presence of duplicates) and on the goals to be achieved, specific ML models and encodings perform better than others.
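To make one of the compared combinations concrete, the following is a minimal sketch of labeling models with a TF-IDF encoding and a k-nearest-neighbors classifier. It is not the framework from the paper: the function names, the toy token lists (standing in for identifiers extracted from Ecore or UML models), and the labels are all hypothetical, and the smoothed IDF formula is just one common choice.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(doc, idf):
    """Encode a tokenized model as a sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each model is reduced to a bag of identifier names.
train_docs = [
    ["library", "book", "author", "loan"],
    ["book", "author", "publisher", "isbn"],
    ["state", "transition", "event", "guard"],
    ["state", "machine", "transition", "region"],
]
train_labels = ["library", "library", "statemachine", "statemachine"]

idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf) for doc in train_docs]

query = vectorize(["book", "loan", "member"], idf)
print(knn_predict(train_vecs, train_labels, query, k=3))
```

Swapping the encoding (for example, word embeddings instead of TF-IDF) or the classifier only changes `vectorize` or `knn_predict`, which is the kind of pairing the comparative study varies systematically.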

Displayed time zone: Eastern Time (US & Canada)