How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration (CAIN 2025 - Research and Experience Papers)

Who

Shreyas Kumar Parida, Ilias Gerostathopoulos, Justus Bogner

Track

CAIN 2025 Research and Experience Papers

Time Zone

All times are shown in the conference time zone: (GMT-04:00) Eastern Time (US & Canada).

When

Sun 27 Apr 2025 14:00 - 14:15 at 208 - Architecting and Testing AI Systems. Chair(s): Jan-Philipp Steghöfer

Abstract

Machine learning (ML) models have greatly improved their predictive and generative capabilities in recent years. They are therefore often integrated into ML-enabled systems to provide software functionality that would otherwise be impossible. This integration requires the selection of an appropriate ML model export format, for which many options are available. These formats are crucial for ensuring a seamless integration, and choosing a suboptimal one can negatively impact system development, e.g., via increased dependencies and higher maintenance costs. However, little evidence is available to guide practitioners during the export format selection.

We therefore aim to comprehensively evaluate various model export formats regarding their impact on the development of ML-enabled systems from an integration perspective. Based on the results of a preliminary questionnaire survey (n=17), we designed an extensive embedded case study with two ML-enabled systems in three versions with different technologies. We then analyzed the effect of five popular export formats, namely ONNX, Pickle, TensorFlow’s SavedModel, PyTorch’s TorchScript, and Joblib. In total, we studied 30 units of analysis (2 systems * 3 tech stacks * 5 formats) and collected data via structured field notes.

The holistic qualitative analysis of the results indicated that ONNX provided the most efficient integration across most cases, which shows its great flexibility and portability. SavedModel and TorchScript were very convenient to use in Python-based systems, but otherwise required workarounds (TorchScript more than SavedModel). The TensorFlow format also allowed the easy incorporation of preprocessing logic into a single file, which made it scalable for complex deep learning use cases. Pickle and Joblib were the most challenging to integrate, even in Python-based systems. Regarding technical support, all model export formats demonstrated excellent documentation quality and strong community support across platforms such as Stack Overflow and Reddit. Practitioners can use our findings to inform the selection of ML export formats suited to their context.

Link to Preprint

https://arxiv.org/abs/2502.00429

Shreyas Kumar Parida

ETH Zurich

Ilias Gerostathopoulos

Vrije Universiteit Amsterdam

Justus Bogner