The Impact of Knowledge Distillation on the Performance and Energy Consumption of NLP Models (CAIN 2024 - Research and Experience Papers)

Who

Ye Yuan, Jiacheng Shi, Zongyao Zhang, Kaiwei Chen, Eloise Zhang, Vincenzo Stoico, Ivano Malavolta

Track

CAIN 2024 Research and Experience Papers

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 14 Apr 2024 12:10 - 12:25 at Pequeno Auditório - Architecting, Designing, Managing, and Modeling AI-Enabled Systems Chair(s): Nicolás Cardozo

Abstract

Context. Our research tackles a crucial challenge in Natural Language Processing (NLP). While models like BERT and GPT are powerful, they require substantial resources. Knowledge distillation can be employed to make them more efficient. Yet we lack a clear understanding of the performance and energy consumption of distilled models. This uncertainty is a major concern, especially in practical applications, where these models could strain resources and limit accessibility for developers with limited means. Our motivation also comes from the pressing need for environmentally friendly and sustainable applications in light of growing environmental concerns. To address this, it is crucial to accurately measure the energy consumption of these models.

Goal. This study aims to determine how Knowledge Distillation affects the energy consumption and performance of NLP models.

Method. To explore the impact of distillation techniques on NLP models, we benchmark BERT, Distilled-BERT, GPT-2, and Distilled-GPT-2 using three different tasks from three different categories selected from a third-party dataset. During the experiment, the energy consumption, CPU utilization, memory utilization, and inference time of the considered NLP models are measured and statistically analyzed.
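
As a rough illustration of the kind of measurement loop described in the Method, the sketch below records inference time, a CPU-usage proxy, and peak memory for repeated runs of a workload, then summarizes each metric by its median. It is a minimal sketch under stated assumptions, not the authors' tooling: the two stub functions are hypothetical stand-ins for a full and a distilled model, the CPU ratio (CPU seconds per wall-clock second) is only a proxy for CPU utilization, tracemalloc sees Python-level allocations only, and energy is left out because it requires hardware counters or an external power meter.

```python
import statistics
import time
import tracemalloc


def measure(fn, repetitions=5):
    """Run fn() several times; return per-run time, CPU ratio, and peak memory."""
    runs = []
    for _ in range(repetitions):
        tracemalloc.start()
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        fn()
        cpu_used = time.process_time() - cpu_start
        wall_used = time.perf_counter() - wall_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        runs.append({
            "inference_time_s": wall_used,
            # CPU seconds per wall-clock second: a rough proxy for utilization.
            "cpu_ratio": cpu_used / wall_used if wall_used > 0 else 0.0,
            # Python-level allocations only; native library memory is not seen.
            "peak_python_memory_bytes": peak_bytes,
        })
    return runs


def summarize(runs):
    """Median of each metric across the recorded runs."""
    return {key: statistics.median(r[key] for r in runs) for key in runs[0]}


# Hypothetical stand-ins for the inference call of a full and a distilled model.
def full_model_stub():
    return sum(i * i for i in range(200_000))


def distilled_model_stub():
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    for name, fn in [("full", full_model_stub), ("distilled", distilled_model_stub)]:
        print(name, summarize(measure(fn)))
```

In a real experiment the stubs would be replaced by actual model inference calls, and the per-run samples would feed a statistical test rather than a plain median comparison.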

Results. The study reveals notable differences between the original and the distilled version of the measured NLP models. Distilled versions generally consume less energy, with distilled GPT-2 having reduced CPU utilization. These results provide evidence and insights about the possible trade-offs in using distilled models for NLP.

Conclusion. The results of this study highlight the critical impact of model choice on performance and energy consumption metrics. Future research should consider a wider range of distilled models, diverse benchmarks, and deployment environments, as well as explore the ecological footprint of these models, particularly in the context of environmental sustainability.

Ye Yuan

Vrije Universiteit Amsterdam

Jiacheng Shi

Vrije Universiteit Amsterdam

Zongyao Zhang

Vrije Universiteit Amsterdam

Kaiwei Chen

Vrije Universiteit Amsterdam

Eloise Zhang

Vrije Universiteit Amsterdam

Vincenzo Stoico

Vrije Universiteit Amsterdam

Netherlands

Ivano Malavolta

Vrije Universiteit Amsterdam

Netherlands

Session Program

Sun 14 Apr
Displayed time zone: Lisbon

11:00 - 12:30	Architecting, Designing, Managing, and Modeling AI-Enabled Systems Industry Talks / Research and Experience Papers at Pequeno Auditório Chair(s): Nicolás Cardozo Universidad de los Andes

11:00 10m Talk		A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture Research and Experience Papers Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61, Xiwei (Sherry) Xu Data61, CSIRO, Yue Liu CSIRO's Data61 & University of New South Wales, Zhenchang Xing CSIRO's Data61, Jon Whittle CSIRO's Data61 and Monash University
11:10 15m Talk		Investigating the Impact of SOLID Design Principles on Machine Learning Code Understanding (Distinguished Paper Award Candidate) Research and Experience Papers Raphael Cabral Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Maria Teresa Baldassarre Department of Computer Science, University of Bari, Hugo Villamizar Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Tatiana Escovedo Pontifical Catholic University of Rio de Janeiro, Helio Côrtes Vieira Lopes PUC-Rio
11:25 10m Industry talk		KnowING Intelligent Document Classification: A Deep Dive into Microservices and Efficient Models at ING Industry Talks A: Andrew Rutherfoord CWI; University of Groningen, A: Gert Vermeer, Andrea Capiluppi Brunel University
11:35 15m Talk		An Exploratory Study of V-Model in Building ML-Enabled Software: A Systems Engineering Perspective (Distinguished Paper Award Candidate) Research and Experience Papers Jie JW Wu University of British Columbia (UBC)
11:50 10m Industry talk		Engineering Challenges in Industrial AI Industry Talks Martin Hollender ABB, Chaojun Xu ABB, Ruomu Tan ABB
12:00 10m Talk		Approach for Argumenting Safety on Basis of an Operational Design Domain Research and Experience Papers Gereon Weiss Fraunhofer IKS, Marc Zeller Siemens AG, Hannes Schoenhaar Siemens Corporate Technology, Christian Drabek Fraunhofer Institute for Cognitive Systems IKS, Andreas Kreutz Fraunhofer Institute for Cognitive Systems IKS
12:10 15m Talk		The Impact of Knowledge Distillation on the Performance and Energy Consumption of NLP Models Research and Experience Papers Ye Yuan Vrije Universiteit Amsterdam, Jiacheng Shi Vrije Universiteit Amsterdam, Zongyao Zhang Vrije Universiteit Amsterdam, Kaiwei Chen Vrije Universiteit Amsterdam, Eloise Zhang Vrije Universiteit Amsterdam, Vincenzo Stoico Vrije Universiteit Amsterdam, Ivano Malavolta Vrije Universiteit Amsterdam