ESPN: Memory-Efficient Multi-vector Information Retrieval (ISMM 2024 - International Symposium on Memory Management)

Who

Susav Shrestha, Narasimha Reddy, Zongwang Li

Track

ISMM 2024

Time Zone

The program is currently displayed in (GMT+02:00) Windhoek.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 25 Jun 2024 16:40 - 17:00 at Iceland - ISMM: Session 4 - Potpourri Chair(s): Tony Hosking

Abstract

Recent advances in large language models have demonstrated remarkable effectiveness in information retrieval (IR) tasks. While many neural IR systems encode queries and documents into single-vector representations, multi-vector models improve retrieval quality by producing multi-vector representations and supporting similarity search at the granularity of individual tokens. However, these models increase the memory requirements of retrieval indices by an order of magnitude, and this growth in index size makes multi-vector IR models progressively harder to scale. We introduce Embedding from Storage Pipelined Network (ESPN), where we offload the entire re-ranking embedding tables to SSDs and reduce the memory requirements by 5-16x. We design a flexible software prefetcher applicable to any hierarchical clustering-based search, achieving hit rates exceeding 90%. ESPN improves SSD-based retrieval by up to 6.4x and end-to-end throughput by 68%, maintaining near-memory levels of query latency even for large query batch sizes. The code is available at https://github.com/susavlsh10/ESPN-v1.
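
To make "similarity search at the granularity of individual tokens" concrete, the sketch below shows one common way multi-vector models score a document: the late-interaction "MaxSim" operator popularized by ColBERT. The abstract does not say which scoring function ESPN uses, so treat this as an illustrative assumption rather than the authors' implementation. The names `maxsim_score` and `rerank` are hypothetical, and the toy in-memory dictionary stands in for the per-token embedding tables that ESPN would read from SSD.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (assumed ColBERT-style MaxSim).

    For every query token vector, take its best match among the document's
    token vectors, then sum those per-token maxima.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(
    query_vecs: Sequence[Vector],
    candidates: Dict[str, Sequence[Vector]],
) -> List[Tuple[str, float]]:
    """Score each candidate document and return (doc_id, score), best first.

    `candidates` maps a hypothetical document id to its per-token embeddings;
    storing one vector per token is what makes the index so much larger than
    a single-vector index.
    """
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every document contributes one vector per token instead of one vector overall, the table consulted by a re-ranking step like `rerank` grows with total token count, which is the memory pressure the abstract describes.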

DOI

https://doi.org/10.1145/3652024.3665515

Susav Shrestha

Texas A&M University

United States

Narasimha Reddy

Texas A&M University

United States

Zongwang Li

Samsung

United States

Session Program

Tue 25 Jun
Displayed time zone: Windhoek

16:00 - 17:00	ISMM: Session 4 - Potpourri, ISMM 2024, at Iceland. Chair(s): Tony Hosking (Australian National University)

16:00 20m Talk	SSRD: Shapes and Summaries for Race Detection in Concurrent Data Structures (Remote). ISMM 2024. Xiaofan Sun (University of California at Riverside), Rajiv Gupta (University of California at Riverside)
16:20 20m Talk	A Heuristic for Periodic Memory Allocation with Little Fragmentation to Train Neural Networks. ISMM 2024. Akifumi Imanishi (Preferred Networks), Zijian Xu (Preferred Networks)
16:40 20m Talk	ESPN: Memory-Efficient Multi-vector Information Retrieval. ISMM 2024. Susav Shrestha (Texas A&M University), Narasimha Reddy (Texas A&M University), Zongwang Li (Samsung)