Task-Aware Reduction for Scalable LLM–Database Systems
Large Language Models (LLMs) are increasingly applied to data-intensive workflows, from database querying to developer observability. Yet the effectiveness of these systems is constrained by the volume, verbosity, and noise of real-world text-rich data such as logs, telemetry, and monitoring streams. Feeding such data directly into LLMs is costly, environmentally unsustainable, and often misaligned with task objectives. Parallel efforts in LLM efficiency have focused on model- or architecture-level optimizations, but the challenge of reducing upstream input verbosity remains underexplored. In this paper, we argue for treating the token budget of an LLM as an attention budget and elevating task-aware text reduction to a first-class design principle for language–data systems. We position input-side reduction not as compression but as attention allocation: prioritizing the information most relevant to downstream tasks. We outline open research challenges in building benchmarks, designing adaptive reduction pipelines, and integrating token-budget–aware preprocessing into database and retrieval systems. Our vision is to channel scarce attention resources toward meaningful signals in noisy, data-intensive workflows, enabling scalable, accurate, and sustainable LLM–data integration.
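To make the idea of "input-side reduction as attention allocation" concrete, the sketch below shows one minimal, hypothetical form such a preprocessing step could take: scoring noisy log lines by relevance to a downstream task and keeping only the highest-scoring lines within a fixed token budget. This is an illustrative assumption on our part, not the authors' method; the function names, the keyword-overlap relevance heuristic, and the whitespace token estimate are all placeholders for what a real pipeline might implement with learned scorers and a proper tokenizer.

```python
# Illustrative sketch of task-aware input reduction (hypothetical, not the
# paper's pipeline): treat the LLM token budget as an attention budget and
# keep only the lines of a noisy log most relevant to the task at hand.

def approx_tokens(text: str) -> int:
    """Rough token estimate: about one token per whitespace-separated word."""
    return len(text.split())

def reduce_for_task(lines, task_keywords, token_budget):
    """Select the most task-relevant lines that fit within token_budget."""
    keywords = {k.lower() for k in task_keywords}
    # Score each line by keyword overlap; remember its original position.
    scored = [
        (sum(w.lower().strip(".,:;") in keywords for w in line.split()), i, line)
        for i, line in enumerate(lines)
    ]
    kept, used = [], 0
    # Greedily take the highest-scoring relevant lines first.
    for score, i, line in sorted(scored, key=lambda t: (-t[0], t[1])):
        cost = approx_tokens(line)
        if score > 0 and used + cost <= token_budget:
            kept.append((i, line))
            used += cost
    # Re-emit in original order so the reduced log stays readable in context.
    return [line for _, line in sorted(kept)]

logs = [
    "INFO heartbeat ok",
    "ERROR db connection timeout on replica-2",
    "INFO heartbeat ok",
    "WARN query latency spike 950ms on orders table",
]
print(reduce_for_task(logs, ["error", "timeout", "latency"], token_budget=20))
```

Under this toy heuristic, the two heartbeat lines are dropped entirely, so the budget is spent on the error and latency lines that actually matter to a debugging task; a production system would swap the keyword scorer for a task-conditioned relevance model.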
Wed 12 Nov, 15:00–16:30 (Eastern Time, US & Canada) | WKS-13: LLMs Meet Databases: Next-Generation Data Systems (Part 3), Room 3. Chair: Anastasios Kementsietsidis (Google DeepMind)

- 15:00 (30m, Talk): LLM-Driven Event Log Generation from Forensic Cases: A Comparative Study of ChatGPT, Claude, and Gemini
- 15:30 (30m, Talk): Task-Aware Reduction for Scalable LLM–Database Systems. Marcus Barnes (University of Toronto), Taher A. Ghaleb (Trent University), Safwat Hassan (University of Toronto, Canada). Pre-print available.
- 16:00 (30m, Talk): ScenarioBench: Trace-Grounded Compliance Evaluation for Text-to-SQL and RAG. Pre-print available.