Challenges in Practice: Building a Usable Library for Planetary-Scale Embeddings
This program is tentative and subject to change.
Remote sensing observations from satellites are critical for scientists to understand how our world is changing in the face of climate change, biodiversity loss, and desertification. However, working directly with these data is difficult. For any given satellite constellation, there are many processed products to choose from, the data volume is considerable, and, for optical imagery, users must contend with gaps caused by cloud cover. This complexity creates a significant barrier for domain experts who are not remote sensing specialists.
Pre-trained, self-supervised foundation models such as TESSERA (https://arxiv.org/abs/2506.20380) aim to solve this by offering pre-computed global embeddings. These rich embeddings can be used in place of raw remote sensing data in a powerful “embedding-as-data” approach. For example, a single 128-dimensional TESSERA embedding for a 10-meter point on Earth can substitute for an entire year of optical and radar imagery, representing its temporal and spectral characteristics. While this could democratise access to advanced analytics derived from remote sensing, it also creates a new programming challenge: a lack of tools designed for this new approach.
In this talk, we focus on lessons learnt from developing geotessera (https://github.com/ucam-eo/geotessera), a library designed for this new embedding-as-data approach. We will explore key design decisions aimed at both a high-level API for accessibility and tight integration with the existing scientific Python ecosystem. We will demonstrate the core user workflow, showing how the library enables a rapid classification task under this new data paradigm. By presenting this work as a case study, we aim to highlight the critical need for new programming systems research for high-dimensional geospatial embeddings and to help build a stronger, more effective bridge between the programming and climate science communities.
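To make the embedding-as-data idea concrete, the sketch below treats each location as a 128-dimensional vector and classifies it with a nearest-centroid rule. It is only an illustration under stated assumptions: it does not call geotessera, the vectors are synthetic stand-ins for real embeddings, and the class names (`forest`, `water`), the noise level, and all helper functions are hypothetical. It uses only the Python standard library so that it runs anywhere; a real workflow would more plausibly rely on the scientific Python stack the abstract mentions.

```python
import random

EMBEDDING_DIM = 128  # dimensionality the abstract quotes for one TESSERA embedding


def synthetic_embedding(center, rng, noise=0.1):
    """Return a fake embedding near `center` (stand-in for a real per-location embedding)."""
    return [c + rng.gauss(0.0, noise) for c in center]


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class to obtain one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign `vec` to the label whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two hypothetical land-cover classes with made-up cluster centers.
centers = {
    "forest": [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)],
    "water": [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)],
}

# A handful of labelled "training" locations per class.
train_x, train_y = [], []
for label, center in centers.items():
    for _ in range(20):
        train_x.append(synthetic_embedding(center, rng))
        train_y.append(label)

centroids = fit_centroids(train_x, train_y)

# Classify one unseen location drawn from the hypothetical "water" cluster.
query = synthetic_embedding(centers["water"], rng)
predicted = predict(centroids, query)
print(predicted)
```

The point of the sketch is the shape of the workflow rather than the classifier: once every location is a fixed-length vector, a few labelled points and a simple model can stand in for a pipeline that would otherwise have to handle a year of raw imagery.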
This also combines elements of another talk:
Earth observation satellites generate hundreds of petabytes of data annually, creating unprecedented opportunities to monitor planetary-scale environmental changes. Geospatial foundation models based on satellite data are now demonstrating capabilities, from wildfire mapping to biodiversity monitoring, that could transform climate adaptation and policy. However, these powerful models remain largely inaccessible or impractical for domain experts. Ecologists, urban planners, disaster managers, and environmental scientists typically lack the machine learning expertise required to leverage foundation model capabilities. This accessibility gap is a barrier to addressing planetary-scale challenges. We analyze the systematic user experience failures that prevent adoption by domain experts and outline a research agenda for making geospatial AI models more accessible to practitioners.
Mon 13 Oct (displayed time zone: Perth)
16:00 - 17:40
16:00 | 15m | Talk | Challenges in Practice: Building a Usable Library for Planetary-Scale Embeddings | PROPL | Sadiq Jaffer (University of Cambridge), Frank Feng (University of Cambridge), Robin Young (University of Cambridge), Srinivasan Keshav (University of Cambridge), Anil Madhavapeddy (University of Cambridge, UK)
16:15 | 15m | Paper | STACD: STAC Extension with DAGs for Geospatial Data and Algorithm Management | PROPL | Saharsh Laud (Indian Institute Of Technology Delhi), Saurabh Joshi (Indian Institute Of Technology Delhi), Tarun Mangla (Indian Institute Of Technology Delhi), Abhilash Jindal (IIT Delhi, India), Aaditeshwar Seth (Indian Institute Of Technology Delhi)
16:30 | 15m | Talk | Spatial Programming for Environmental Monitoring | PROPL | Josh Millar (Imperial College London), Ryan Gibb (University of Cambridge), Roy Ang (University of Cambridge), Hamed Haddadi (Imperial College London), Anil Madhavapeddy (University of Cambridge, UK)
16:45 | 15m | Paper | Yirgacheffe: a declarative approach to geospatial data | PROPL | Michael Dales (University of Cambridge, UK), Alison Eyres (University of Cambridge), Patrick Ferris (University of Cambridge, UK), Anil Madhavapeddy (University of Cambridge, UK), Francesca A. Ridley (Newcastle University), Simon Tarr (IUCN)
17:00 | 15m | Talk | Large Language Models for computational climate analysis | PROPL | Jay Torry (University of Cambridge)
17:15 | 15m | Talk | Scaling the Urban Forest: An Integrated Framework for Managing Cities by Fusing Raster and Vector Data | PROPL | Andrés C. Zúñiga-González (University of Cambridge), Anil Madhavapeddy (University of Cambridge, UK), Ronita Bardhan (University of Cambridge)
17:30 | 10m | Day closing | Closing thoughts from the chairs | PROPL | Anil Madhavapeddy (University of Cambridge, UK), KC Sivaramakrishnan (IIT Madras and Tarides), Dominic Orchard (University of Cambridge; University of Kent)
Please see https://icfp25.sigplan.org/attending/Information-for-Attendees for information on remote and in-person participation for this talk.