Skip to content

Instantly share code, notes, and snippets.

@mdsumner
Last active October 27, 2024 07:36
Show Gist options
  • Save mdsumner/724f1db0bd0d211de6e3775486cd3df0 to your computer and use it in GitHub Desktop.
Save mdsumner/724f1db0bd0d211de6e3775486cd3df0 to your computer and use it in GitHub Desktop.

raster from multiple sources, disparate grid specs

This FlatGeoBuf file contains two polygon footprints, one is a northern hemisphere sea-ice concentration GeoTIFF image, the other southern (the values are scaled from 0-100% and stored with a palette, but we're just using the raw integer value here)

https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb

The source GeoTIFFs are in NSIDC south Polar Stereographic, and north Polar Stereographic as is standard. We use GTI format to register these as disparate data sources within a latent raster frame for rendering at a later time. We could store all the daily sea ice files this way, with an index on field 'date', so we could coalesce to a particular date (or set of dates, for examples like Sentinel scenes).

The south file is 316x332, and the north is 304x448.

Each GeoTIFF is stored in the 'location' field of the FlatGeoBuf, using the '/vsicurl/' protocol for streaming data from a url for GDAL.

The FlatGeoBuf itself sets Transverse Mercator (on longitude_0=147), and has metadata layer items that specify the resolution of the latent raster (RESX/RESY = 25000), as per the GTI specification.

Open as a raster with xarray, this is a "vector" file but we're declaring it a raster using the "GTI:" prefix, and we've set the minimal layer details for it to identify as a raster layer. At 25km resolution we get a grid of 487x1263.

import xarray
ds = xarray.open_dataset("GTI:https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb", engine = "rasterio")

ds.rio.crs
CRS.from_wkt('PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",147],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')

<xarray.Dataset> Size: 2MB
Dimensions:      (band: 1, x: 487, y: 1263)
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 4kB -6.183e+06 -6.158e+06 ... 5.942e+06 5.967e+06
  * y            (y) float64 10kB 1.594e+07 1.591e+07 ... -1.559e+07 -1.561e+07
    spatial_ref  int64 8B ...
Data variables:
    band_data    (band, y, x) float32 2MB ...

The creation script is at this repo: https://github.com/mdsumner/gti

This plot sets the very highest value (land,missing) and zero concentrations to NA:

image

The two sources used are these, south then north:

"/vsicurl/https://noaadata.apps.nsidc.org/NOAA/G02135/south/daily/geotiff/2024/10_Oct/S_20241017_concentration_v3.0.tif"

"/vsicurl/https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/geotiff/2024/10_Oct/N_20241017_concentration_v3.0.tif"
@mdsumner
Copy link
Author

we can of course override the default latent raster and specify a new one:

gdalwarp "GTI:https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb"  -t_srs "+proj=laea" laea.tif -tr 25000 25000 -te -1e7 -1e7 1e7 1e7

I would do that in R or python but I just making the point quickly, we now have two real materialized crs grids in one catalogue, rendered to a latent crs (tmerc), or to an new explicit one (laea)

image

@rbavery
Copy link

rbavery commented Oct 27, 2024

How did you install xarray + rasterio? installing the most recent versions of each with pip throws an error RasterioIOError: Failed to read TopoJSON data

import xarray
ds = xarray.open_dataset("GTI:https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb", engine = "rasterio")
``` --------------------------------------------------------------------------- KeyError Traceback (most recent call last) File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock) [210](https://file+.vscode-resource.vscode-cdn.net/Users/ryanavery/test-dask-on-ray/~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:210) try: --> [211](https://file+.vscode-resource.vscode-cdn.net/Users/ryanavery/test-dask-on-ray/~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:211) file = self._cache[self._key] [212](https://file+.vscode-resource.vscode-cdn.net/Users/ryanavery/test-dask-on-ray/~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:212) except KeyError:

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.getitem(self, key)
55 with self._lock:
---> 56 value = self._cache[key]
57 self._cache.move_to_end(key)

KeyError: [<function open at 0x115154ae0>, ('GTI:https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb',), 'r', (('sharing', False),), '11193bb9-3689-4f10-9857-5536ffc24c6f']

During handling of the above exception, another exception occurred:

CPLE_OpenFailedError Traceback (most recent call last)
File rasterio/_base.pyx:310, in rasterio._base.DatasetBase.init()

File rasterio/_base.pyx:221, in rasterio._base.open_dataset()

File rasterio/_err.pyx:359, in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: Failed to read TopoJSON data

During handling of the above exception, another exception occurred:

RasterioIOError Traceback (most recent call last)
Cell In[3], line 2
1 import xarray
----> 2 ds = xarray.open_dataset("GTI:https://github.com/mdsumner/gti/raw/refs/heads/main/inst/extdata/tmerc_seaice.fgb", engine = "rasterio")

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/api.py:671, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
659 decoders = _resolve_decoders_kwargs(
660 decode_cf,
661 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
667 decode_coords=decode_coords,
668 )
670 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 671 backend_ds = backend.open_dataset(
672 filename_or_obj,
673 drop_variables=drop_variables,
674 **decoders,
675 **kwargs,
676 )
677 ds = _dataset_from_backend_dataset(
678 backend_ds,
679 filename_or_obj,
(...)
689 **kwargs,
690 )
691 return ds

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/rioxarray/xarray_plugin.py:58, in RasterioBackend.open_dataset(self, filename_or_obj, drop_variables, parse_coordinates, lock, masked, mask_and_scale, variable, group, default_name, decode_coords, decode_times, decode_timedelta, band_as_variable, open_kwargs)
56 if open_kwargs is None:
57 open_kwargs = {}
---> 58 rds = _io.open_rasterio(
59 filename_or_obj,
60 parse_coordinates=parse_coordinates,
61 cache=False,
62 lock=lock,
63 masked=masked,
64 mask_and_scale=mask_and_scale,
65 variable=variable,
66 group=group,
67 default_name=default_name,
68 decode_times=decode_times,
69 decode_timedelta=decode_timedelta,
70 band_as_variable=band_as_variable,
71 **open_kwargs,
72 )
73 if isinstance(rds, xarray.DataArray):
74 dataset = rds.to_dataset()

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/rioxarray/_io.py:1128, in open_rasterio(filename, parse_coordinates, chunks, cache, lock, masked, mask_and_scale, variable, group, default_name, decode_times, decode_timedelta, band_as_variable, **open_kwargs)
1126 else:
1127 manager = URIManager(file_opener, filename, mode="r", kwargs=open_kwargs)
-> 1128 riods = manager.acquire()
1129 captured_warnings = rio_warnings.copy()
1131 # raise the NotGeoreferencedWarning if applicable

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:193, in CachingFileManager.acquire(self, needs_lock)
178 def acquire(self, needs_lock=True):
179 """Acquire a file object from the manager.
180
181 A new file is only opened if it has expired from the
(...)
191 An open file object, as returned by opener(*args, **kwargs).
192 """
--> 193 file, _ = self._acquire_with_cache_info(needs_lock)
194 return file

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
215 kwargs = kwargs.copy()
216 kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
218 if self._mode == "w":
219 # ensure file doesn't get overridden when opened again
220 self._mode = "a"

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/rasterio/env.py:463, in ensure_env_with_credentials..wrapper(*args, **kwds)
460 session = DummySession()
462 with env_ctor(session=session):
--> 463 return f(*args, **kwds)

File ~/test-dask-on-ray/.venv/lib/python3.11/site-packages/rasterio/init.py:355, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, opener, **kwargs)
352 path = _parse_path(raw_dataset_path)
354 if mode == "r":
--> 355 dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
356 elif mode == "r+":
357 dataset = get_writer_for_path(path, driver=driver)(
358 path, mode, driver=driver, sharing=sharing, **kwargs
359 )

File rasterio/_base.pyx:312, in rasterio._base.DatasetBase.init()

RasterioIOError: Failed to read TopoJSON data
'2024.10.0'

@mdsumner
Copy link
Author

I use docker:

https://github.com/mdsumner/gdal-builds/pkgs/container/gdal-builds

rocker-gdal-dev-python

It doesn't have to all be so bleeding edge, but it's easier to keep up that way. GTI is GDAL 3.9, so it's very new

@rbavery
Copy link

rbavery commented Oct 27, 2024

ok gotcha thanks, I'll give that a shot. It might be that the newest rasterio is using an older GDAL and overriding my python GDAL 3.9 install? Not sure.

@mdsumner
Copy link
Author

Maybe, I went through packages to make sure they use my GDAL, PROJ, and GEOS but it's all bit chaotic now, sometimes I have to force reinstall pyproj and haven't got back to clean up yet

@mdsumner
Copy link
Author

I think Pangeo docker has very recent GDAL so that might be a better way

@rbavery
Copy link

rbavery commented Oct 27, 2024

Thanks for the tip, in a fresh conda environment with only xarray and rioxarray from the conda-forge channel this works.

@mdsumner
Copy link
Author

Note that the warper doesn't need GTI you can just give it multiple sources, that's how the API works (I'm not really across how rasterio uses it, yet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment