Jeff Albrecht geospatial-jeff

@geospatial-jeff
geospatial-jeff / distributed_caching.md
Last active May 7, 2025 02:43
GDAL Distributed Caching

The process-local cache problem

The goal of this document is to describe GDAL's current caching mechanisms, how they fail at scale, and how distributed caching could help. We'll use COG headers as the example because that is what I know best, but the same reasoning applies to other formats like Zarr.

Caching Mechanisms

GDAL maintains two caching mechanisms: the "VSI cache", which caches recently accessed I/O through any VSI driver (e.g. a COG header read over HTTP), and the "block cache", which caches raster blocks. I am not familiar with the implementation details of either cache; the important point for this discussion is that both caches are local to each process.
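
To make the "local per process" point concrete, here is a toy model in plain Python. It does not use GDAL and says nothing about how GDAL implements its caches; `Origin`, `Cache`, and `origin_reads` are hypothetical names invented for this sketch, and each "worker" is simply a loop iteration standing in for a separate process. It only counts how many reads reach remote storage when every worker has its own cache versus when all workers share one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Origin:
    """Stands in for remote storage (hypothetical); counts reads that reach it."""
    reads: int = 0

    def fetch(self, key: str) -> bytes:
        self.reads += 1
        return f"header-bytes-for-{key}".encode()


@dataclass
class Cache:
    """A minimal read-through cache in front of an Origin."""
    origin: Origin
    store: Dict[str, bytes] = field(default_factory=dict)

    def get(self, key: str) -> bytes:
        # Only go to the origin on a miss; later reads are served locally.
        if key not in self.store:
            self.store[key] = self.origin.fetch(key)
        return self.store[key]


def origin_reads(n_workers: int, requests_per_worker: int, shared: bool) -> int:
    """Count origin reads when workers use private caches or one shared cache."""
    origin = Origin()
    shared_cache = Cache(origin)
    for _ in range(n_workers):
        # A private cache per worker models a process-local cache.
        cache = shared_cache if shared else Cache(origin)
        for _ in range(requests_per_worker):
            cache.get("example.tif")  # hypothetical file name
    return origin.reads


process_local = origin_reads(n_workers=8, requests_per_worker=100, shared=False)
distributed = origin_reads(n_workers=8, requests_per_worker=100, shared=True)
print(process_local, distributed)
```

In this model, the number of origin reads for a single header grows with the number of workers when caches are private, and stays constant when the cache is shared.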

Failure Case

import abc
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Sequence, Union
import affine
from rasterio.crs import CRS
from rasterio.io import MemoryFile
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import JPEGProfile
geospatial-jeff / router_middleware.py
Created October 4, 2020 22:49
FastAPI middleware scoped to a router
from typing import Callable
from fastapi import FastAPI, APIRouter
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.routing import Match
app = FastAPI()
router = APIRouter()
geospatial-jeff / cog_tiler.py
Last active July 28, 2020 13:53
cog tiler with rio-tiler/fastapi
import abc
from dataclasses import dataclass
from io import BytesIO
from typing import Tuple
import numpy as np
from cogeo_mosaic.backends import MosaicBackend
from rio_tiler.errors import TileOutsideBounds
from rio_tiler.mosaic import mosaic_reader