Skip to content

Instantly share code, notes, and snippets.

@bennyistanto
Last active May 25, 2025 07:46
Show Gist options
  • Save bennyistanto/98038d0a372874f380c4126f0c1b580a to your computer and use it in GitHub Desktop.
Save bennyistanto/98038d0a372874f380c4126f0c1b580a to your computer and use it in GitHub Desktop.
Landcover annual transitions in hectares by country

Landcover Zonal Statistics & Transition Analysis

This repository provides two complementary workflows for deriving landcover area summaries and full transition matrices from MODIS IGBP or ESA CCI landcover datasets. Both workflows produce per‑year class area CSVs and comprehensive inter‑annual transition tables, then optionally split those tables by ISO3 code.


🔧 Requirements

  • Python 3.8+
  • geopandas
  • rasterstats
  • pandas
  • numpy
  • rasterio
  • xarray (only for NetCDF inputs)
  • tqdm

Install via conda or pip:

conda install geopandas rasterio rasterstats xarray pandas numpy tqdm
# or
pip install geopandas rasterio rasterstats xarray pandas numpy tqdm

📝 Notebook 1: zonal_histogram_annualtransitions_landcover.ipynb

Purpose

  1. Compute per‑year class areas: For each annual raster (GeoTIFF/NetCDF), run a zonal histogram by your admin boundaries to count pixels per IGBP/LCCS class, convert counts to hectares, and write both per‑year and combined CSV outputs.
  2. Build full transition matrices: For every pair of consecutive years, encode each pixel’s class change as 100*source + target, perform zonal histograms on that code, and output a complete source→target matrix (including zeros) per year.

Key Steps

  1. Per‑pixel transition code:

    trans = arr_prev.astype(np.int32)*100 + arr_curr.astype(np.int32)

    Assigns each pixel a unique integer for its (source, target) classes.

  2. Zonal counting:

    stats = zonal_stats(
      zones, trans,
      affine=transform,
      categorical=True,
      all_touched=True,
      nodata=-999
    )

    Buckets pixels by transition code per zone, returning {code: count} dicts.

  3. Area conversion:

    hectares = count * pixel_area_ha

    Converts pixel counts into hectares (25 ha for MODIS; adjust for other resolutions).

  4. Memory management: Frequent gc.collect() calls free Python memory between years to avoid OOM.


📝 Notebook 2: landcover_transitions_iso3.ipynb

Purpose

Splits the master transition table by ISO3 code and aggregates multiple polygons:

sub = (
  df[df.iso_3 == iso]
    .groupby(['year','source','target'], as_index=False)['hectares']
    .sum()
    .rename(columns={'hectares':'value'})
)
sub.to_csv(f'transitions_{iso}.csv', index=False)

This ensures each country’s output contains one record per (year, source, target) with summed hectare values, even if that country has multiple boundary parts.

Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "4c699a5e-4b2f-425c-bfba-fd4df50350e9",
"metadata": {},
"source": [
"# Annual Landcover Transitions by Country"
]
},
{
"cell_type": "markdown",
"id": "130fcb25-af15-4636-a2c6-8dc7c6d5469c",
"metadata": {},
"source": [
"This two‐part workflow first calculates annual land‐cover areas by class—reading each MODIS IGBP GeoTIFF or ESACCI LCCS NetCDF, performing zonal histograms against the admin boundaries, converting pixel counts to hectares, and saving both per‐year and combined CSVs—while skipping any years we’ve already processed. \n",
"\n",
"In the second part, it builds a complete transition matrix for every possible “source→target” class pair (including zeros) by reading pairs of consecutive yearly rasters, encoding transitions as 100*src + tgt, running zonal histograms on that code, and writing a full matrix per year (again skipping existing outputs), before finally concatenating all years into one master transitions CSV. Garbage collection calls throughout help keep memory use under control.\n"
]
},
{
"cell_type": "markdown",
"id": "be015dfb-561d-4729-89a6-88a1bba2c0f5",
"metadata": {},
"source": [
"## ESACCI Landcover\n",
"\n",
"Following the LCCS classification below, and available annually from 1992 to 2022.\n",
"\n",
"| Value | Label\t| Hex | Color |\n",
"|-----|-----|-----|-----|\n",
"| 0 | No Data\t| `#000000` | <span style=\"background-color: #000000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 10 | Cropland, rainfed\t| `#ffff64` | <span style=\"background-color: #ffff64; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 11 | Cropland, rainfed, herbaceous cover | `#ffff64` | <span style=\"background-color: #ffff64; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 12 | Cropland, rainfed, tree, or shrub cover | `#ffff00` | <span style=\"background-color: #ffff00; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 20 | Cropland, irrigated or post-flooding\t| `#aaf0f0` | <span style=\"background-color: #aaf0f0; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 30 | Mosaic cropland (>50%) / natural vegetation (tree, shrub, herbaceous cover) (<50%)\t| `#dcf064` | <span style=\"background-color: #dcf064; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 40 | Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%) / cropland (<50%)\t| `#c8c864` | <span style=\"background-color: #c8c864; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 50 | Tree cover, broadleaved, evergreen, closed to open (>15%)\t| `#006400` | <span style=\"background-color: #006400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 60 | Tree cover, broadleaved, deciduous, closed to open (>15%)\t| `#00a000` | <span style=\"background-color: #00a000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 61 | Tree cover, broadleaved, deciduous, closed (>40%) | `#00a000` | <span style=\"background-color: #00a000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 62 | Tree cover, broadleaved, deciduous, open (15-40%) | `#aac800` | <span style=\"background-color: #aac800; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 70 | Tree cover, needleleaved, evergreen, closed to open (>15%)\t| `#003c00` | <span style=\"background-color: #003c00; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 71 | Tree cover, needleleaved, evergreen, closed (>40%) | `#003c00` | <span style=\"background-color: #003c00; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 72 | Tree cover, needleleaved, evergreen, open (15-40%) | `#005000` | <span style=\"background-color: #005000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 80 | Tree cover, needleleaved, deciduous, closed to open (>15%)\t| `#285000` | <span style=\"background-color: #285000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 81 | Tree cover, needleleaved, deciduous, closed (>40%) | `#285000` | <span style=\"background-color: #285000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 82 | Tree cover, needleleaved, deciduous, open (15-40%) | `#286400` | <span style=\"background-color: #286400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 90 | Tree cover, mixed leaf type (broadleaved and needleleaved)\t| `#788200` | <span style=\"background-color: #788200; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 100 | Mosaic tree and shrub (>50%) / herbaceous cover (<50%)\t| `#8ca000` | <span style=\"background-color: #8ca000; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 110 | Mosaic herbaceous cover (>50%) / tree and shrub (<50%)\t| `#be9600` | <span style=\"background-color: #be9600; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 120 | Shrubland\t| `#966400` | <span style=\"background-color: #966400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 121 | Evergreen shrubland | `#966400` | <span style=\"background-color: #966400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 122 | Deciduous shrubland | `#966400` | <span style=\"background-color: #966400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 130 | Grassland\t| `#ffb432` | <span style=\"background-color: #ffb432; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 140 | Lichens and mosses\t| `#ffdcd2` | <span style=\"background-color: #ffdcd2; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 150 | Sparse vegetation (tree, shrub, herbaceous cover) (<15%)\t| `#ffebaf` | <span style=\"background-color: #ffebaf; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 151 | Sparse tree (<15%) | `#ffc864` | <span style=\"background-color: #ffc864; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 152 | Sparse shrub (<15%) | `#ffd278` | <span style=\"background-color: #ffd278; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 153 | Sparse herbaceous cover (<15%) | `#ffebaf` | <span style=\"background-color: #ffebaf; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 160 | Tree cover, flooded, fresh or brackish water\t| `#00785a` | <span style=\"background-color: #00785a; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 170 | Tree cover, flooded, saline water\t| `#009678` | <span style=\"background-color: #009678; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 180 | Shrub or herbaceous cover, flooded, fresh/saline/brackish water\t| `#00dc82` | <span style=\"background-color: #00dc82; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 190 | Urban areas\t| `#c31400` | <span style=\"background-color: #c31400; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 200 | Bare areas\t| `#fff5d7` | <span style=\"background-color: #fff5d7; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 201 | Consolidated bare areas | `#dcdcdc` | <span style=\"background-color: #dcdcdc; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 202 | Unconsolidated bare areas | `#fff5d7` | <span style=\"background-color: #fff5d7; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 210 | Water bodies\t| `#0046c8` | <span style=\"background-color: #0046c8; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"| 220 | Permanent snow and ice\t| `#ffffff` | <span style=\"background-color: #ffffff; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> |\n",
"\n",
"Reference: https://cds.climate.copernicus.eu/datasets/satellite-land-cover?tab=overview\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "246e7e86-bfb2-4182-9295-d1de850972d2",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Yearly hectares: 100%|██████████████████████████████████████████████████████████████| 31/31 [9:07:30<00:00, 1059.69s/it]\n",
"Transitions: 100%|██████████████████████████████████████████████████████████████████| 30/30 [8:33:08<00:00, 1026.27s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done: per‑year and full‑matrix transitions CSVs in /mnt/g/temp/esaccilc/zonal_stats\n"
]
}
],
"source": [
"import os\n",
"import gc\n",
"import numpy as np\n",
"import geopandas as gpd\n",
"import pandas as pd\n",
"import xarray as xr\n",
"import rasterio\n",
"from rasterio.transform import from_origin\n",
"from rasterstats import zonal_stats\n",
"from tqdm import tqdm\n",
"import tempfile\n",
"\n",
"# ── PARAMETERS ────────────────────────────────────────────────────────────────\n",
"input_folder = \"/mnt/g/temp/esaccilc\"\n",
"zone_shapefile = \"/mnt/g/temp/boundaries/adm0/adm0.shp\"\n",
"output_folder = \"/mnt/g/temp/esaccilc/zonal_stats\"\n",
"pixel_area_ha = 9 # 300m resolution → 0.09 km² = 9 ha per pixel\n",
"# Expected LCCS class codes\n",
"expected_classes = [10,11,12,20,30,40,50,60,61,62,\n",
" 70,71,72,80,81,82,90,100,110,120,\n",
" 121,122,130,140,150,151,152,153,160,170,\n",
" 180,190,200,201,202,210,220]\n",
"# ───────────────────────────────────────────────────────────────────────────────\n",
"\n",
"os.makedirs(output_folder, exist_ok=True)\n",
"gc.enable()\n",
"\n",
"# 1) load zones\n",
"zones = gpd.read_file(zone_shapefile)\n",
"keep = ['adm0_src','adm0_name','adm0_name1','iso_3','iso_3_grp','geometry']\n",
"zones = zones[keep].sort_values('adm0_src').reset_index(drop=True)\n",
"\n",
"# 2) list and sort NetCDF files\n",
"nc_files = sorted(\n",
" f for f in os.listdir(input_folder)\n",
" if f.startswith('P') and (f.endswith('.nc') or f.endswith('.nc4'))\n",
")\n",
"\n",
"# ── PART 1: per-year class area ────────────────────────────────────────────────\n",
"annual_stats = []\n",
"for fname in tqdm(nc_files, desc=\"Yearly hectares\"):\n",
" year = fname[1:5]\n",
" out_csv = os.path.join(output_folder, f\"esaccilc_ha_{year}.csv\")\n",
"\n",
" if os.path.exists(out_csv):\n",
" annual_stats.append(pd.read_csv(out_csv))\n",
" continue\n",
"\n",
" ds = xr.open_dataset(os.path.join(input_folder, fname))\n",
" arr = ds['lccs_class'].squeeze().values\n",
" lats = ds['lat'].values\n",
" lons = ds['lon'].values\n",
" ds.close()\n",
"\n",
" res_lat = abs(lats[1] - lats[0])\n",
" res_lon = abs(lons[1] - lons[0])\n",
" transform = from_origin(lons.min(), lats.max(), res_lon, res_lat)\n",
"\n",
" with tempfile.NamedTemporaryFile(suffix='.tif') as tmp:\n",
" with rasterio.open(\n",
" tmp.name, 'w', driver='GTiff',\n",
" height=arr.shape[0], width=arr.shape[1], count=1,\n",
" dtype=arr.dtype, crs='+proj=longlat +datum=WGS84',\n",
" transform=transform\n",
" ) as dst:\n",
" dst.write(arr, 1)\n",
" stats = zonal_stats(zones, tmp.name, categorical=True, all_touched=True, nodata=0)\n",
"\n",
" df = pd.DataFrame(stats).fillna(0).astype(int)\n",
"\n",
" # Ensure all classes present\n",
" for cls in expected_classes:\n",
" if cls not in df.columns:\n",
" df[cls] = 0\n",
" df = df[sorted(df.columns)]\n",
" df *= pixel_area_ha\n",
"\n",
" out = zones.drop(columns='geometry').copy()\n",
" out['year'] = int(year)\n",
" out = out.reset_index().rename(columns={'index': 'id'})\n",
" for cls in expected_classes:\n",
" out[f'class_{cls}_ha'] = df[cls]\n",
"\n",
" out.to_csv(out_csv, index=False)\n",
" annual_stats.append(out)\n",
"\n",
" del ds, arr, stats, df, out\n",
" gc.collect()\n",
"\n",
"combined_annual = pd.concat(annual_stats, ignore_index=True)\n",
"combined_annual.to_csv(\n",
" os.path.join(output_folder, \"wld_phy_esaccilc_lccs_ha_1992_2022.csv\"),\n",
" index=False\n",
")\n",
"\n",
"del annual_stats, combined_annual\n",
"gc.collect()\n",
"\n",
"# ── PART 2: inter‑annual transitions (complete matrix) ─────────────────────────\n",
"for prev, curr in tqdm(zip(nc_files[:-1], nc_files[1:]), desc=\"Transitions\", total=len(nc_files)-1):\n",
" year = int(curr[1:5])\n",
" out_csv = os.path.join(output_folder, f\"esaccilc_trans_{year}.csv\")\n",
"\n",
" if os.path.exists(out_csv):\n",
" continue\n",
"\n",
" ds_prev = xr.open_dataset(os.path.join(input_folder, prev))\n",
" ds_curr = xr.open_dataset(os.path.join(input_folder, curr))\n",
" arr_p = ds_prev['lccs_class'].squeeze().values.astype(np.int32)\n",
" arr_c = ds_curr['lccs_class'].squeeze().values.astype(np.int32)\n",
" lats = ds_prev['lat'].values\n",
" lons = ds_prev['lon'].values\n",
" ds_prev.close(); ds_curr.close()\n",
"\n",
" res_lat = abs(lats[1] - lats[0])\n",
" res_lon = abs(lons[1] - lons[0])\n",
" transform = from_origin(lons.min(), lats.max(), res_lon, res_lat)\n",
" trans = arr_p * 100 + arr_c\n",
"\n",
" stats = zonal_stats(\n",
" zones, trans,\n",
" affine=transform,\n",
" categorical=True,\n",
" all_touched=True,\n",
" nodata=-999\n",
" )\n",
"\n",
" records = []\n",
" for idx, zone_stats in enumerate(stats):\n",
" info = zones.iloc[idx]\n",
" for src in expected_classes:\n",
" for tgt in expected_classes:\n",
" code = src * 100 + tgt\n",
" count = zone_stats.get(code, 0)\n",
" records.append({\n",
" 'id': idx + 1,\n",
" 'adm0_src': info.adm0_src,\n",
" 'adm0_name': info.adm0_name,\n",
" 'iso_3': info.iso_3,\n",
" 'year': year,\n",
" 'source': src,\n",
" 'target': tgt,\n",
" 'hectares': count * pixel_area_ha\n",
" })\n",
"\n",
" pd.DataFrame(records).to_csv(out_csv, index=False)\n",
" del arr_p, arr_c, stats, records, trans\n",
" gc.collect()\n",
"\n",
"trans_files = sorted(\n",
" f for f in os.listdir(output_folder)\n",
" if f.startswith(\"esaccilc_trans_\") and f.endswith(\".csv\")\n",
")\n",
"all_trans = pd.concat(\n",
" (pd.read_csv(os.path.join(output_folder, f)) for f in trans_files),\n",
" ignore_index=True\n",
")\n",
"all_trans.to_csv(\n",
" os.path.join(output_folder, \"wld_phy_esaccilc_lccs_transitions_1993_2022.csv\"),\n",
" index=False\n",
")\n",
"\n",
"print(\"Done: per‑year and full‑matrix transitions CSVs in\", output_folder)\n"
]
},
{
"cell_type": "markdown",
"id": "68f02cce-6556-4bdc-9e33-684fc5644689",
"metadata": {},
"source": [
"## MCD12Q1 Landcover\n",
"\n",
"Following the IGBP classification below, and available annually from 2001 - 2023\n",
"\n",
"| Value | Hex | Color | Description |\n",
"|-----|-----|-----|-----|\n",
"| 1 | `#05450a` | <span style=\"background-color: #05450a; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Evergreen Needleleaf Forests: dominated by evergreen conifer trees (canopy >2m). Tree cover >60%. |\n",
"| 2 | `#086a10` | <span style=\"background-color: #086a10; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Evergreen Broadleaf Forests: dominated by evergreen broadleaf and palmate trees (canopy >2m). Tree cover >60%. |\n",
"| 3 | `#54a708` | <span style=\"background-color: #54a708; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Deciduous Needleleaf Forests: dominated by deciduous needleleaf (larch) trees (canopy >2m). Tree cover >60%. |\n",
"| 4 | `#78d203` | <span style=\"background-color: #78d203; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Deciduous Broadleaf Forests: dominated by deciduous broadleaf trees (canopy >2m). Tree cover >60%. |\n",
"| 5 | `#009900` | <span style=\"background-color: #009900; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Mixed Forests: dominated by neither deciduous nor evergreen (40-60% of each) tree type (canopy >2m). Tree cover >60%. |\n",
"| 6 | `#c6b044` | <span style=\"background-color: #c6b044; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Closed Shrublands: dominated by woody perennials (1-2m height) >60% cover. |\n",
"| 7 | `#dcd159` | <span style=\"background-color: #dcd159; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Open Shrublands: dominated by woody perennials (1-2m height) 10-60% cover. |\n",
"| 8 | `#dade48` | <span style=\"background-color: #dade48; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Woody Savannas: tree cover 30-60% (canopy >2m). |\n",
"| 9 | `#fbff13` | <span style=\"background-color: #fbff13; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Savannas: tree cover 10-30% (canopy >2m). |\n",
"| 10 | `#b6ff05` | <span style=\"background-color: #b6ff05; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Grasslands: dominated by herbaceous annuals (<2m). |\n",
"| 11 | `#27ff87` | <span style=\"background-color: #27ff87; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Permanent Wetlands: permanently inundated lands with 30-60% water cover and >10% vegetated cover. |\n",
"| 12 | `#c24f44` | <span style=\"background-color: #c24f44; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Croplands: at least 60% of area is cultivated cropland. |\n",
"| 13 | `#a5a5a5` | <span style=\"background-color: #a5a5a5; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Urban and Built-up Lands: at least 30% impervious surface area including building materials, asphalt and vehicles. |\n",
"| 14 | `#ff6d4c` | <span style=\"background-color: #ff6d4c; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Cropland/Natural Vegetation Mosaics: mosaics of small-scale cultivation 40-60% with natural tree, shrub, or herbaceous vegetation. |\n",
"| 15 | `#69fff8` | <span style=\"background-color: #69fff8; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Permanent Snow and Ice: at least 60% of area is covered by snow and ice for at least 10 months of the year. |\n",
"| 16 | `#f9ffa4` | <span style=\"background-color: #f9ffa4; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Barren: at least 60% of area is non-vegetated barren (sand, rock, soil) areas with less than 10% vegetation. |\n",
"| 17 | `#1c0dff` | <span style=\"background-color: #1c0dff; padding: 0 10px;\">&nbsp;&nbsp;&nbsp;</span> | Water Bodies: at least 60% of area is covered by permanent water bodies. |\n",
"\n",
"Reference: https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1#description\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1b6e48f9-7810-4f3a-98bd-326fbd666f13",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Yearly hectares: 100%|███████████████████████████████████████████████████████████████| 23/23 [5:44:19<00:00, 898.23s/it]\n",
"Full transitions: 100%|██████████████████████████████████████████████████████████████| 22/22 [3:55:49<00:00, 643.15s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done: per-year and full-matrix transitions written to /mnt/g/temp/mcd12q1/zonal_stats\n"
]
}
],
"source": [
"import os\n",
"import gc\n",
"import numpy as np\n",
"import geopandas as gpd\n",
"import pandas as pd\n",
"import rasterio\n",
"from rasterstats import zonal_stats\n",
"from tqdm import tqdm\n",
"\n",
"# ── PARAMETERS ────────────────────────────────────────────────────────────────\n",
"input_folder = \"/mnt/g/temp/mcd12q1\"\n",
"zone_shapefile = \"/mnt/g/temp/boundaries/adm0/adm0.shp\"\n",
"output_folder = \"/mnt/g/temp/mcd12q1/zonal_stats\"\n",
"pixel_area_ha = 25 # each MODIS IGBP pixel = 0.25 km² = 25 ha\n",
"igbp_classes = list(range(1,18))\n",
"# ───────────────────────────────────────────────────────────────────────────────\n",
"\n",
"os.makedirs(output_folder, exist_ok=True)\n",
"gc.enable()\n",
"\n",
"# 1) load zones\n",
"ozones = gpd.read_file(zone_shapefile)\n",
"keep = ['adm0_src','adm0_name','adm0_name1','iso_3','iso_3_grp','geometry']\n",
"zones = ozones[keep].sort_values('adm0_src').reset_index(drop=True)\n",
"\n",
"# 2) list and sort rasters\n",
"def list_tifs(folder):\n",
" return sorted(\n",
" f for f in os.listdir(folder)\n",
" if f.endswith('.tif') and 'mcd12q1_igbp' in f\n",
" )\n",
"tifs = list_tifs(input_folder)\n",
"\n",
"# ── PART 1: hectares per class per year ────────────────────────────────────────\n",
"all_years = []\n",
"for tif in tqdm(tifs, desc=\"Yearly hectares\"):\n",
" year = tif.split('_')[-1].replace('.tif','')\n",
" out_csv = os.path.join(output_folder, f\"mcd12q1_ha_{year}.csv\")\n",
" if os.path.exists(out_csv):\n",
" all_years.append(pd.read_csv(out_csv))\n",
" continue\n",
"\n",
" # compute zonal counts\n",
" stats = zonal_stats(\n",
" zones,\n",
" os.path.join(input_folder, tif),\n",
" categorical=True,\n",
" all_touched=True,\n",
" nodata=0\n",
" )\n",
" df = pd.DataFrame(stats).fillna(0).astype(int)\n",
"\n",
" # ensure all classes present\n",
" for c in igbp_classes:\n",
" if c not in df.columns:\n",
" df[c] = 0\n",
" df = df[sorted(df.columns)]\n",
"\n",
" # convert pixel counts to hectares\n",
" df = df * pixel_area_ha\n",
"\n",
" # assemble per-zone output\n",
" out = zones.drop(columns='geometry').copy()\n",
" out['year'] = int(year)\n",
" out = out.reset_index().rename(columns={'index':'id'})\n",
" for c in igbp_classes:\n",
" out[f'class_{c}_ha'] = df[c]\n",
"\n",
" out.to_csv(out_csv, index=False)\n",
" all_years.append(out)\n",
"\n",
" # cleanup\n",
" del stats, df, out\n",
" gc.collect()\n",
"\n",
"# combined hectares CSV\n",
"ha_df = pd.concat(all_years, ignore_index=True)\n",
"ha_df.to_csv(\n",
" os.path.join(output_folder, \"wld_phy_mcd12q1_igbp_ha_2001_2023.csv\"),\n",
" index=False\n",
")\n",
"\n",
"del all_years, ha_df\n",
"gc.collect()\n",
"\n",
"# ── PART 2: inter-annual full transition matrix ─────────────────────────────────\n",
"for prev_tif, cur_tif in tqdm(zip(tifs[:-1], tifs[1:]), desc=\"Full transitions\", total=len(tifs)-1):\n",
" year = int(cur_tif.split('_')[-1].replace('.tif',''))\n",
" out_csv = os.path.join(output_folder, f\"mcd12q1_trans_{year}.csv\")\n",
" if os.path.exists(out_csv):\n",
" continue\n",
"\n",
" p_path = os.path.join(input_folder, prev_tif)\n",
" c_path = os.path.join(input_folder, cur_tif)\n",
" with rasterio.open(p_path) as src_p, rasterio.open(c_path) as src_c:\n",
" arr_p = src_p.read(1).astype(np.int32)\n",
" arr_c = src_c.read(1).astype(np.int32)\n",
" transform = src_p.transform\n",
" trans = arr_p * 100 + arr_c\n",
"\n",
" # zonal stats on all transition codes\n",
" stats = zonal_stats(\n",
" zones,\n",
" trans,\n",
" affine=transform,\n",
" categorical=True,\n",
" all_touched=True,\n",
" nodata=-999\n",
" )\n",
"\n",
" # build full matrix records\n",
" records = []\n",
" for idx, zone_stats in enumerate(stats):\n",
" info = zones.iloc[idx]\n",
" for src in igbp_classes:\n",
" for tgt in igbp_classes:\n",
" code = src * 100 + tgt\n",
" count = zone_stats.get(code, 0)\n",
" records.append({\n",
" 'id': idx + 1,\n",
" 'adm0_src': info.adm0_src,\n",
" 'adm0_name': info.adm0_name,\n",
" 'iso_3': info.iso_3,\n",
" 'year': year,\n",
" 'source': src,\n",
" 'target': tgt,\n",
" 'hectares': count * pixel_area_ha\n",
" })\n",
"\n",
" # write and cleanup\n",
" pd.DataFrame(records).to_csv(out_csv, index=False)\n",
" del arr_p, arr_c, trans, stats, records\n",
" gc.collect()\n",
"\n",
"# combine all full transition files\n",
"trans_files = sorted(\n",
" f for f in os.listdir(output_folder)\n",
" if f.startswith(\"mcd12q1_trans_\") and f.endswith(\".csv\")\n",
")\n",
"all_trans = pd.concat(\n",
" (pd.read_csv(os.path.join(output_folder, f)) for f in trans_files),\n",
" ignore_index=True\n",
")\n",
"all_trans.to_csv(\n",
" os.path.join(output_folder, \"wld_phy_mcd12q1_igbp_transitions_2002_2023.csv\"),\n",
" index=False\n",
")\n",
"\n",
"print(\"Done: per-year and full-matrix transitions written to\", output_folder)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment