Python Deforestation Detection — Deep Dive

Build satellite-based deforestation monitoring pipelines in Python — from STAC data access and cloud-free compositing to time-series breakpoint detection and SAR change analysis.

System architecture

A production deforestation monitoring system ingests satellite imagery, generates analysis-ready composites, detects changes, and delivers alerts:

STAC Catalog (Sentinel-2, Sentinel-1, Landsat)
    → Cloud masking + Atmospheric correction
        → Time-series construction (per-pixel NDVI/backscatter)
            → Change detection (BFAST / ML classifier)
                → Alert polygons → Dashboard + Notifications

Accessing satellite data via STAC

The SpatioTemporal Asset Catalog (STAC) has become the standard way to discover and access satellite imagery. Python libraries make it seamless:

import pystac_client
import stackstac
import xarray as xr
import numpy as np

def load_sentinel2_stack(
    bbox: tuple[float, float, float, float],
    date_range: str,
    max_cloud_pct: int = 30
) -> xr.DataArray:
    """Load a Sentinel-2 image stack from a STAC catalog."""
    catalog = pystac_client.Client.open(
        "https://earth-search.aws.element84.com/v1"
    )
    
    search = catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=bbox,
        datetime=date_range,
        query={"eo:cloud_cover": {"lt": max_cloud_pct}}
    )
    
    items = list(search.items())
    print(f"Found {len(items)} scenes")
    
    # Load as lazy xarray DataArray
    stack = stackstac.stack(
        items,
        bounds_latlon=bbox,
        resolution=10,
        assets=["red", "green", "blue", "nir", "scl"],
        dtype=np.float32
    )
    
    return stack

def apply_cloud_mask(stack: xr.DataArray) -> xr.DataArray:
    """Mask clouds using Sentinel-2 Scene Classification Layer."""
    scl = stack.sel(band="scl")
    
    # SCL classes: 3=cloud shadow, 8=cloud medium, 9=cloud high, 10=thin cirrus
    cloud_mask = scl.isin([3, 8, 9, 10])
    
    # Also mask snow/ice (11) and water (6) if focusing on forest
    non_vegetation = scl.isin([3, 6, 8, 9, 10, 11])
    
    masked = stack.where(~non_vegetation)
    return masked

Building cloud-free composites

Individual scenes have cloud gaps. Temporal compositing fills them:

def create_monthly_composites(
    stack: xr.DataArray,
    method: str = "median"
) -> xr.DataArray:
    """Create monthly cloud-free composites from masked image stack."""
    # Compute NDVI
    nir = stack.sel(band="nir")
    red = stack.sel(band="red")
    ndvi = (nir - red) / (nir + red + 1e-10)
    
    # Group by month and compute composite
    monthly = ndvi.resample(time="1ME")
    
    if method == "median":
        composite = monthly.median(dim="time")
    elif method == "max":
        # Maximum NDVI composite — best for cloud-heavy periods
        composite = monthly.max(dim="time")
    
    return composite

def create_quarterly_baseline(
    stack: xr.DataArray,
    year: int,
    quarter: int
) -> xr.DataArray:
    """Create a quarterly reference composite for change detection."""
    quarter_months = {1: (1, 3), 2: (4, 6), 3: (7, 9), 4: (10, 12)}
    start_month, end_month = quarter_months[quarter]
    
    start = f"{year}-{start_month:02d}-01"
    end = f"{year}-{end_month:02d}-28"
    
    quarterly = stack.sel(time=slice(start, end))
    
    nir = quarterly.sel(band="nir")
    red = quarterly.sel(band="red")
    ndvi = (nir - red) / (nir + red + 1e-10)
    
    return ndvi.median(dim="time")

BFAST-style breakpoint detection

Breaks For Additive Season and Trend (BFAST) decomposes pixel time series into trend, season, and remainder, then detects structural breaks:

import numpy as np
from scipy.signal import find_peaks

def detect_forest_disturbance(
    ndvi_series: np.ndarray,
    dates: np.ndarray,
    history_fraction: float = 0.7,
    threshold_std: float = 3.0
) -> dict | None:
    """Simplified BFAST-monitor approach for disturbance detection.
    
    Uses historical period to model expected behavior,
    then flags deviations in the monitoring period.
    """
    n = len(ndvi_series)
    history_end = int(n * history_fraction)
    
    # Remove NaN values from history
    hist_valid = ~np.isnan(ndvi_series[:history_end])
    hist_ndvi = ndvi_series[:history_end][hist_valid]
    hist_dates = dates[:history_end][hist_valid]
    
    if len(hist_ndvi) < 20:
        return None
    
    # Fit harmonic model to history (captures seasonal cycle)
    # y = a0 + a1*cos(2π*t/365) + a2*sin(2π*t/365) + a3*t
    days_from_start = (hist_dates - hist_dates[0]).astype("timedelta64[D]").astype(float)
    
    X_hist = np.column_stack([
        np.ones(len(hist_ndvi)),
        np.cos(2 * np.pi * days_from_start / 365.25),
        np.sin(2 * np.pi * days_from_start / 365.25),
        days_from_start  # linear trend
    ])
    
    coeffs, _, _, _ = np.linalg.lstsq(X_hist, hist_ndvi, rcond=None)
    
    residuals_hist = hist_ndvi - X_hist @ coeffs
    hist_std = residuals_hist.std()
    
    # Apply model to monitoring period
    mon_ndvi = ndvi_series[history_end:]
    mon_dates = dates[history_end:]
    mon_valid = ~np.isnan(mon_ndvi)
    
    if mon_valid.sum() < 3:
        return None
    
    mon_days = (mon_dates[mon_valid] - hist_dates[0]).astype("timedelta64[D]").astype(float)
    
    X_mon = np.column_stack([
        np.ones(mon_valid.sum()),
        np.cos(2 * np.pi * mon_days / 365.25),
        np.sin(2 * np.pi * mon_days / 365.25),
        mon_days
    ])
    
    expected = X_mon @ coeffs
    residuals_mon = mon_ndvi[mon_valid] - expected
    
    # Detect negative breaks (forest loss = drop in NDVI)
    significant_drops = residuals_mon < (-threshold_std * hist_std)
    
    if significant_drops.any():
        first_break_idx = np.argmax(significant_drops)
        return {
            "break_date": str(mon_dates[mon_valid][first_break_idx]),
            "magnitude": float(residuals_mon[first_break_idx]),
            "threshold": float(-threshold_std * hist_std),
            "confidence": float(
                abs(residuals_mon[first_break_idx]) / hist_std
            )
        }
    
    return None

SAR-based detection for cloud-prone areas

Sentinel-1 SAR operates through clouds, providing reliable monitoring in the tropics:

import rasterio
import numpy as np
from scipy.ndimage import uniform_filter

def detect_sar_change(
    before_vv_path: str,
    after_vv_path: str,
    forest_mask_path: str,
    threshold_db: float = -3.0,
    min_patch_pixels: int = 25
) -> tuple[np.ndarray, dict]:
    """Detect forest loss from Sentinel-1 VV backscatter change."""
    
    with rasterio.open(before_vv_path) as src:
        before_vv = src.read(1).astype(np.float32)
        profile = src.profile.copy()
    
    with rasterio.open(after_vv_path) as src:
        after_vv = src.read(1).astype(np.float32)
    
    with rasterio.open(forest_mask_path) as src:
        forest = src.read(1).astype(bool)
    
    # Convert to dB if in linear scale
    before_db = 10 * np.log10(np.clip(before_vv, 1e-10, None))
    after_db = 10 * np.log10(np.clip(after_vv, 1e-10, None))
    
    # Speckle filtering (Lee-style)
    before_db = uniform_filter(before_db, size=5)
    after_db = uniform_filter(after_db, size=5)
    
    # Change ratio
    change_db = after_db - before_db
    
    # Forest loss = significant decrease in backscatter within forest areas
    loss_mask = (change_db < threshold_db) & forest
    
    # Remove small patches (noise)
    from scipy.ndimage import label
    labeled, n_features = label(loss_mask)
    for i in range(1, n_features + 1):
        if (labeled == i).sum() < min_patch_pixels:
            loss_mask[labeled == i] = False
    
    # Compute statistics
    pixel_area_ha = abs(profile["transform"].a * profile["transform"].e) / 10000
    stats = {
        "loss_pixels": int(loss_mask.sum()),
        "loss_area_ha": float(loss_mask.sum() * pixel_area_ha),
        "mean_change_db": float(change_db[loss_mask].mean()) if loss_mask.any() else 0,
    }
    
    return loss_mask, stats

Combining optical and SAR for robust monitoring

The most reliable systems fuse both data sources:

def fused_deforestation_alert(
    optical_alert: np.ndarray | None,
    sar_alert: np.ndarray | None,
    optical_date: str | None,
    sar_date: str | None
) -> dict:
    """Combine optical and SAR alerts into confidence-scored output."""
    
    if optical_alert is not None and sar_alert is not None:
        # Both sources agree = high confidence
        both = optical_alert & sar_alert
        optical_only = optical_alert & ~sar_alert
        sar_only = sar_alert & ~optical_alert
        
        return {
            "high_confidence": both,
            "medium_confidence_optical": optical_only,
            "medium_confidence_sar": sar_only,
            "total_high_confidence_pixels": int(both.sum()),
            "sources": ["optical", "sar"],
            "dates": {"optical": optical_date, "sar": sar_date}
        }
    elif optical_alert is not None:
        return {
            "medium_confidence_optical": optical_alert,
            "sources": ["optical"],
            "dates": {"optical": optical_date}
        }
    else:
        return {
            "medium_confidence_sar": sar_alert,
            "sources": ["sar"],
            "dates": {"sar": sar_date}
        }

Tradeoffs

Resolution vs. coverage: Planet (3m daily) catches small clearings but covers limited areas and costs money. Sentinel-2 (10m, 5-day) is free and global but misses small-scale selective logging. Landsat (30m, 16-day) provides the longest historical record but at coarse resolution.

Speed vs. accuracy: MODIS-based alerts (VIIRS active fires) arrive in hours but at 375m resolution with many false positives. Sentinel-based alerts take days to weeks but are more precise. The GLAD system compromises at weekly Landsat-based alerts.

Optical vs. SAR: Optical captures spectral change (green → brown) directly. SAR captures structural change (trees → no trees) and works through clouds. SAR produces more false positives from moisture changes (wet ground looks different from dry ground), requiring careful calibration.

Baseline definition: “What counts as forest?” is surprisingly political. A rubber plantation in Indonesia might be classified as forest by one definition and agricultural land by another. The choice of baseline forest map fundamentally determines what gets flagged as loss.

One thing to remember: Production deforestation monitoring requires fusing optical and SAR satellite data, handling persistent cloud cover through temporal compositing, and choosing the right tradeoff between detection speed, spatial resolution, and false-alarm rate — Python’s geospatial stack handles each component, but the system design decisions determine whether alerts are actionable or just noise.

pythonenvironmental-scienceremote-sensingconservationgeospatial