Python Deforestation Detection — Deep Dive
System architecture
A production deforestation monitoring system ingests satellite imagery, generates analysis-ready composites, detects changes, and delivers alerts:
STAC Catalog (Sentinel-2, Sentinel-1, Landsat)
→ Cloud masking + Atmospheric correction
→ Time-series construction (per-pixel NDVI/backscatter)
→ Change detection (BFAST / ML classifier)
→ Alert polygons → Dashboard + Notifications
Accessing satellite data via STAC
The SpatioTemporal Asset Catalog (STAC) has become the standard way to discover and access satellite imagery. Python libraries make it seamless:
import pystac_client
import stackstac
import xarray as xr
import numpy as np
def load_sentinel2_stack(
bbox: tuple[float, float, float, float],
date_range: str,
max_cloud_pct: int = 30
) -> xr.DataArray:
"""Load a Sentinel-2 image stack from a STAC catalog."""
catalog = pystac_client.Client.open(
"https://earth-search.aws.element84.com/v1"
)
search = catalog.search(
collections=["sentinel-2-l2a"],
bbox=bbox,
datetime=date_range,
query={"eo:cloud_cover": {"lt": max_cloud_pct}}
)
items = list(search.items())
print(f"Found {len(items)} scenes")
# Load as lazy xarray DataArray
stack = stackstac.stack(
items,
bounds_latlon=bbox,
resolution=10,
assets=["red", "green", "blue", "nir", "scl"],
dtype=np.float32
)
return stack
def apply_cloud_mask(stack: xr.DataArray) -> xr.DataArray:
"""Mask clouds using Sentinel-2 Scene Classification Layer."""
scl = stack.sel(band="scl")
# SCL classes: 3=cloud shadow, 8=cloud medium, 9=cloud high, 10=thin cirrus
cloud_mask = scl.isin([3, 8, 9, 10])
# Also mask snow/ice (11) and water (6) if focusing on forest
non_vegetation = scl.isin([3, 6, 8, 9, 10, 11])
masked = stack.where(~non_vegetation)
return masked
Building cloud-free composites
Individual scenes have cloud gaps. Temporal compositing fills them:
def create_monthly_composites(
stack: xr.DataArray,
method: str = "median"
) -> xr.DataArray:
"""Create monthly cloud-free composites from masked image stack."""
# Compute NDVI
nir = stack.sel(band="nir")
red = stack.sel(band="red")
ndvi = (nir - red) / (nir + red + 1e-10)
# Group by month and compute composite
monthly = ndvi.resample(time="1ME")
if method == "median":
composite = monthly.median(dim="time")
elif method == "max":
# Maximum NDVI composite — best for cloud-heavy periods
composite = monthly.max(dim="time")
return composite
def create_quarterly_baseline(
stack: xr.DataArray,
year: int,
quarter: int
) -> xr.DataArray:
"""Create a quarterly reference composite for change detection."""
quarter_months = {1: (1, 3), 2: (4, 6), 3: (7, 9), 4: (10, 12)}
start_month, end_month = quarter_months[quarter]
start = f"{year}-{start_month:02d}-01"
end = f"{year}-{end_month:02d}-28"
quarterly = stack.sel(time=slice(start, end))
nir = quarterly.sel(band="nir")
red = quarterly.sel(band="red")
ndvi = (nir - red) / (nir + red + 1e-10)
return ndvi.median(dim="time")
BFAST-style breakpoint detection
Breaks For Additive Season and Trend (BFAST) decomposes pixel time series into trend, season, and remainder, then detects structural breaks:
import numpy as np
from scipy.signal import find_peaks
def detect_forest_disturbance(
ndvi_series: np.ndarray,
dates: np.ndarray,
history_fraction: float = 0.7,
threshold_std: float = 3.0
) -> dict | None:
"""Simplified BFAST-monitor approach for disturbance detection.
Uses historical period to model expected behavior,
then flags deviations in the monitoring period.
"""
n = len(ndvi_series)
history_end = int(n * history_fraction)
# Remove NaN values from history
hist_valid = ~np.isnan(ndvi_series[:history_end])
hist_ndvi = ndvi_series[:history_end][hist_valid]
hist_dates = dates[:history_end][hist_valid]
if len(hist_ndvi) < 20:
return None
# Fit harmonic model to history (captures seasonal cycle)
# y = a0 + a1*cos(2π*t/365) + a2*sin(2π*t/365) + a3*t
days_from_start = (hist_dates - hist_dates[0]).astype("timedelta64[D]").astype(float)
X_hist = np.column_stack([
np.ones(len(hist_ndvi)),
np.cos(2 * np.pi * days_from_start / 365.25),
np.sin(2 * np.pi * days_from_start / 365.25),
days_from_start # linear trend
])
coeffs, _, _, _ = np.linalg.lstsq(X_hist, hist_ndvi, rcond=None)
residuals_hist = hist_ndvi - X_hist @ coeffs
hist_std = residuals_hist.std()
# Apply model to monitoring period
mon_ndvi = ndvi_series[history_end:]
mon_dates = dates[history_end:]
mon_valid = ~np.isnan(mon_ndvi)
if mon_valid.sum() < 3:
return None
mon_days = (mon_dates[mon_valid] - hist_dates[0]).astype("timedelta64[D]").astype(float)
X_mon = np.column_stack([
np.ones(mon_valid.sum()),
np.cos(2 * np.pi * mon_days / 365.25),
np.sin(2 * np.pi * mon_days / 365.25),
mon_days
])
expected = X_mon @ coeffs
residuals_mon = mon_ndvi[mon_valid] - expected
# Detect negative breaks (forest loss = drop in NDVI)
significant_drops = residuals_mon < (-threshold_std * hist_std)
if significant_drops.any():
first_break_idx = np.argmax(significant_drops)
return {
"break_date": str(mon_dates[mon_valid][first_break_idx]),
"magnitude": float(residuals_mon[first_break_idx]),
"threshold": float(-threshold_std * hist_std),
"confidence": float(
abs(residuals_mon[first_break_idx]) / hist_std
)
}
return None
SAR-based detection for cloud-prone areas
Sentinel-1 SAR operates through clouds, providing reliable monitoring in the tropics:
import rasterio
import numpy as np
from scipy.ndimage import uniform_filter
def detect_sar_change(
before_vv_path: str,
after_vv_path: str,
forest_mask_path: str,
threshold_db: float = -3.0,
min_patch_pixels: int = 25
) -> tuple[np.ndarray, dict]:
"""Detect forest loss from Sentinel-1 VV backscatter change."""
with rasterio.open(before_vv_path) as src:
before_vv = src.read(1).astype(np.float32)
profile = src.profile.copy()
with rasterio.open(after_vv_path) as src:
after_vv = src.read(1).astype(np.float32)
with rasterio.open(forest_mask_path) as src:
forest = src.read(1).astype(bool)
# Convert to dB if in linear scale
before_db = 10 * np.log10(np.clip(before_vv, 1e-10, None))
after_db = 10 * np.log10(np.clip(after_vv, 1e-10, None))
# Speckle filtering (Lee-style)
before_db = uniform_filter(before_db, size=5)
after_db = uniform_filter(after_db, size=5)
# Change ratio
change_db = after_db - before_db
# Forest loss = significant decrease in backscatter within forest areas
loss_mask = (change_db < threshold_db) & forest
# Remove small patches (noise)
from scipy.ndimage import label
labeled, n_features = label(loss_mask)
for i in range(1, n_features + 1):
if (labeled == i).sum() < min_patch_pixels:
loss_mask[labeled == i] = False
# Compute statistics
pixel_area_ha = abs(profile["transform"].a * profile["transform"].e) / 10000
stats = {
"loss_pixels": int(loss_mask.sum()),
"loss_area_ha": float(loss_mask.sum() * pixel_area_ha),
"mean_change_db": float(change_db[loss_mask].mean()) if loss_mask.any() else 0,
}
return loss_mask, stats
Combining optical and SAR for robust monitoring
The most reliable systems fuse both data sources:
def fused_deforestation_alert(
optical_alert: np.ndarray | None,
sar_alert: np.ndarray | None,
optical_date: str | None,
sar_date: str | None
) -> dict:
"""Combine optical and SAR alerts into confidence-scored output."""
if optical_alert is not None and sar_alert is not None:
# Both sources agree = high confidence
both = optical_alert & sar_alert
optical_only = optical_alert & ~sar_alert
sar_only = sar_alert & ~optical_alert
return {
"high_confidence": both,
"medium_confidence_optical": optical_only,
"medium_confidence_sar": sar_only,
"total_high_confidence_pixels": int(both.sum()),
"sources": ["optical", "sar"],
"dates": {"optical": optical_date, "sar": sar_date}
}
elif optical_alert is not None:
return {
"medium_confidence_optical": optical_alert,
"sources": ["optical"],
"dates": {"optical": optical_date}
}
else:
return {
"medium_confidence_sar": sar_alert,
"sources": ["sar"],
"dates": {"sar": sar_date}
}
Tradeoffs
Resolution vs. coverage: Planet (3m daily) catches small clearings but covers limited areas and costs money. Sentinel-2 (10m, 5-day) is free and global but misses small-scale selective logging. Landsat (30m, 16-day) provides the longest historical record but at coarse resolution.
Speed vs. accuracy: MODIS-based alerts (VIIRS active fires) arrive in hours but at 375m resolution with many false positives. Sentinel-based alerts take days to weeks but are more precise. The GLAD system compromises at weekly Landsat-based alerts.
Optical vs. SAR: Optical captures spectral change (green → brown) directly. SAR captures structural change (trees → no trees) and works through clouds. SAR produces more false positives from moisture changes (wet ground looks different from dry ground), requiring careful calibration.
Baseline definition: “What counts as forest?” is surprisingly political. A rubber plantation in Indonesia might be classified as forest by one definition and agricultural land by another. The choice of baseline forest map fundamentally determines what gets flagged as loss.
One thing to remember: Production deforestation monitoring requires fusing optical and SAR satellite data, handling persistent cloud cover through temporal compositing, and choosing the right tradeoff between detection speed, spatial resolution, and false-alarm rate — Python’s geospatial stack handles each component, but the system design decisions determine whether alerts are actionable or just noise.
See Also
- Python Biodiversity Tracking How Python helps scientists count and protect every kind of animal and plant on Earth — from whales to wildflowers.
- Python Crop Disease Detection How Python looks at photos of plants and figures out if they're sick — like a doctor for crops.
- Python Drone Image Processing How Python turns hundreds of overlapping drone photos into detailed maps and 3D models of the ground below.
- Python Ocean Data Analysis How Python explores the world's oceans through data — tracking currents, temperatures, and marine life without getting wet.
- Python Precision Agriculture How Python helps farmers give every plant exactly what it needs instead of treating the whole field the same way.