Shapely Geometry — Deep Dive

Shapely 2.0 rewrote the library’s internals around vectorized GEOS operations exposed through NumPy arrays, replacing the slower object-by-object approach of Shapely 1.x. This guide covers the new performance model, spatial indexing, coordinate transformations, and patterns for processing millions of geometries efficiently.

Shapely 2.0 architecture

From objects to arrays

Shapely 1.x wrapped each GEOS geometry in a Python object with method-based operations. Shapely 2.0 stores geometries as lightweight wrappers and provides module-level functions that accept arrays:

import shapely
from shapely import Point, Polygon
import numpy as np

# Create 100,000 random points
coords = np.random.uniform(0, 100, (100_000, 2))
points = shapely.points(coords)  # returns an array of Point geometries

# Vectorized operation: which points are within a polygon?
zone = Polygon([(10, 10), (50, 10), (50, 50), (10, 50)])
inside = shapely.within(points, zone)  # boolean array, computed in C
print(inside.sum())  # fast — no Python loop

The vectorized path is 10–50× faster than iterating with Python for loops because the loop runs entirely in compiled C code.

Key vectorized functions

shapely.area(polygons)           # array of areas
shapely.distance(geom_a, geom_b) # pairwise distances
shapely.intersection(a, b)       # pairwise intersections
shapely.buffer(geoms, distance)  # array of buffered geometries
shapely.unary_union(geoms)       # merge all into one geometry
shapely.contains(a, b)           # pairwise containment test

Spatial indexing with STRtree

Checking every pair of geometries is O(n²). Spatial indexes reduce this to O(n log n) by pre-sorting geometries by their bounding boxes.

Building and querying an STRtree

from shapely import STRtree

tree = STRtree(points)  # build index from array of geometries

# Find all points within 5.0 units of a query point
query_point = Point(25, 25)
buffer = query_point.buffer(5.0)
indices = tree.query(buffer)  # candidate indices (bounding box filter)

# Refine with exact geometry test
exact = indices[shapely.within(points[indices], buffer)]

tree.query() returns indices of geometries whose bounding boxes intersect the query geometry. This is a fast filter — always follow with an exact predicate for precision.

Bulk nearest neighbor

# For each point in set A, find the nearest point in set B
tree_b = STRtree(points_b)
indices, distances = tree_b.query_nearest(points_a, return_distance=True)

This runs in O(n log n) and is the foundation for spatial joins in GeoPandas.

Prepared geometries

When you test the same geometry against many others, preparing it pre-computes internal indexes:

from shapely import prepare

zone = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
prepare(zone)  # builds internal spatial index

# Subsequent operations with this geometry are faster
results = shapely.contains(zone, points)  # uses prepared index

Preparation helps when one geometry is tested repeatedly — a city boundary checked against millions of GPS traces, for example.

Coordinate transformations

Shapely 2.0 provides shapely.transform() for applying coordinate transformations:

import shapely
from pyproj import Transformer

# WGS-84 (lat/lng) to UTM zone 33N (meters)
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True)

def project(coords):
    return transformer.transform(coords[:, 0], coords[:, 1])

# Transform all geometries
projected = shapely.transform(geometries, project)

# Now areas are in square meters
areas_m2 = shapely.area(projected)

Why projection matters

The area of a 1° × 1° square at the equator is ~12,300 km². The same square at 60°N is ~6,150 km². Without projection, area calculations are wrong by up to 2×.

Advanced operations

Voronoi diagrams

Partition space so each region contains all points closest to one input point:

from shapely.ops import voronoi_diagram

points_geom = shapely.MultiPoint(coords)
regions = voronoi_diagram(points_geom)

for polygon in regions.geoms:
    print(polygon.area)

Use cases: service area analysis, cell tower coverage estimation, and market territory definition.

Polygon simplification

Reduce vertex count while preserving shape:

# Douglas-Peucker simplification
simplified = shapely.simplify(complex_polygon, tolerance=0.001)

# Preserve topology (prevents self-intersections)
simplified = shapely.simplify(complex_polygon, tolerance=0.001, preserve_topology=True)

Simplification is critical before serializing to GeoJSON for web maps. A detailed coastline might have 500K vertices; simplified to 5K vertices, it renders 100× faster with minimal visual difference.

Polygon validity and repair

Real-world data often contains invalid geometries (self-intersections, duplicate points):

from shapely.validation import explain_validity, make_valid

poly = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # bowtie — self-intersecting
print(shapely.is_valid(poly))       # False
print(explain_validity(poly))        # "Self-intersection[1 1]"

fixed = make_valid(poly)             # splits into valid geometries
print(shapely.is_valid(fixed))       # True

Always validate geometries loaded from external sources before performing operations.

Performance patterns

Avoid Python loops

# Slow: Python loop
areas = [geom.area for geom in polygons]

# Fast: vectorized
areas = shapely.area(polygons)

The vectorized version is typically 20–50× faster for arrays of 10K+ geometries.

Batch union with unary_union

Merging geometries one by one is O(n²). Use unary_union which employs a cascaded union strategy:

# Slow: iterative
result = polygons[0]
for p in polygons[1:]:
    result = result.union(p)

# Fast: cascaded
result = shapely.unary_union(polygons)

Memory-efficient WKB serialization

Store geometries compactly using Well-Known Binary (WKB):

wkb_bytes = shapely.to_wkb(geometries)   # array of bytes objects
restored = shapely.from_wkb(wkb_bytes)    # back to geometry array

WKB is the format used by PostGIS and most spatial databases. It is more compact than WKT (text) and faster to parse.

Integration with the Python geospatial stack

LibraryRoleShapely interaction
GeoPandasDataFrames with geometry columnsBuilt on Shapely geometries
FionaRead/write shapefiles, GeoJSONReturns Shapely-compatible geometry dicts
pyprojCoordinate transformationsTransform Shapely coords between CRS
RasterioRaster data (satellite imagery)Use Shapely geometries for masking/clipping
FoliumInteractive mapsSerialize Shapely to GeoJSON for display

Common pitfalls

  1. Mixing lon/lat and lat/lon. Shapely uses (x, y) = (longitude, latitude). Many APIs return (latitude, longitude). Swapping them produces geometries in the wrong hemisphere.
  2. Measuring in degrees. Shapely does not know about the Earth’s curvature. Always project before measuring.
  3. Invalid geometries from unions. Floating-point precision can create slivers and self-intersections. Use make_valid() after complex operations.
  4. Forgetting STRtree for large datasets. A spatial join without an index on 100K geometries takes minutes; with an index, it takes seconds.

The one thing to remember: Shapely 2.0’s vectorized operations and STRtree indexing make million-geometry workflows practical in Python — but only if you avoid Python loops and always project coordinates before measuring.

pythonshapelygeometrygeospatial

See Also

  • Python Adaptive Learning Systems How Python builds learning apps that adjust to each student like a personal tutor who knows exactly what you need next.
  • Python Airflow Learn Airflow as a timetable manager that makes sure data tasks run in the right order every day.
  • Python Altair Learn Altair through the idea of drawing charts by describing rules, not by hand-placing every visual element.
  • Python Automated Grading How Python grades homework and exams automatically, from simple answer keys to understanding written essays.
  • Python Batch Vs Stream Processing Batch processing is like doing laundry once a week; stream processing is like a self-cleaning shirt that cleans itself constantly.