PyProj Coordinate Systems — Deep Dive
PyProj wraps the PROJ C library (formerly PROJ.4), which underpins nearly every open-source and many commercial GIS systems. Understanding PROJ’s transformation pipeline, datum grids, and precision model lets you handle edge cases that trip up most geospatial projects.
PROJ pipeline architecture
Every coordinate transformation is internally a pipeline of elementary operations:
from pyproj import Transformer
t = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)
print(t.to_proj4())
# +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 ...
# +step +proj=hgridshift +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif
# +step +proj=unitconvert +xy_in=rad +xy_out=deg
The pipeline decomposes British National Grid → WGS-84 into: inverse Transverse Mercator → horizontal grid shift (OSTN15) → radian-to-degree conversion. Each +step is an atomic operation.
Custom pipelines
You can define your own pipelines for non-standard transformations:
from pyproj import Transformer
pipeline = """
+proj=pipeline
+step +proj=unitconvert +xy_in=deg +xy_out=rad
+step +proj=cart +ellps=WGS84
+step +proj=helmert +x=-0.0016 +y=0.0014 +z=0.0058
+step +inv +proj=cart +ellps=GRS80
+step +proj=unitconvert +xy_in=rad +xy_out=deg
"""
t = Transformer.from_pipeline(pipeline)
This gives full control when PROJ’s automatic path selection is insufficient — for example, when enforcing a specific datum-shift method.
Datum grids and accuracy
PROJ supports three datum-shift methods with dramatically different accuracy:
| Method | Typical accuracy | Example |
|---|---|---|
| Helmert 7-parameter | 1-5 meters | WGS84 ↔ Tokyo datum |
| NTv2 horizontal grid | 0.01-0.10 meters | NAD27 → NAD83, OSGB → ETRS89 |
| GeoTIFF deformation grid | millimeters | ITRF2014 epoch propagation |
Installing datum grids
# Using projsync (ships with PROJ ≥ 7)
projsync --system-directory --source-id uk_os # UK Ordnance Survey grids
projsync --system-directory --source-id us_noaa # US NOAA grids
# Or download all grids (several GB)
projsync --system-directory --all
Without grids, PROJ falls back to the Helmert method, silently degrading accuracy. Check which grids are available:
from pyproj import database
print(database.get_database_metadata("PROJ_DATA.VERSION"))
Verifying transformation accuracy
from pyproj import TransformerGroup
tg = TransformerGroup("EPSG:27700", "EPSG:4326", always_xy=True)
for t in tg.transformers:
print(f"Accuracy: {t.accuracy}m — {t.name}")
# Accuracy: 0.001m — OSTN15 (OSGB-ETRS89)
# Accuracy: 2.0m — Helmert fallback
TransformerGroup lists all available transformation paths sorted by accuracy. Always check this when precision matters.
Axis order: the perennial gotcha
PROJ follows the CRS definition’s axis order by default:
- EPSG:4326 defines (latitude, longitude) — north first
- EPSG:32618 defines (easting, northing) — east first
This catches nearly everyone:
from pyproj import Transformer
# Without always_xy: input is (lat, lon) for EPSG:4326
t1 = Transformer.from_crs("EPSG:4326", "EPSG:32618")
x, y = t1.transform(40.748, -73.985) # lat first!
# With always_xy: input is always (x, y) = (lon, lat)
t2 = Transformer.from_crs("EPSG:4326", "EPSG:32618", always_xy=True)
x, y = t2.transform(-73.985, 40.748) # lon first!
Best practice: always use always_xy=True and document it in your codebase.
Epoch-aware (dynamic) transformations
Modern reference frames like ITRF2014 change over time due to tectonic motion. PROJ supports epoch-aware transforms:
from pyproj import Transformer
t = Transformer.from_crs(
"EPSG:9988", # ITRF2014 (dynamic)
"EPSG:7844", # GDA2020 (Australia, epoch 2020.0)
always_xy=True,
)
# Transform at a specific epoch
x, y, z = t.transform(151.209, -33.868, 0, tt=2024.5)
The tt parameter specifies the observation epoch. Without it, dynamic CRS conversions lose centimeters per year of tectonic drift.
CRS inspection and construction
Inspecting CRS details
from pyproj import CRS
crs = CRS.from_epsg(32618)
print(crs.to_wkt(pretty=True)) # Well-Known Text
print(crs.to_proj4()) # PROJ string (lossy, avoid for storage)
print(crs.ellipsoid) # Ellipsoid info
print(crs.datum) # Datum info
print(crs.area_of_use) # Geographic bounds where CRS is valid
print(crs.coordinate_operation) # Projection parameters
Building custom CRS
from pyproj import CRS
custom = CRS.from_proj4(
"+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 "
"+x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
)
For production, prefer WKT2 over PROJ strings. PROJ strings lose information (especially datum details):
# Round-trip: CRS → WKT → CRS preserves all metadata
wkt = crs.to_wkt()
restored = CRS.from_wkt(wkt)
assert crs == restored # True
Performance optimization
Reuse transformers
Creating a Transformer is expensive (parses CRS, builds pipeline). Create once, use many times:
# Bad: creates a new transformer per point
for lon, lat in points:
t = Transformer.from_crs(4326, 32618, always_xy=True)
x, y = t.transform(lon, lat)
# Good: create once
t = Transformer.from_crs(4326, 32618, always_xy=True)
for lon, lat in points:
x, y = t.transform(lon, lat)
Vectorized transforms with NumPy
import numpy as np
from pyproj import Transformer
t = Transformer.from_crs(4326, 32618, always_xy=True)
lons = np.random.uniform(-74, -73, 1_000_000)
lats = np.random.uniform(40, 41, 1_000_000)
xs, ys = t.transform(lons, lats) # ~0.3s for 1M points
The C-level loop handles the array without Python overhead. This is 100-500× faster than a Python loop.
Thread safety
Transformer objects are thread-safe for read-only use. You can share a single instance across threads:
from concurrent.futures import ThreadPoolExecutor
t = Transformer.from_crs(4326, 32618, always_xy=True)
def reproject_chunk(chunk):
return t.transform(chunk[:, 0], chunk[:, 1])
with ThreadPoolExecutor(max_workers=8) as pool:
results = list(pool.map(reproject_chunk, chunks))
Common pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Wrong axis order | Points reflected across equator/prime meridian | Use always_xy=True |
| Missing datum grids | 2-5m offset between datasets | Install grids with projsync |
| Using PROJ strings for storage | Datum info lost on round-trip | Store as WKT2 or EPSG codes |
| Computing area in geographic CRS | Result in “square degrees” | Project to local UTM first |
| Ignoring CRS area of use | Extreme distortion at edges | Check crs.area_of_use bounds |
| Mixing 2D and 3D CRS | Unexpected z-coordinate injection | Match dimensionality explicitly |
Integration patterns
With GeoPandas
import geopandas as gpd
gdf = gpd.read_file("data.shp")
gdf_utm = gdf.to_crs(epsg=32618) # Uses PyProj internally
With Rasterio
from rasterio.warp import calculate_default_transform, reproject
# Rasterio delegates CRS handling to PyProj
With Shapely (manual transform)
from shapely.ops import transform
from pyproj import Transformer
t = Transformer.from_crs(4326, 32618, always_xy=True).transform
projected_geom = transform(t, original_geom)
The one thing to remember: PyProj’s accuracy depends entirely on the transformation path it selects — install datum grids, verify accuracy with TransformerGroup, use always_xy=True, and store CRS as WKT2 to avoid silent precision loss.
See Also
- Python Adaptive Learning Systems How Python builds learning apps that adjust to each student like a personal tutor who knows exactly what you need next.
- Python Airflow Learn Airflow as a timetable manager that makes sure data tasks run in the right order every day.
- Python Altair Learn Altair through the idea of drawing charts by describing rules, not by hand-placing every visual element.
- Python Automated Grading How Python grades homework and exams automatically, from simple answer keys to understanding written essays.
- Python Batch Vs Stream Processing Batch processing is like doing laundry once a week; stream processing is like a self-cleaning shirt that cleans itself constantly.