Fiona Vector Data — Deep Dive
Fiona makes vector I/O simple, but production workloads demand understanding of its GDAL underpinnings, driver-specific behaviors, cloud-native access patterns, and performance optimizations. This deep dive covers the internals that separate notebook experiments from reliable geospatial pipelines.
Architecture: Fiona ↔ GDAL/OGR
Fiona is a Cython bridge to libgdal’s OGR layer. A simplified view of the call chain:
fiona.open("file.gpkg")
    → GDALOpenEx()              # open the dataset
    → GDALDatasetGetLayer()     # get the layer
    → OGR_L_GetNextFeature()    # iterate features
    → Python dict conversion    # geometry + properties
Each feature crosses the C/Python boundary exactly once. Fiona converts OGR field types to Python types (int, float, str, datetime, bytes) and geometry to GeoJSON-style dicts. This boundary crossing is the primary performance bottleneck.
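As a rough illustration of the GeoJSON-style structure described above, the sketch below builds a record by hand and reads from it. The field names and values are hypothetical. Fiona 1.9 and later yield `Feature` objects rather than plain dicts, but they support the same mapping-style access.

```python
import json

# Hypothetical record, shaped like what Fiona yields for a point layer.
feature = {
    "type": "Feature",
    "id": "0",
    "geometry": {"type": "Point", "coordinates": (-73.97, 40.72)},
    "properties": {"name": "Example Cafe", "seats": 24, "rating": 4.5},
}

# Geometry and attributes are reached through ordinary key lookups.
lon, lat = feature["geometry"]["coordinates"]
print(feature["properties"]["name"], lon, lat)

# Tuples serialize as JSON arrays, so the record dumps to GeoJSON text.
print(json.dumps(feature))
```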
Virtual filesystems for cloud data
GDAL’s virtual filesystem handlers let Fiona read from S3, GCS, Azure Blob, HTTP, and ZIP archives without downloading the entire file first.
import fiona

# Read a zipped shapefile from S3
with fiona.open("zip+s3://bucket/shapes.zip") as src:
    features = list(src)

# Read from HTTP
with fiona.open("/vsicurl/https://example.com/data.geojson") as src:
    for feature in src:
        process(feature)  # process() stands in for your own handler

# Read a specific file inside a ZIP
with fiona.open("zip://archive.zip!layer.shp") as src:
    pass
Configure credentials through environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or GDAL config options.
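To make the relationship between URL schemes and GDAL handlers concrete, here is a simplified sketch of how Fiona-style URLs correspond to GDAL `/vsi` paths. The helper name `to_vsi_path` and the scheme table are illustrative assumptions, not Fiona's actual implementation, which covers more schemes and edge cases (query strings and Windows drive letters are ignored here).

```python
from urllib.parse import urlparse

# Assumed subset of URL scheme -> GDAL virtual filesystem handler names.
VSI_HANDLERS = {
    "zip": "vsizip",
    "tar": "vsitar",
    "s3": "vsis3",
    "gs": "vsigs",
    "az": "vsiaz",
    "http": "vsicurl",
    "https": "vsicurl",
}


def to_vsi_path(url: str) -> str:
    """Translate a Fiona-style URL into a GDAL /vsi path (hypothetical helper)."""
    parts = urlparse(url)
    if not parts.scheme:
        return url  # plain local path, nothing to translate

    # "archive.zip!inner/file.shp" separates the archive from a member path.
    location = parts.netloc + parts.path
    archive, _, member = location.partition("!")

    prefix = ""
    target = archive
    for scheme in parts.scheme.split("+"):
        handler = VSI_HANDLERS[scheme]  # KeyError for schemes not in the table
        if handler == "vsicurl":
            # /vsicurl/ expects the full remote URL after the prefix.
            target = f"{scheme}://{archive}"
        prefix += f"/{handler}/"

    path = prefix + target
    if member:
        path += "/" + member.lstrip("/")
    return path


print(to_vsi_path("zip+s3://bucket/shapes.zip"))
print(to_vsi_path("zip://archive.zip!layer.shp"))
```

Chained schemes such as `zip+s3` simply stack handlers, which is why the resulting path contains a double slash between them.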
Driver-specific behaviors
Not all drivers behave identically. Key differences to know:
| Driver | Transactions | Spatial index | Max field name | Multi-layer |
|---|---|---|---|---|
| ESRI Shapefile | No | Optional .qix sidecar | 10 chars | No |
| GeoPackage | Yes | R-tree built-in | 128 chars | Yes |
| GeoJSON | No | No | Unlimited | No |
| FlatGeobuf | No | Hilbert R-tree | Unlimited | No |
| FileGDB | Yes | Built-in | 64 chars | Yes |
GeoPackage transactions
with fiona.open("output.gpkg", "w", driver="GPKG", schema=schema, crs=crs) as dst:
    # GeoPackage is SQLite-based, so writes are committed in transactions.
    # Passing whole batches to writerecords() lets Fiona group each batch,
    # whereas write() handles a single record per call.
    # chunked() is a batching helper such as more_itertools.chunked.
    for batch in chunked(features, 1000):
        dst.writerecords(batch)
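The `chunked` helper used above is not part of Fiona or the standard library; `more_itertools.chunked` is one existing option. A minimal stdlib-only sketch with the same call shape, assuming any iterable of features as input:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch


# The last batch is shorter when the input does not divide evenly.
print(list(chunked(range(5), 2)))
```

Because it consumes the input lazily, this works with generators and open collections without loading everything into memory first.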
Shapefile limitations
Shapefiles truncate field names to 10 characters, cannot store None in numeric fields (uses a sentinel value), and split data across .shp, .shx, .dbf, and .prj files. Prefer GeoPackage for new projects.
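Because of the 10-character limit, distinct field names can collapse into the same name when written to a shapefile. The sketch below flags such collisions before writing. The helper name is hypothetical, it assumes ASCII field names, and it does not model how the driver itself renames colliding fields.

```python
from collections import defaultdict

SHAPEFILE_FIELD_NAME_LIMIT = 10


def find_truncation_collisions(field_names):
    """Group field names that become identical when cut to 10 characters."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:SHAPEFILE_FIELD_NAME_LIMIT]].append(name)
    # Keep only truncated names shared by more than one original field.
    return {short: names for short, names in groups.items() if len(names) > 1}


fields = ["population_2010", "population_2020", "name"]
print(find_truncation_collisions(fields))
```

Running a check like this against `schema["properties"]` before choosing the shapefile driver makes silent attribute loss easier to catch.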
Schema evolution and field types
Fiona’s schema maps to OGR field types:
from collections import OrderedDict

# Extended field types
schema = {
    "geometry": "MultiPolygon",
    "properties": OrderedDict([
        ("id", "int"),
        ("name", "str:100"),   # max length 100
        ("area", "float"),
        ("created", "datetime"),
        ("data", "bytes"),     # binary field
    ]),
}
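Adding fields to an existing file generally means rewriting it: Fiona exposes no API for altering the schema of an existing layer, and most vector formats do not support ALTER TABLE. The pattern is:
import copy

with fiona.open("input.gpkg") as src:
    # Deep copy: schema.copy() alone would share the nested "properties" mapping
    new_schema = copy.deepcopy(src.schema)
    new_schema["properties"]["new_field"] = "float"
    with fiona.open("output.gpkg", "w", driver=src.driver,
                    crs=src.crs, schema=new_schema) as dst:
        for feature in src:
            # Build a new record instead of mutating the one Fiona returned
            props = dict(feature["properties"])
            props["new_field"] = compute_value(feature)
            dst.write({"geometry": feature["geometry"], "properties": props})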
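One subtlety when deriving a new schema: `dict.copy()` is shallow, so the nested `"properties"` mapping stays shared with the source schema. The sketch below shows a deep-copy variant in isolation; `extend_schema` is a hypothetical helper name, and the schemas are plain dicts standing in for what a collection would report.

```python
import copy


def extend_schema(schema, new_fields):
    """Return a new schema with extra fields, leaving the input untouched."""
    new_schema = copy.deepcopy(schema)
    for name, field_type in new_fields.items():
        if name in new_schema["properties"]:
            raise ValueError(f"field already exists: {name}")
        new_schema["properties"][name] = field_type
    return new_schema


original = {"geometry": "Point", "properties": {"id": "int"}}
extended = extend_schema(original, {"score": "float"})
print(original["properties"])   # unchanged
print(extended["properties"])   # includes the new field

# For contrast, a shallow copy leaks the change back into the original.
shallow = original.copy()
shallow["properties"]["leaked"] = "int"
print("leaked" in original["properties"])
```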
High-throughput reading patterns
Bulk feature access
For maximum read speed, minimize Python-level processing per feature:
with fiona.open("large.gpkg") as src:
    # Read all features into memory at once
    features = list(src)  # fastest for files that fit in RAM
Spatial filtering pushdown
When the format has a spatial index (GeoPackage, FlatGeobuf), bbox filtering happens in OGR’s C layer before features reach Python:
with fiona.open("buildings.gpkg") as src:
    # OGR uses the R-tree index → only matching features cross to Python
    subset = list(src.filter(bbox=(-73.99, 40.70, -73.95, 40.75)))
Without a spatial index, filter(bbox=...) still works but scans every feature.
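Conceptually, a bbox filter keeps features whose envelope overlaps the query rectangle. The sketch below shows that test in plain Python for intuition only. It is not how OGR implements the filter, and the treatment of boxes that merely touch is an assumption of this example.

```python
def bbox_intersects(a, b):
    """Return True when two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return (
        a_minx <= b_maxx and b_minx <= a_maxx
        and a_miny <= b_maxy and b_miny <= a_maxy
    )


query = (-73.99, 40.70, -73.95, 40.75)
inside = (-73.97, 40.72, -73.96, 40.73)    # hypothetical building envelope
outside = (-74.10, 40.60, -74.00, 40.65)   # hypothetical envelope to the southwest

print(bbox_intersects(inside, query))
print(bbox_intersects(outside, query))
```

A spatial index answers the same question without visiting every envelope, which is where the speedup on indexed formats comes from.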
Attribute filtering
Pass a SQL WHERE clause through the where argument (available in Fiona 1.9 and later) to filter by properties:
with fiona.open("parcels.gpkg") as src:
    for feature in src.filter(where="area_sqm > 5000 AND zone = 'residential'"):
        process(feature)
This pushes the filter into OGR, avoiding deserialization of non-matching features.
Multi-layer files
GeoPackage and FileGDB support multiple layers in a single file:
# List layers
print(fiona.listlayers("city_data.gpkg"))
# ['buildings', 'roads', 'parks']
# Open a specific layer
with fiona.open("city_data.gpkg", layer="roads") as src:
    road_features = list(src)
Memory-based I/O with MemoryFile
Process features entirely in memory without touching the filesystem:
from fiona.io import MemoryFile

# Write to an in-memory buffer
with MemoryFile() as memfile:
    with memfile.open(driver="GeoJSON", schema=schema, crs=crs) as dst:
        for f in features:
            dst.write(f)
    # The dataset is closed and flushed here; read the bytes back
    # while the MemoryFile is still open
    memfile.seek(0)
    geojson_bytes = memfile.read()
This is useful when generating vector data for API responses or message queues.
Error handling patterns
import logging

import fiona
from fiona.errors import DriverError, SchemaError

logger = logging.getLogger(__name__)

try:
    with fiona.open("maybe_corrupt.shp") as src:
        features = list(src)
except DriverError as e:
    # File not found, unsupported format, corrupt header
    logger.error("Cannot open file: %s", e)
except SchemaError as e:
    # Schema problems, typically a mismatch when writing
    logger.error("Schema problem: %s", e)
Handling encoding issues
Shapefiles declare their attribute encoding in a .cpg sidecar file. If it is missing, GDAL typically falls back to ISO-8859-1, which mangles text that was actually written in another encoding such as UTF-8. If you know the data is UTF-8, pass the encoding explicitly:
with fiona.open("japanese_cities.shp", encoding="utf-8") as src:
    for feature in src:
        print(feature["properties"]["name"])
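The mangling is easy to reproduce without any shapefile. This small sketch decodes UTF-8 bytes as ISO-8859-1, which is what happens when the wrong encoding is assumed, and then reverses the mistake. The sample string is arbitrary.

```python
original = "東京"
raw_bytes = original.encode("utf-8")        # bytes as stored in the .dbf

# Decoding with the wrong codec never fails, it just yields the wrong text.
mangled = raw_bytes.decode("iso-8859-1")
print(len(original), len(mangled))          # 2 characters become 6

# Re-encoding with the wrong codec recovers the bytes, so the damage is
# reversible as long as the mangled text was not altered further.
repaired = mangled.encode("iso-8859-1").decode("utf-8")
print(repaired == original)
```

Since ISO-8859-1 maps every byte to some character, no exception signals the problem, which is why it tends to surface only when someone reads the output.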
Integration with the ecosystem
| Task | Fiona role | Partner library |
|---|---|---|
| Read shapefile into GeoDataFrame | I/O backend | GeoPandas (gpd.read_file with engine="fiona"; pyogrio is the default engine since GeoPandas 1.0) |
| Convert geometry dicts to objects | Provides dicts | Shapely (shape(feature["geometry"])) |
| Reproject coordinates | Reads source CRS | pyproj (Transformer) |
| Stream features to PostGIS | Reads source | psycopg2 / SQLAlchemy + GeoAlchemy2 |
Performance comparison: format choice matters
Illustrative figures for reading 1M polygon features; actual numbers vary with hardware, geometry complexity, and GDAL version:
| Format | Read time | File size | Spatial filter speedup |
|---|---|---|---|
| Shapefile | 12s | 850 MB | None (sequential scan) |
| GeoPackage | 8s | 620 MB | 10-50× with R-tree |
| FlatGeobuf | 5s | 580 MB | 20-100× with Hilbert index |
| GeoJSON | 25s | 1.4 GB | None |
FlatGeobuf is the fastest for read-heavy, single-layer workloads. GeoPackage wins when you need transactions, multiple layers, or wide tool compatibility.
The one thing to remember: Fiona’s value is abstraction — it shields you from format complexity so you can treat any vector file as a stream of Python dicts, but production performance depends on choosing the right driver and pushing filters into the OGR layer.