Fiona Vector Data — Deep Dive

Master Fiona internals: GDAL driver tuning, virtual filesystems for cloud data, schema evolution, transactions, and high-throughput vector I/O patterns.

Fiona makes vector I/O simple, but production workloads demand understanding of its GDAL underpinnings, driver-specific behaviors, cloud-native access patterns, and performance optimizations. This deep dive covers the internals that separate notebook experiments from reliable geospatial pipelines.

Architecture: Fiona ↔ GDAL/OGR

Fiona is a Cython bridge to libgdal’s OGR layer. The call chain looks like:

fiona.open("file.gpkg")
  → GDALOpenEx()                 # open the dataset
  → GDALDatasetGetLayerByName()  # look up the layer (GDALDatasetGetLayer() for an index)
  → OGR_L_GetNextFeature()       # iterate features
  → Python conversion            # geometry + properties

Each feature crosses the C/Python boundary exactly once. Fiona converts OGR field values to Python types (int, float, str, bytes, with date and time values returned as ISO 8601 strings) and geometry to GeoJSON-style mappings. For large files, this per-feature conversion is usually the dominant cost.
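As a rough illustration of what arrives on the Python side, the record below is hand-written in the GeoJSON-like shape that Fiona yields. The values and the `summarize` helper are made up for this sketch, and recent Fiona versions return feature objects that support the same key-based access rather than plain dicts.

```python
# A hand-written record in the GeoJSON-like shape Fiona yields (values are made up).
record = {
    "type": "Feature",
    "id": "0",
    "geometry": {
        "type": "Point",
        "coordinates": (-73.97, 40.72),
    },
    "properties": {"name": "Example", "population": 1200, "area": 3.5},
}


def summarize(rec):
    """Return (geometry type, {property name: Python type name}) for one record."""
    geom = rec["geometry"]
    geom_type = geom["type"] if geom is not None else None
    prop_types = {key: type(value).__name__ for key, value in rec["properties"].items()}
    return geom_type, prop_types


geom_type, prop_types = summarize(record)
```

Inspecting the first record of a new data source this way is a cheap check that field types came through as expected.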

Virtual filesystems for cloud data

GDAL’s virtual filesystem handlers let Fiona read from S3, GCS, Azure Blob, HTTP, and ZIP archives, in many cases without downloading the entire file first.

import fiona

# Read a zipped shapefile from S3
with fiona.open("zip+s3://bucket/shapes.zip") as src:
    features = list(src)

# Read from HTTP; process() stands in for your own per-feature handler
with fiona.open("/vsicurl/https://example.com/data.geojson") as src:
    for feature in src:
        process(feature)

# Read a specific file inside a local ZIP archive
with fiona.open("zip://archive.zip!layer.shp") as src:
    pass

Configure credentials through environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or through GDAL config options, which can be set for a block of code with fiona.Env.
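The URL schemes shown earlier in this section are shorthand for GDAL's /vsi path prefixes. The sketch below illustrates that mapping in simplified form. The function name `to_vsi_path` and the prefix table are assumptions made for this example, not Fiona's own path parser, which covers more schemes and edge cases.

```python
# Illustrative mapping only; Fiona's real path handling covers more cases.
VSI_PREFIX = {
    "zip": "vsizip",
    "tar": "vsitar",
    "s3": "vsis3",
    "gs": "vsigs",
    "http": "vsicurl",
    "https": "vsicurl",
}


def to_vsi_path(url):
    """Translate a Fiona-style URL into a GDAL /vsi path (simplified sketch)."""
    if url.startswith("/") or "://" not in url:
        return url  # plain filesystem path, or already a /vsi path
    scheme, rest = url.split("://", 1)
    parts = scheme.split("+")           # "zip+s3" -> ["zip", "s3"]
    rest = rest.replace("!", "/", 1)    # "archive.zip!layer.shp" -> archive member
    if parts[-1] in ("http", "https"):
        rest = f"{parts[-1]}://{rest}"  # /vsicurl/ keeps the full URL
    handlers = [VSI_PREFIX[p] for p in parts]
    # Chain handlers outermost first, e.g. /vsizip//vsis3/bucket/key.zip
    return "/" + "//".join(handlers) + "/" + rest


s3_zip = to_vsi_path("zip+s3://bucket/shapes.zip")
local_zip = to_vsi_path("zip://archive.zip!layer.shp")
http_file = to_vsi_path("https://example.com/data.geojson")
```

Seeing the /vsi form helps when reading GDAL error messages, which tend to report the translated path rather than the URL you passed in.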

Driver-specific behaviors

Not all drivers behave identically. Key differences to know:

Driver	Transactions	Spatial index	Max field name	Multi-layer
ESRI Shapefile	No	Optional .qix sidecar (quadtree)	10 chars	No
GeoPackage	Yes	R-tree built-in	128 chars	Yes
GeoJSON	No	No	Unlimited	No
FlatGeobuf	No	Hilbert R-tree	Unlimited	No
FileGDB	Yes	Built-in	64 chars	Yes

GeoPackage transactions

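from more_itertools import chunked  # third-party; any batching helper works

with fiona.open("output.gpkg", "w", driver="GPKG", schema=schema, crs=crs) as dst:
    # For drivers that support transactions, Fiona commits at the end of
    # each write()/writerecords() call, so writing one feature at a time
    # pays the commit cost per feature. Passing a whole batch to
    # writerecords() commits once per batch instead.
    for batch in chunked(features, 1000):
        dst.writerecords(batch)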
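The `chunked` helper used above is not part of Fiona. The third-party more_itertools package provides one. If you would rather avoid the dependency, a minimal standard-library equivalent might look like this sketch:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


batches = list(chunked(range(7), 3))
```

Because it consumes the input lazily, this also works when `features` is a generator that is too large to hold in memory.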

Shapefile limitations

Shapefiles truncate field names to 10 characters, handle NULL values in numeric fields unreliably (many tools read them back as 0), and split data across .shp, .shx, .dbf, and .prj files. Prefer GeoPackage for new projects.
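Truncation becomes a real problem when two long field names share their first 10 characters. The helper below is a hypothetical pre-flight check, not part of Fiona or GDAL. It only flags names that would collapse onto each other. GDAL's own handling of such duplicates (typically renaming them) is not modeled here.

```python
from collections import defaultdict

SHAPEFILE_MAX_FIELD_NAME = 10  # DBF field names are limited to 10 characters


def truncation_collisions(field_names, limit=SHAPEFILE_MAX_FIELD_NAME):
    """Map each truncated name to the original names that collapse onto it."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:limit]].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


collisions = truncation_collisions(
    ["population_2010", "population_2020", "name", "area_sqm"]
)
```

Running this over `schema["properties"]` before writing a shapefile gives you a chance to rename fields deliberately instead of discovering renamed columns later.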

Schema evolution and field types

Fiona’s schema maps to OGR field types:

from collections import OrderedDict

# Extended field types
schema = {
    "geometry": "MultiPolygon",
    "properties": OrderedDict([
        ("id", "int"),
        ("name", "str:100"),         # max length 100
        ("area", "float"),
        ("created", "datetime"),
        ("data", "bytes"),           # binary field
    ]),
}

Adding fields to an existing file means rewriting it, because Fiona does not expose an operation for altering the fields of an existing layer. The pattern is:

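import copy

import fiona

with fiona.open("input.gpkg") as src:
    # Deep copy so the nested "properties" mapping of src.schema is not mutated
    new_schema = copy.deepcopy(src.schema)
    new_schema["properties"]["new_field"] = "float"

    with fiona.open("output.gpkg", "w", driver=src.driver,
                    crs=src.crs, schema=new_schema) as dst:
        for feature in src:
            # Build a new record instead of mutating the feature in place;
            # compute_value() stands in for your own logic
            props = dict(feature["properties"])
            props["new_field"] = compute_value(feature)
            dst.write({"geometry": feature["geometry"], "properties": props})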
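When copying records into a file with an evolved schema, a record whose property names differ from the schema's is typically rejected at write time. A small pre-check can make such mismatches easier to diagnose. The helper name `schema_mismatch` and the sample data are assumptions for this sketch.

```python
def schema_mismatch(record, schema):
    """Return (missing, unexpected) property names for one record."""
    expected = set(schema["properties"])
    actual = set(record["properties"])
    return sorted(expected - actual), sorted(actual - expected)


# Made-up schema and record for illustration
schema = {"geometry": "Point", "properties": {"id": "int", "new_field": "float"}}
record = {"geometry": None, "properties": {"id": 1, "legacy": "x"}}

missing, unexpected = schema_mismatch(record, schema)
```

Logging both lists for the first offending record usually pinpoints a forgotten field faster than the write error alone.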

High-throughput reading patterns

Bulk feature access

For maximum read speed, minimize Python-level processing per feature:

with fiona.open("large.gpkg") as src:
    # Materialize every feature in one pass; only sensible when the file
    # fits comfortably in RAM. Otherwise iterate over src lazily.
    features = list(src)

Spatial filtering pushdown

Bbox filtering always runs in OGR’s C layer, before features reach Python. When the format has a spatial index (GeoPackage, FlatGeobuf), OGR can also use it to skip non-matching features without reading them:

with fiona.open("buildings.gpkg") as src:
    # OGR uses the R-tree index → only matching features cross to Python
    subset = list(src.filter(bbox=(-73.99, 40.70, -73.95, 40.75)))

Without a spatial index, filter(bbox=...) still works but scans every feature.
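To make the cost of a full scan concrete, here is a pure-Python sketch of roughly what an unindexed bbox filter has to do for every feature: compute the geometry's envelope and test it against the query box. This is a simplification written for illustration. All function names are made up, real drivers may apply a more precise intersection test, and geometry collections and empty geometries are not handled.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def envelope(geometry):
    """Return (minx, miny, maxx, maxy) of a GeoJSON-style geometry dict."""
    xs, ys = zip(*iter_positions(geometry["coordinates"]))
    return min(xs), min(ys), max(xs), max(ys)


def envelopes_intersect(a, b):
    """True when two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def scan_bbox(records, bbox):
    """Full scan: keep records whose geometry envelope intersects bbox."""
    return [
        r for r in records
        if r["geometry"] is not None
        and envelopes_intersect(envelope(r["geometry"]), bbox)
    ]


# Made-up records: one point inside the query box, one polygon outside it
records = [
    {"geometry": {"type": "Point", "coordinates": (-73.97, 40.72)},
     "properties": {"id": 1}},
    {"geometry": {"type": "Polygon", "coordinates": [
        [(-74.5, 40.0), (-74.4, 40.0), (-74.4, 40.1), (-74.5, 40.0)]]},
     "properties": {"id": 2}},
]
hits = scan_bbox(records, (-73.99, 40.70, -73.95, 40.75))
```

A spatial index avoids this per-feature work by discarding whole groups of features whose combined envelope misses the query box.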

Attribute filtering

Use SQL WHERE expressions to filter by properties (the where keyword requires Fiona 1.9 or later):

with fiona.open("parcels.gpkg") as src:
    for feature in src.filter(where="area_sqm > 5000 AND zone = 'residential'"):
        process(feature)

This pushes the filter into OGR, avoiding deserialization of non-matching features.
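When the filter values come from user input or configuration, building the expression by plain string concatenation invites quoting bugs. The sketch below assumes the common SQL convention of doubling single quotes inside string literals. The helper name `sql_literal` is hypothetical, and you should confirm the quoting rules for the driver you use.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal for a where= expression."""
    if isinstance(value, bool):
        raise TypeError("boolean literals vary by driver; compare against 0/1 instead")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Double embedded single quotes, then wrap in single quotes
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


where = f"area_sqm > {sql_literal(5000)} AND zone = {sql_literal('residential')}"
tricky = sql_literal("O'Hare")
```

The resulting string can be passed as the `where` argument in the same way as the hand-written expression above.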

Multi-layer files

GeoPackage and FileGDB support multiple layers in a single file:

# List layers
print(fiona.listlayers("city_data.gpkg"))
# ['buildings', 'roads', 'parks']

# Open a specific layer
with fiona.open("city_data.gpkg", layer="roads") as src:
    road_features = list(src)

Memory-based I/O with `MemoryFile`

Process features entirely in memory without touching the filesystem:

from fiona.io import MemoryFile

# Write to an in-memory buffer
with MemoryFile() as memfile:
    with memfile.open(driver="GeoJSON", schema=schema, crs=crs) as dst:
        for f in features:
            dst.write(f)
    # Rewind and copy the contents out as bytes before the MemoryFile closes
    memfile.seek(0)
    geojson_bytes = memfile.read()

This is useful when generating vector data for API responses or message queues.

Error handling patterns

import logging

import fiona
from fiona.errors import DriverError, SchemaError

logger = logging.getLogger(__name__)

try:
    with fiona.open("maybe_corrupt.shp") as src:
        features = list(src)
except DriverError as e:
    # File not found, unsupported format, corrupt header
    logger.error(f"Cannot open file: {e}")
except SchemaError as e:
    # Invalid or unsupported schema; more likely when opening for writing
    logger.error(f"Schema problem: {e}")

Handling encoding issues

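Shapefiles declare their text encoding in a .cpg sidecar file or in the .dbf header. If neither is present, GDAL falls back to ISO-8859-1, which mangles text stored in any other encoding. Override the detected encoding with the encoding argument, for example for a file known to contain UTF-8: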

with fiona.open("japanese_cities.shp", encoding="utf-8") as src:
    for feature in src:
        print(feature["properties"]["name"])
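The failure mode is easy to reproduce without any shapefile. The standard-library sketch below shows that decoding UTF-8 bytes as ISO-8859-1 never raises an error; it silently produces mojibake, which is why a wrong default can go unnoticed. The sample string is arbitrary.

```python
original = "東京"

# Bytes as a UTF-8 encoded attribute table would store them
raw = original.encode("utf-8")

# Decoding with the wrong codec does not fail; it silently yields mojibake
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec restores the text
restored = raw.decode("utf-8")

# Already-garbled strings can often be repaired by reversing the wrong decode
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

The repair step works only when the wrong decode was lossless, as it is for ISO-8859-1. Reopening the file with the correct encoding is the more reliable fix.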

Integration with the ecosystem

Task	Fiona role	Partner library
Read shapefile into GeoDataFrame	I/O backend	GeoPandas (`gpd.read_file(..., engine="fiona")`; pyogrio is the default engine since GeoPandas 1.0)
Convert geometry dicts to objects	Provides dicts	Shapely (`shape(feature["geometry"])`)
Reproject coordinates	Reads source CRS	pyproj (`Transformer`)
Stream features to PostGIS	Reads source	psycopg2 / SQLAlchemy + GeoAlchemy2

Performance comparison: format choice matters

Illustrative timings for reading 1M polygon features; actual numbers depend on hardware, GDAL version, and geometry complexity:

Format	Read time	File size	Spatial filter speedup
Shapefile	12s	850 MB	None (sequential scan)
GeoPackage	8s	620 MB	10-50× with R-tree
FlatGeobuf	5s	580 MB	20-100× with Hilbert index
GeoJSON	25s	1.4 GB	None

FlatGeobuf is the fastest for read-heavy, single-layer workloads. GeoPackage wins when you need transactions, multiple layers, or wide tool compatibility.
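Because such numbers vary so much between datasets and machines, it is worth timing your own files. The harness below is a minimal sketch. `time_read` is a hypothetical helper that takes any zero-argument callable returning an iterable, and the stand-in data source exists only so the example runs without files on disk.

```python
import time


def time_read(open_source, repeats=3):
    """Return (feature_count, best_seconds) over `repeats` full iterations."""
    best = float("inf")
    count = 0
    for _attempt in range(repeats):
        start = time.perf_counter()
        count = sum(1 for _item in open_source())
        best = min(best, time.perf_counter() - start)
    return count, best


# Stand-in data source so the sketch runs without any files on disk
count, seconds = time_read(lambda: iter(range(1000)))
```

With Fiona, `open_source` could be a generator function that opens a file inside a `with` block and yields from it. Taking the best of several runs reduces the influence of cold caches on the comparison.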

The one thing to remember: Fiona’s value is abstraction — it shields you from format complexity so you can treat any vector file as a stream of Python dicts, but production performance depends on choosing the right driver and pushing filters into the OGR layer.
