GeoPandas Spatial Data — Core Concepts

GeoPandas extends Pandas DataFrames with a geometry column, enabling spatial operations alongside standard tabular analysis. If you know Pandas, you already know 80% of GeoPandas — the extra 20% is the geography.

GeoDataFrame basics

import geopandas as gpd

# Read a shapefile, GeoJSON, GeoPackage, or any GDAL-supported format
gdf = gpd.read_file("neighborhoods.geojson")

print(type(gdf))            # GeoDataFrame
print(gdf.columns.tolist()) # [..., "name", "population", "geometry"]
print(gdf.crs)              # EPSG:4326 (WGS 84, stored as x = longitude, y = latitude)

The geometry column holds Shapely objects: Points, LineStrings, Polygons, and their Multi* counterparts. Most Pandas operations (filtering, groupby, merge) work the same way; the geometry column just tags along.
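To see the "geometry column just tags along" idea without any files, here is a minimal sketch that builds a tiny GeoDataFrame in memory and filters it with ordinary Pandas syntax. The neighborhood names, populations and square shapes are made-up toy data.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical toy data: two adjacent square "neighborhoods" in lon/lat degrees
gdf = gpd.GeoDataFrame(
    {"name": ["North", "South"], "population": [1200, 800]},
    geometry=[
        Polygon([(0, 1), (1, 1), (1, 2), (0, 2)]),
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    ],
    crs="EPSG:4326",
)

# Plain Pandas boolean filtering; the result is still a GeoDataFrame with its CRS
big = gdf[gdf["population"] > 1000]
print(type(big).__name__)               # GeoDataFrame
print(big["name"].tolist())             # ['North']
print(big.geometry.geom_type.tolist())  # ['Polygon']
print(big.crs.to_epsg())                # 4326
```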

Reading and writing data

# Read
gdf = gpd.read_file("data.shp")                    # Shapefile
gdf = gpd.read_file("data.geojson")                 # GeoJSON
gdf = gpd.read_file("data.gpkg", layer="buildings") # GeoPackage

# Write
gdf.to_file("output.geojson", driver="GeoJSON")
gdf.to_file("output.gpkg", driver="GPKG")
gdf.to_parquet("output.parquet")  # GeoParquet — fast columnar format

GeoParquet is a recommended format for large datasets. As a binary columnar format, it is often several times faster to read than GeoJSON (the exact gain depends on the data), and it lets you load only the columns you need.

Coordinate reference systems (CRS)

Every GeoDataFrame has a CRS that defines what the coordinates mean. WGS-84 (EPSG:4326) uses degrees; UTM projections use meters.

# Check current CRS
print(gdf.crs)  # EPSG:4326

# Reproject to a metric CRS; UTM zone 33N (EPSG:32633) fits data between 12°E and 18°E
gdf_utm = gdf.to_crs(epsg=32633)
gdf_utm["area_km2"] = gdf_utm.area / 1e6

Rule of thumb: Use EPSG:4326 for storage and display. Project to a local CRS (UTM, State Plane) for measurements.
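A hard-coded UTM zone is only correct for data inside that zone. GeoPandas can pick a suitable zone from the data's bounds with estimate_utm_crs(). The sketch below applies it to a hypothetical 0.1 × 0.1 degree box near 15°E, 45°N, which falls in zone 33N.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 0.1 x 0.1 degree box near 15°E, 45°N
gdf = gpd.GeoDataFrame(
    {"name": ["plot"]}, geometry=[box(15.0, 45.0, 15.1, 45.1)], crs="EPSG:4326"
)

utm = gdf.estimate_utm_crs()     # pyproj CRS chosen from the data's bounds
gdf_utm = gdf.to_crs(utm)
gdf_utm["area_km2"] = gdf_utm.area / 1e6

print(utm.to_epsg())             # 32633 (WGS 84 / UTM zone 33N)
print(gdf_utm["area_km2"].iloc[0])
```

For data spanning many zones or a whole country, an equal-area or national projection is usually a better fit than a single UTM zone.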

Spatial operations

Filtering by location

# Points within a polygon (stores_gdf and the boundary must share a CRS)
city_boundary = gpd.read_file("city.geojson").geometry.iloc[0]
stores_in_city = stores_gdf[stores_gdf.within(city_boundary)]
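within() against a bare Shapely geometry cannot compare coordinate systems, so it is worth checking the CRS yourself first. Here is a self-contained version of the filter using a hypothetical square "city" and three made-up store locations:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical city boundary and store locations, both in EPSG:4326
city = gpd.GeoDataFrame({"name": ["City"]}, geometry=[box(0, 0, 10, 10)], crs="EPSG:4326")
stores_gdf = gpd.GeoDataFrame(
    {"store": ["A", "B", "C"]},
    geometry=[Point(1, 1), Point(5, 9), Point(12, 3)],
    crs="EPSG:4326",
)

# A Shapely geometry carries no CRS, so verify the layers match before filtering
assert stores_gdf.crs == city.crs
city_boundary = city.geometry.iloc[0]   # positional access, independent of the index
stores_in_city = stores_gdf[stores_gdf.within(city_boundary)]
print(stores_in_city["store"].tolist())  # ['A', 'B']
```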

Buffering

# 500-meter buffer around each school (requires projected CRS)
schools_utm = schools_gdf.to_crs(epsg=32633)
schools_utm["buffer"] = schools_utm.geometry.buffer(500)
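The buffer is stored as a second geometry column. The original points stay the active geometry. To use the buffers in a spatial operation, make that column active with set_geometry(). This sketch finds homes within 500 m of a school. The school and home coordinates are hypothetical, already expressed in UTM zone 33N meters.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical school and homes, already in a metric CRS (UTM zone 33N)
schools_utm = gpd.GeoDataFrame(
    {"school": ["S1"]}, geometry=[Point(500_000, 5_000_000)], crs="EPSG:32633"
)
homes_utm = gpd.GeoDataFrame(
    {"home": ["near", "far"]},
    geometry=[Point(500_300, 5_000_000), Point(501_000, 5_000_000)],
    crs="EPSG:32633",
)

# Store the 500 m buffer, then make it the active geometry for the join
schools_utm["buffer"] = schools_utm.geometry.buffer(500)
zones = schools_utm.set_geometry("buffer")[["school", "buffer"]]

# Homes whose point lies inside a school zone
near = gpd.sjoin(homes_utm, zones, predicate="within")
print(near["home"].tolist())  # ['near']

# Buffers are polygon approximations of circles, so the area sits just under pi * r^2
ratio = zones.area.iloc[0] / (math.pi * 500**2)
print(round(ratio, 3))
```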

Area and length

gdf_utm["area_m2"] = gdf_utm.area
gdf_utm["perimeter_m"] = gdf_utm.length
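For polygons, length returns the length of the boundary, which is why it doubles as the perimeter. A quick sanity check on a hypothetical 100 m × 50 m rectangular parcel in UTM coordinates:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 100 m x 50 m parcel in a metric CRS (UTM zone 33N)
gdf_utm = gpd.GeoDataFrame(
    {"parcel": ["P1"]},
    geometry=[box(500_000, 5_000_000, 500_100, 5_000_050)],
    crs="EPSG:32633",
)

gdf_utm["area_m2"] = gdf_utm.area        # expected 100 * 50 = 5000
gdf_utm["perimeter_m"] = gdf_utm.length  # expected 2 * (100 + 50) = 300
print(gdf_utm[["area_m2", "perimeter_m"]])
```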

Spatial joins

Spatial joins are arguably the most powerful operation in GeoPandas. They join two GeoDataFrames based on spatial relationships instead of column values:

# Which neighborhood is each restaurant in?
restaurants_with_hoods = gpd.sjoin(
    restaurants_gdf, neighborhoods_gdf,
    how="left", predicate="within"
)

Common predicates: within, contains, intersects, crosses, touches, overlaps. Both inputs should use the same CRS; GeoPandas warns on a mismatch.
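A small runnable version of the join above, with made-up neighborhoods and restaurants, shows what how="left" does. Every restaurant is kept, and one that falls outside all neighborhoods gets a missing neighborhood_name instead of being dropped.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical neighborhoods and restaurants in the same CRS
neighborhoods_gdf = gpd.GeoDataFrame(
    {"neighborhood_name": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
restaurants_gdf = gpd.GeoDataFrame(
    {"restaurant": ["Pho", "Taco", "Pier"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],  # "Pier" is outside both
    crs="EPSG:4326",
)

# Left join keeps every restaurant; unmatched rows get NaN in the right-hand columns
restaurants_with_hoods = gpd.sjoin(
    restaurants_gdf, neighborhoods_gdf, how="left", predicate="within"
)
print(restaurants_with_hoods[["restaurant", "neighborhood_name"]])
```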

Point-in-polygon aggregation

# Count restaurants per neighborhood
counts = gpd.sjoin(restaurants_gdf, neighborhoods_gdf, predicate="within") \
    .groupby("neighborhood_name").size() \
    .reset_index(name="restaurant_count")

This pattern — spatial join followed by groupby — is the geographic equivalent of a SQL GROUP BY with a JOIN.
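One subtlety: sjoin defaults to an inner join, so neighborhoods with no restaurants never appear in the grouped counts. A minimal sketch of one way to keep them is to map the counts back onto the neighborhoods and fill the gaps with 0. It assumes neighborhood_name is unique. The data are hypothetical, and "North" deliberately has no restaurants.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: "North" contains no restaurants
neighborhoods_gdf = gpd.GeoDataFrame(
    {"neighborhood_name": ["West", "East", "North"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)],
    crs="EPSG:4326",
)
restaurants_gdf = gpd.GeoDataFrame(
    {"restaurant": ["Pho", "Ramen", "Taco"]},
    geometry=[Point(0.2, 0.2), Point(0.8, 0.8), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Inner join + groupby: only neighborhoods with at least one match are counted
per_hood = (
    gpd.sjoin(restaurants_gdf, neighborhoods_gdf, predicate="within")
    .groupby("neighborhood_name")
    .size()
)

# Map counts onto every neighborhood (assumes unique names), filling missing ones with 0
neighborhoods_gdf["restaurant_count"] = (
    neighborhoods_gdf["neighborhood_name"].map(per_hood).fillna(0).astype(int)
)
print(neighborhoods_gdf[["neighborhood_name", "restaurant_count"]])
```

Keeping the result on neighborhoods_gdf also means it remains a GeoDataFrame, ready for the choropleth plotting shown in the next section.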

Plotting

GeoPandas integrates with Matplotlib for quick visualization:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
neighborhoods_gdf.plot(
    column="population",
    cmap="YlOrRd",
    legend=True,
    ax=ax,
)
stores_gdf.plot(ax=ax, color="blue", markersize=5)
ax.set_title("Stores by Neighborhood Population")
plt.show()
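Overlaying two layers only works if they share a CRS. Otherwise the points are drawn in different units and land far from the polygons. The sketch below uses hypothetical layers, simulating stores delivered in Web Mercator (EPSG:3857). It aligns them before plotting, then renders to an in-memory PNG with the headless Agg backend instead of calling plt.show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical layers: polygons in EPSG:4326, stores converted to Web Mercator
neighborhoods_gdf = gpd.GeoDataFrame(
    {"population": [1200, 800]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
stores_gdf = gpd.GeoDataFrame(
    {"store": ["A", "B"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
).to_crs(epsg=3857)

# Bring the stores into the polygons' CRS before overlaying
stores_aligned = stores_gdf.to_crs(neighborhoods_gdf.crs)

fig, ax = plt.subplots(figsize=(6, 4))
neighborhoods_gdf.plot(column="population", cmap="YlOrRd", legend=True, ax=ax)
stores_aligned.plot(ax=ax, color="blue", markersize=5)
ax.set_title("Stores by Neighborhood Population")
ax.set_axis_off()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write to memory instead of showing a window
plt.close(fig)
```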

For interactive maps, export to GeoJSON and pass it to Folium, or use the explore() method, which builds a Leaflet map directly (through Folium; it needs the folium, matplotlib, and mapclassify packages installed):

gdf.explore(column="population", cmap="YlOrRd", tooltip=["name", "population"])

Common misconception

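Beginners often compute areas or distances on unprojected data (EPSG:4326) and get results in “square degrees”, which are meaningless numbers. GeoPandas does emit a UserWarning when the CRS is geographic, but it still returns the numbers, and if no CRS was ever set it has nothing to warn about. Always check gdf.crs before measuring, and reproject to a meters-based CRS if needed.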
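The sketch below makes the warning behavior concrete on a hypothetical 0.1 × 0.1 degree cell. With EPSG:4326 set, computing the area triggers a "geographic CRS" UserWarning. Without any CRS, the same meaningless number comes back silently.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

square = box(15.0, 45.0, 15.1, 45.1)  # hypothetical 0.1 x 0.1 degree cell
with_crs = gpd.GeoDataFrame(geometry=[square], crs="EPSG:4326")
no_crs = gpd.GeoDataFrame(geometry=[square])  # CRS never set


def area_and_crs_warnings(gdf):
    """Return the first area value and any 'geographic CRS' warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        area = gdf.area.iloc[0]
    msgs = [str(w.message) for w in caught if "geographic CRS" in str(w.message)]
    return area, msgs


area_deg, msgs = area_and_crs_warnings(with_crs)
print(area_deg, bool(msgs))   # about 0.01 "square degrees", warned
area_none, msgs_none = area_and_crs_warnings(no_crs)
print(area_none, bool(msgs_none))  # same number, no warning
```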

When to use GeoPandas vs. raw Shapely

Shapely handles individual geometries and their math. GeoPandas handles tables of geometries with attributes. If you are working with a single polygon, Shapely is enough. If you have a CSV of 50,000 locations with associated data, GeoPandas is the right tool.
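Turning a CSV of locations into a GeoDataFrame usually takes one extra step: building Point geometries from the coordinate columns. A minimal sketch with a hypothetical two-row CSV, assuming the columns are named lon and lat:

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV with plain longitude/latitude columns
csv_text = """name,lon,lat,sales
A,13.40,52.52,120
B,2.35,48.86,95
"""
df = pd.read_csv(io.StringIO(csv_text))

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="EPSG:4326"
)
print(gdf.geometry.x.iloc[0], gdf.geometry.y.iloc[0])  # 13.4 52.52
```

Swapping the argument order is a common source of points that end up in the wrong hemisphere, so it is worth double-checking against a known location.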

The one thing to remember: GeoPandas makes geographic data feel like a normal Pandas DataFrame — filter, group, join, and plot — while the geometry column handles the spatial math transparently.

