GeoPandas Spatial Data — Core Concepts
GeoPandas extends Pandas DataFrames with a geometry column, enabling spatial operations alongside standard tabular analysis. If you know Pandas, you already know 80% of GeoPandas — the extra 20% is the geography.
GeoDataFrame basics
import geopandas as gpd
# Read a shapefile, GeoJSON, GeoPackage, or any GDAL-supported format
gdf = gpd.read_file("neighborhoods.geojson")
print(type(gdf)) # GeoDataFrame
print(gdf.columns.tolist()) # [..., "name", "population", "geometry"]
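print(gdf.crs) # EPSG:4326 (WGS 84; x = longitude, y = latitude, in degrees)
The geometry column holds Shapely objects such as Points, LineStrings, and Polygons. Standard Pandas operations (filtering, groupby, merge) work as usual and the geometry column is carried along. The one caveat is that selecting only non-geometry columns returns a plain DataFrame.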
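To try this without a file on disk, a GeoDataFrame can also be built in memory. The following is a minimal sketch; the names, populations, and coordinates are made-up sample values, not data from this article.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data; names and coordinates are illustrative only.
gdf = gpd.GeoDataFrame(
    {
        "name": ["A", "B", "C"],
        "population": [1200, 4500, 800],
    },
    geometry=[Point(13.40, 52.52), Point(13.45, 52.50), Point(13.38, 52.53)],
    crs="EPSG:4326",
)

# Ordinary Pandas boolean filtering; the result is still a GeoDataFrame.
large = gdf[gdf["population"] > 1000]

# Selecting only non-geometry columns gives back a plain pandas DataFrame.
attrs = gdf[["name", "population"]]
```

Here `large` keeps its geometry and CRS, while `attrs` is ordinary tabular data.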
Reading and writing data
# Read
gdf = gpd.read_file("data.shp") # Shapefile
gdf = gpd.read_file("data.geojson") # GeoJSON
gdf = gpd.read_file("data.gpkg", layer="buildings") # GeoPackage
# Write
gdf.to_file("output.geojson", driver="GeoJSON")
gdf.to_file("output.gpkg", driver="GPKG")
gdf.to_parquet("output.parquet") # GeoParquet — fast columnar format
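GeoParquet is a good default for large datasets. As a binary columnar format it is often 5–10× faster to read than GeoJSON, although the gain depends on the data, and it lets you load only the columns you need. Reading and writing it requires the pyarrow package.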
Coordinate reference systems (CRS)
Every GeoDataFrame has a CRS that defines what the coordinates mean. WGS-84 (EPSG:4326) uses degrees; UTM projections use meters.
# Check current CRS
print(gdf.crs) # EPSG:4326
# Reproject to UTM zone 33N (meters) for accurate area/distance;
# use the zone or local CRS that actually covers your data
gdf_utm = gdf.to_crs(epsg=32633)
gdf_utm["area_km2"] = gdf_utm.area / 1e6
Rule of thumb: Use EPSG:4326 for storage and display. Project to a local CRS (UTM, State Plane) for measurements.
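When the right UTM zone is not known in advance, GeoPandas can suggest one from the data's extent with estimate_utm_crs(). The sketch below assumes two approximate Berlin locations as sample input and measures the distance between them in meters.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two illustrative points in Berlin, given as (longitude, latitude).
pts = gpd.GeoDataFrame(
    {"name": ["Brandenburg Gate", "Alexanderplatz"]},
    geometry=[Point(13.3777, 52.5163), Point(13.4132, 52.5219)],
    crs="EPSG:4326",
)

# Let GeoPandas pick a suitable UTM zone from the data's extent.
utm_crs = pts.estimate_utm_crs()
pts_utm = pts.to_crs(utm_crs)

# Distance in meters between the two points in the projected CRS.
dist_m = pts_utm.geometry.iloc[0].distance(pts_utm.geometry.iloc[1])
```

The storage copy (`pts`) stays in EPSG:4326; only the working copy (`pts_utm`) is projected for measurement.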
Spatial operations
Filtering by location
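# Points within a polygon (both layers must share the same CRS)
# .iloc[0] selects the first row by position, whatever the index labels are
city_boundary = gpd.read_file("city.geojson").geometry.iloc[0]
stores_in_city = stores_gdf[stores_gdf.within(city_boundary)]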
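A self-contained version of this filter, using a made-up square boundary and three hypothetical store points in place of the files, might look like this:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical city boundary: a 10 x 10 square in a projected CRS.
city = gpd.GeoDataFrame(
    {"name": ["city"]},
    geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])],
    crs="EPSG:32633",
)
stores = gpd.GeoDataFrame(
    {"store": ["in_1", "in_2", "out_1"]},
    geometry=[Point(2, 3), Point(9, 9), Point(15, 5)],
    crs="EPSG:32633",
)

# Both layers must share a CRS; reproject if they differ.
if stores.crs != city.crs:
    stores = stores.to_crs(city.crs)

# Pick the boundary by position, then keep only the points inside it.
boundary = city.geometry.iloc[0]
stores_in_city = stores[stores.within(boundary)]
```

The boolean mask returned by within() is used exactly like any other Pandas filter.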
Buffering
# 500-meter buffer around each school (requires projected CRS)
schools_utm = schools_gdf.to_crs(epsg=32633)
schools_utm["buffer"] = schools_utm.geometry.buffer(500)
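Buffers are usually a means to an end, such as asking which features fall inside them. The following sketch assumes hypothetical school and home locations already expressed in meters, and uses Shapely's unary_union to merge the buffers before testing each home.

```python
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import unary_union

# Hypothetical schools and homes in a meter-based CRS (UTM zone 33N).
schools = gpd.GeoDataFrame(
    {"school": ["North", "South"]},
    geometry=[Point(0, 1000), Point(0, -1000)],
    crs="EPSG:32633",
)
homes = gpd.GeoDataFrame(
    {"home": ["h1", "h2", "h3"]},
    geometry=[Point(100, 900), Point(0, 0), Point(-300, -1200)],
    crs="EPSG:32633",
)

# Replace the point geometry with 500 m buffers (polygons around each school).
zones = schools.set_geometry(schools.geometry.buffer(500))

# Merge all buffers into one shape and test each home against it.
catchment = unary_union(list(zones.geometry))
homes["near_school"] = homes.within(catchment)
```

Each buffer is a polygon approximating a circle, so its area is slightly below the exact value of pi times the radius squared.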
Area and length
gdf_utm["area_m2"] = gdf_utm.area
gdf_utm["perimeter_m"] = gdf_utm.length
Spatial joins
The spatial join is arguably the most powerful operation in GeoPandas. It joins two GeoDataFrames based on the spatial relationship between their geometries instead of on column values:
# Which neighborhood is each restaurant in?
restaurants_with_hoods = gpd.sjoin(
restaurants_gdf, neighborhoods_gdf,
how="left", predicate="within"
)
Common predicates include within, contains, intersects, crosses, touches, and overlaps. If none is given, sjoin uses intersects.
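To see what a spatial join returns, here is a small sketch with two made-up neighborhood squares and three hypothetical restaurants, one of which lies outside both polygons.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical neighborhoods: two adjacent 5 x 5 squares.
hoods = gpd.GeoDataFrame(
    {"neighborhood_name": ["West", "East"]},
    geometry=[
        Polygon([(0, 0), (5, 0), (5, 5), (0, 5)]),
        Polygon([(5, 0), (10, 0), (10, 5), (5, 5)]),
    ],
    crs="EPSG:32633",
)
restaurants = gpd.GeoDataFrame(
    {"restaurant": ["r1", "r2", "r3"]},
    geometry=[Point(1, 1), Point(7, 2), Point(20, 20)],
    crs="EPSG:32633",
)

# how="left" keeps every restaurant; r3 matches no polygon,
# so its joined columns are filled with NaN.
joined = gpd.sjoin(restaurants, hoods, how="left", predicate="within")
by_name = joined.set_index("restaurant")["neighborhood_name"]
```

The result keeps the left frame's geometry and adds the matching right-hand attributes plus the index of the matched row.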
Point-in-polygon aggregation
# Count restaurants per neighborhood
counts = gpd.sjoin(restaurants_gdf, neighborhoods_gdf, predicate="within") \
.groupby("neighborhood_name").size() \
.reset_index(name="restaurant_count")
This pattern — spatial join followed by groupby — is the geographic equivalent of a SQL GROUP BY with a JOIN.
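One detail worth sketching: sjoin defaults to an inner join, so neighborhoods with no restaurants drop out of the counts. The example below uses hypothetical sample data and reindexes against the full neighborhood list to report those as zero.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical neighborhoods; "North" contains no restaurants.
hoods = gpd.GeoDataFrame(
    {"neighborhood_name": ["West", "East", "North"]},
    geometry=[
        Polygon([(0, 0), (5, 0), (5, 5), (0, 5)]),
        Polygon([(5, 0), (10, 0), (10, 5), (5, 5)]),
        Polygon([(0, 5), (10, 5), (10, 10), (0, 10)]),
    ],
    crs="EPSG:32633",
)
restaurants = gpd.GeoDataFrame(
    {"restaurant": ["r1", "r2", "r3"]},
    geometry=[Point(1, 1), Point(2, 3), Point(7, 2)],
    crs="EPSG:32633",
)

# Inner join (the default), then count rows per neighborhood.
joined = gpd.sjoin(restaurants, hoods, predicate="within")
counts = joined.groupby("neighborhood_name").size()

# Reindex against all neighborhoods so empty ones appear with a count of 0.
counts = (
    counts.reindex(hoods["neighborhood_name"].tolist(), fill_value=0)
    .rename_axis("neighborhood_name")
    .reset_index(name="restaurant_count")
)
```

Merging `counts` back onto `hoods` by name would then give a polygon layer ready for a choropleth.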
Plotting
GeoPandas integrates with Matplotlib for quick visualization:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
neighborhoods_gdf.plot(
column="population",
cmap="YlOrRd",
legend=True,
ax=ax,
)
stores_gdf.plot(ax=ax, color="blue", markersize=5)
ax.set_title("Stores by Neighborhood Population")
plt.show()
For interactive maps, convert to GeoJSON and pass it to Folium, or use the explore() method, which builds a Leaflet map directly (it needs the folium package installed):
gdf.explore(column="population", cmap="YlOrRd", tooltip=["name", "population"])
Common misconception
Beginners often compute areas or distances on unprojected data (EPSG:4326) and get results in "square degrees", which are meaningless numbers. Recent GeoPandas versions emit a UserWarning in this case but still return the values, so the mistake is easy to miss. Always check gdf.crs before measuring, and reproject to a meters-based CRS if needed.
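One way to make that check hard to forget is to wrap it in a small helper. The function area_km2 below is a hypothetical name, not part of GeoPandas, and it assumes a UTM zone estimated from the data is accurate enough for the measurement.

```python
import geopandas as gpd
from shapely.geometry import Polygon


def area_km2(gdf):
    """Hypothetical helper: return areas in km2, reprojecting if needed."""
    if gdf.crs is None:
        raise ValueError("GeoDataFrame has no CRS; set one before measuring")
    if gdf.crs.is_geographic:
        # Coordinates are in degrees, so project to a meters-based UTM zone.
        gdf = gdf.to_crs(gdf.estimate_utm_crs())
    return gdf.area / 1e6


# A box of roughly 0.1 x 0.1 degrees near Berlin (illustrative coordinates).
box = gpd.GeoDataFrame(
    geometry=[Polygon([(13.3, 52.4), (13.4, 52.4), (13.4, 52.5), (13.3, 52.5)])],
    crs="EPSG:4326",
)
areas = area_km2(box)
```

Calling box.area directly would return about 0.01, the area in square degrees, while the helper returns a value in the tens of square kilometers.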
When to use GeoPandas vs. raw Shapely
Shapely handles individual geometries and their math. GeoPandas handles tables of geometries with attributes. If you are working with a single polygon, Shapely is enough. If you have a CSV of 50,000 locations with associated data, GeoPandas is the right tool.
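The difference is easy to see side by side. In this sketch the square, the site names, and the visit counts are all made-up sample values.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Shapely: one geometry, one answer at a time.
square = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
single_area = square.area
contains_pt = square.contains(Point(1, 1))

# GeoPandas: the same predicate, vectorized over a table with attributes.
sites = gpd.GeoDataFrame(
    {"site": ["a", "b", "c"], "visits": [10, 25, 7]},
    geometry=[Point(1, 1), Point(5, 5), Point(3, 2)],
)
inside = sites[sites.within(square)]
total_visits = inside["visits"].sum()
```

Shapely answers questions about the square itself; GeoPandas answers the same question for every row and lets the attribute columns take part in the result.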
The one thing to remember: GeoPandas makes geographic data feel like a normal Pandas DataFrame — filter, group, join, and plot — while the geometry column handles the spatial math transparently.