Python Carbon Footprint Tracking — Deep Dive
Technical foundation
Carbon footprint tracking systems in Python must ingest heterogeneous data sources, apply region-specific emission factors, convert units, manage temporal granularity, and produce audit-ready reports. This deep dive covers the architecture patterns, calculation engines, and deployment considerations for production carbon accounting platforms.
Data model design
A robust carbon tracking system starts with a well-designed data model that separates activity data from emission factors:
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from decimal import Decimal


class Scope(Enum):
    SCOPE_1 = 1
    SCOPE_2 = 2
    SCOPE_3 = 3


class Scope3Category(Enum):
    PURCHASED_GOODS = 1
    CAPITAL_GOODS = 2
    FUEL_ENERGY = 3
    UPSTREAM_TRANSPORT = 4
    WASTE = 5
    BUSINESS_TRAVEL = 6
    EMPLOYEE_COMMUTING = 7
    UPSTREAM_LEASED = 8
    DOWNSTREAM_TRANSPORT = 9
    PROCESSING_SOLD = 10
    USE_SOLD = 11
    END_OF_LIFE = 12
    DOWNSTREAM_LEASED = 13
    FRANCHISES = 14
    INVESTMENTS = 15


@dataclass
class ActivityRecord:
    entity_id: str      # Business unit, facility, or project
    scope: Scope
    category: str       # e.g., "electricity", "diesel", "flight"
    quantity: Decimal
    unit: str           # kWh, liters, km, USD
    period_start: date
    period_end: date
    source: str         # Data provenance (invoice, API, estimate)
    confidence: str     # "measured", "calculated", "estimated"
    scope3_category: Scope3Category | None = None
    metadata: dict = field(default_factory=dict)


@dataclass
class EmissionFactor:
    category: str
    region: str         # ISO country or grid region
    factor: Decimal     # kg CO2e per unit
    unit: str           # Unit the factor applies to
    source: str         # Database name and version
    valid_from: date
    valid_to: date
    gas_breakdown: dict | None = None  # CO2, CH4, N2O separately
The confidence field is critical for audit trails. Regulators and verifiers distinguish between measured data (highest confidence), calculated data (medium), and estimated data (lowest).
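Because the data model stores confidence as a free-text string, a typo such as "mesured" would pass silently. One way to guard against that is an ordered enum. The sketch below is illustrative only: the `Confidence` class and both helper functions are hypothetical names, not part of the data model above.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical ranking of data-quality tiers; higher means more reliable."""
    ESTIMATED = 1
    CALCULATED = 2
    MEASURED = 3


def parse_confidence(label: str) -> Confidence:
    """Map a free-text label such as "measured" to a Confidence member."""
    try:
        return Confidence[label.strip().upper()]
    except KeyError:
        raise ValueError(f"Unknown confidence label: {label!r}") from None


def weakest_confidence(labels: list[str]) -> Confidence:
    """Return the lowest tier present, e.g. to flag an aggregated total."""
    if not labels:
        raise ValueError("At least one confidence label is required")
    return min(parse_confidence(label) for label in labels)


print(weakest_confidence(["measured", "calculated", "measured"]).name)  # CALCULATED
```

Reporting the weakest tier alongside an aggregate is a conservative choice; a verifier may prefer a full breakdown by tier instead.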
Emission factor management
Production systems need a factor database that handles versioning, regional specificity, and temporal validity:
from datetime import date
from decimal import Decimal
from pathlib import Path

import pandas as pd


class EmissionFactorDB:
    """Manage emission factors from multiple sources."""

    def __init__(self, data_dir: Path):
        self.factors = self._load_factors(data_dir)

    def _load_factors(self, data_dir: Path) -> pd.DataFrame:
        # Each CSV is expected to provide the columns used below: category,
        # region, factor_kg_co2e, unit, valid_from, valid_to.
        frames = []
        for source_file in data_dir.glob("*.csv"):
            df = pd.read_csv(source_file, parse_dates=["valid_from", "valid_to"])
            df["source_file"] = source_file.stem
            frames.append(df)
        if not frames:
            # pd.concat raises an opaque ValueError on an empty list
            raise FileNotFoundError(f"No factor CSV files found in {data_dir}")
        return pd.concat(frames, ignore_index=True)

    def get_factor(
        self, category: str, region: str, period: date,
        prefer_source: str | None = None,
    ) -> EmissionFactor | None:
        """Look up the best matching emission factor."""
        mask = (
            (self.factors["category"] == category)
            & (self.factors["valid_from"] <= pd.Timestamp(period))
            & (self.factors["valid_to"] >= pd.Timestamp(period))
        )
        # Try exact region match first, then fall back to country, then global.
        # region[:2] assumes region codes start with a two-letter country code.
        for region_candidate in [region, region[:2], "GLOBAL"]:
            region_mask = mask & (self.factors["region"] == region_candidate)
            matches = self.factors[region_mask]
            if not matches.empty:
                if prefer_source and prefer_source in matches["source_file"].values:
                    row = matches[matches["source_file"] == prefer_source].iloc[0]
                else:
                    row = matches.iloc[0]
                return EmissionFactor(
                    category=row["category"],
                    region=row["region"],
                    factor=Decimal(str(row["factor_kg_co2e"])),
                    unit=row["unit"],
                    source=row["source_file"],
                    valid_from=row["valid_from"].date(),
                    valid_to=row["valid_to"].date(),
                )
        return None
Key emission factor databases to integrate:
- DEFRA (UK Government): Comprehensive factors for fuel, electricity, freight, travel. Updated annually.
- EPA GHGRP (US): Facility-level reported emissions for large US industrial sites. For US factors, the EPA GHG Emission Factors Hub and eGRID are the usual sources.
- IEA: Country-level electricity emission factors.
- ecoinvent: Life-cycle factors for materials (requires license).
- Climatiq API: Cloud-hosted factor database with REST API access.
Calculation engine
The calculation engine applies factors to activity data with proper unit handling:
from decimal import Decimal, ROUND_HALF_UP

# Unit conversion registry (keys are lowercase)
UNIT_CONVERSIONS = {
    ("gallons_us", "liters"): Decimal("3.78541"),
    ("miles", "km"): Decimal("1.60934"),
    ("therms", "kwh"): Decimal("29.3001"),
    ("mmbtu", "kwh"): Decimal("293.071"),
    ("short_tons", "tonnes"): Decimal("0.907185"),
}


def convert_units(value: Decimal, from_unit: str, to_unit: str) -> Decimal:
    # Normalize case so that "kWh" on a record matches "kwh" in the registry
    from_unit, to_unit = from_unit.lower(), to_unit.lower()
    if from_unit == to_unit:
        return value
    key = (from_unit, to_unit)
    if key in UNIT_CONVERSIONS:
        return value * UNIT_CONVERSIONS[key]
    reverse = (to_unit, from_unit)
    if reverse in UNIT_CONVERSIONS:
        return value / UNIT_CONVERSIONS[reverse]
    raise ValueError(f"No conversion from {from_unit} to {to_unit}")


def calculate_emissions(
    activity: ActivityRecord,
    factor_db: EmissionFactorDB,
) -> dict:
    """Calculate CO2e emissions for a single activity record."""
    factor = factor_db.get_factor(
        activity.category,
        activity.metadata.get("region", "GLOBAL"),
        activity.period_start,
    )
    if factor is None:
        return {
            "activity_id": id(activity),
            "status": "error",
            "message": f"No factor for {activity.category} in {activity.metadata.get('region')}",
        }

    # Convert activity units to match factor units if needed
    quantity = activity.quantity
    if activity.unit != factor.unit:
        try:
            quantity = convert_units(quantity, activity.unit, factor.unit)
        except ValueError as e:
            return {"status": "error", "message": str(e)}

    # Round to the nearest 0.01 kg, rounding halves up
    emissions_kg = (quantity * factor.factor).quantize(Decimal("0.01"), ROUND_HALF_UP)
    return {
        "scope": activity.scope.name,
        "category": activity.category,
        "quantity": float(quantity),
        "unit": factor.unit,
        "factor": float(factor.factor),
        "factor_source": factor.source,
        "emissions_kg_co2e": float(emissions_kg),
        "emissions_tonnes_co2e": float(emissions_kg / 1000),
        "confidence": activity.confidence,
        "period": f"{activity.period_start} to {activity.period_end}",
    }
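To make the arithmetic concrete, here is a small worked example of the convert-then-multiply step, using the gallons-to-liters factor from the registry above. The diesel factor of 2.5 kg CO2e per liter is a made-up round number for illustration, not a published value, and `diesel_emissions_kg` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

LITERS_PER_US_GALLON = Decimal("3.78541")
# Hypothetical diesel factor in kg CO2e per liter, for illustration only.
DIESEL_FACTOR_KG_PER_LITER = Decimal("2.5")


def diesel_emissions_kg(gallons_us: Decimal) -> Decimal:
    """Convert US gallons to liters, apply the factor, round to 0.01 kg."""
    liters = gallons_us * LITERS_PER_US_GALLON
    raw_kg = liters * DIESEL_FACTOR_KG_PER_LITER
    return raw_kg.quantize(Decimal("0.01"), ROUND_HALF_UP)


# 100 gal * 3.78541 L/gal = 378.541 L; 378.541 L * 2.5 kg/L = 946.3525 kg
print(diesel_emissions_kg(Decimal("100")))  # 946.35
```

Keeping every intermediate value as a `Decimal` and rounding once at the end avoids accumulating binary floating-point error across thousands of records.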
Scope 2 dual reporting
The GHG Protocol Scope 2 Guidance asks companies to report both a location-based and a market-based figure. The function below computes both:
def calculate_scope2(
    electricity_kwh: Decimal,
    region: str,
    period: date,
    factor_db: EmissionFactorDB,
    renewable_percentage: Decimal = Decimal("0"),
    rec_factor: Decimal = Decimal("0"),  # Market instrument factor
) -> dict:
    """Calculate Scope 2 using both methods."""
    # Location-based: average grid factor
    grid_factor = factor_db.get_factor("electricity_grid", region, period)
    if grid_factor is None:
        # get_factor returns None when nothing matches; fail with a clear message
        raise ValueError(f"No grid factor for {region} on {period}")
    location_based = electricity_kwh * grid_factor.factor / 1000  # tonnes

    # Market-based: factor from contractual instruments
    non_renewable_kwh = electricity_kwh * (1 - renewable_percentage / 100)
    residual_factor = factor_db.get_factor("electricity_residual_mix", region, period)
    if residual_factor is not None:
        market_based = (
            non_renewable_kwh * residual_factor.factor
            + electricity_kwh * renewable_percentage / 100 * rec_factor
        ) / 1000
    else:
        market_based = location_based  # Fallback if no residual mix available

    return {
        "location_based_tonnes": float(location_based),
        "market_based_tonnes": float(market_based),
        "grid_factor_source": grid_factor.source,
        "renewable_percentage": float(renewable_percentage),
    }
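A worked example helps show how the two Scope 2 figures diverge. The sketch below repeats the same arithmetic without the factor database. The `scope2_tonnes` helper is hypothetical, and the grid factor (0.4 kg CO2e/kWh) and residual-mix factor (0.5 kg CO2e/kWh) are invented round numbers.

```python
from decimal import Decimal


def scope2_tonnes(
    kwh: Decimal,
    grid_factor: Decimal,        # kg CO2e per kWh, average grid mix
    residual_factor: Decimal,    # kg CO2e per kWh, residual mix
    renewable_pct: Decimal,      # share covered by instruments, 0 to 100
    instrument_factor: Decimal = Decimal("0"),
) -> tuple[Decimal, Decimal]:
    """Return (location_based, market_based) emissions in tonnes CO2e."""
    location = kwh * grid_factor / 1000
    covered_kwh = kwh * renewable_pct / 100
    uncovered_kwh = kwh - covered_kwh
    market = (
        uncovered_kwh * residual_factor + covered_kwh * instrument_factor
    ) / 1000
    return location, market


# Hypothetical inputs: 100,000 kWh, 60% covered by zero-emission certificates.
location, market = scope2_tonnes(
    Decimal("100000"), Decimal("0.4"), Decimal("0.5"), Decimal("60")
)
# Location-based: 100,000 kWh * 0.4 kg/kWh = 40 tonnes
# Market-based:    40,000 kWh * 0.5 kg/kWh = 20 tonnes
print(location, market)
```

Note that the residual-mix factor is higher than the grid average in this example. That is typical, because claimed renewable generation is removed from the residual mix.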
Business travel calculations
Flight emissions require distance calculation and class-specific factors:
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two coordinates."""
    R = 6371  # Earth radius in km
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    return 2 * R * asin(sqrt(a))


# DEFRA 2024 flight factors (kg CO2e per passenger-km including radiative forcing)
FLIGHT_FACTORS = {
    "domestic": {"economy": 0.246, "business": 0.369},
    "short_haul": {"economy": 0.151, "business": 0.226},
    "long_haul": {"economy": 0.148, "premium_economy": 0.237,
                  "business": 0.429, "first": 0.591},
}


def flight_emissions(
    distance_km: float, cabin_class: str, haul_type: str,
) -> float:
    """Calculate flight emissions in kg CO2e."""
    factor = FLIGHT_FACTORS[haul_type][cabin_class]
    # Add 8% uplift for non-direct routing (DEFRA recommendation)
    effective_distance = distance_km * 1.08
    return effective_distance * factor
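As a usage sketch, the two steps can be chained for a single trip. The airport coordinates below are approximate values for London Heathrow and New York JFK, supplied here as assumptions. The long-haul economy factor and the 8% uplift are copied from the code above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371
ROUTING_UPLIFT = 1.08              # 8% uplift for non-direct routing
LONG_HAUL_ECONOMY_FACTOR = 0.148   # kg CO2e per passenger-km, from the table above

# Approximate airport coordinates (latitude, longitude); assumed for illustration.
LHR = (51.4700, -0.4543)
JFK = (40.6413, -73.7781)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


distance_km = haversine_km(*LHR, *JFK)
emissions_kg = distance_km * ROUTING_UPLIFT * LONG_HAUL_ECONOMY_FACTOR

print(f"Distance: {distance_km:.0f} km")
print(f"One-way economy emissions: {emissions_kg:.0f} kg CO2e")
```

The result is roughly 5,500 km and a little under 900 kg CO2e per passenger for the one-way trip. A round trip doubles both figures.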
Automated data collection pipeline
Production systems pull data from multiple sources on schedule:
from datetime import date
from decimal import Decimal

import httpx


class CarbonDataPipeline:
    """Orchestrate carbon data collection from multiple sources."""

    def __init__(self, config: dict):
        # FleetCollector, TravelCollector, and ProcurementCollector are not
        # shown here; they are assumed to expose the same async
        # fetch(start, end) interface as ElectricityCollector below.
        self.collectors = {
            "electricity": ElectricityCollector(config["utility_api"]),
            "fleet": FleetCollector(config["telematics_api"]),
            "travel": TravelCollector(config["expense_api"]),
            "procurement": ProcurementCollector(config["erp_api"]),
        }

    async def collect_period(self, start: date, end: date) -> list[ActivityRecord]:
        """Collect all activity data for a reporting period."""
        all_records = []
        for source_name, collector in self.collectors.items():
            try:
                records = await collector.fetch(start, end)
                all_records.extend(records)
                print(f"Collected {len(records)} records from {source_name}")
            except Exception as e:
                # Log the error but continue: partial data is better than none
                print(f"Error collecting from {source_name}: {e}")
        return all_records


class ElectricityCollector:
    """Collect electricity consumption from utility API."""

    def __init__(self, api_config: dict):
        self.base_url = api_config["url"]
        self.api_key = api_config["key"]

    async def fetch(self, start: date, end: date) -> list[ActivityRecord]:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{self.base_url}/consumption",
                params={"start": start.isoformat(), "end": end.isoformat()},
                headers={"Authorization": f"Bearer {self.api_key}"},
            )
            resp.raise_for_status()
            data = resp.json()
        return [
            ActivityRecord(
                entity_id=meter["facility_id"],
                scope=Scope.SCOPE_2,
                category="electricity_grid",
                quantity=Decimal(str(meter["kwh"])),
                unit="kWh",
                period_start=start,
                period_end=end,
                source="utility_api",
                confidence="measured",
                metadata={"region": meter["grid_region"]},
            )
            for meter in data["meters"]
        ]
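The pipeline above awaits each collector in turn, so total latency is the sum of all API calls. If the sources are independent, one option is to run them concurrently with `asyncio.gather` and `return_exceptions=True`, which preserves the "partial data is better than none" behavior. The sketch below uses a hypothetical `StubCollector` in place of real API clients and plain strings in place of `ActivityRecord` objects.

```python
import asyncio


class StubCollector:
    """Hypothetical stand-in for a real collector; returns canned records."""

    def __init__(self, records: list[str], fail: bool = False):
        self.records = records
        self.fail = fail

    async def fetch(self) -> list[str]:
        await asyncio.sleep(0)  # Yield control, as a real network call would
        if self.fail:
            raise ConnectionError("upstream API unavailable")
        return self.records


async def collect_all(
    collectors: dict[str, StubCollector],
) -> tuple[list[str], dict[str, str]]:
    """Fetch from every collector concurrently; keep successes, record failures."""
    names = list(collectors)
    results = await asyncio.gather(
        *(collectors[name].fetch() for name in names),
        return_exceptions=True,  # Exceptions are returned as results, not raised
    )
    records: list[str] = []
    errors: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            errors[name] = repr(result)
        else:
            records.extend(result)
    return records, errors


demo_collectors = {
    "electricity": StubCollector(["meter-1"]),
    "fleet": StubCollector([], fail=True),
    "travel": StubCollector(["trip-7"]),
}
records, errors = asyncio.run(collect_all(demo_collectors))
print(records)        # ['meter-1', 'trip-7']
print(list(errors))   # ['fleet']
```

Returning the error map instead of only printing it lets the caller mark the reporting period as incomplete, which an auditor would want to see.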
Reporting and visualization
Generate reports in standard disclosure formats:
def generate_cdp_summary(emissions: list[dict], year: int) -> dict:
    """Aggregate emissions into CDP disclosure format."""
    import pandas as pd

    df = pd.DataFrame(emissions)
    summary = {
        "reporting_year": year,
        "scope_1_total": df[df["scope"] == "SCOPE_1"]["emissions_tonnes_co2e"].sum(),
        "scope_2_location": df[
            (df["scope"] == "SCOPE_2") & (df["category"].str.contains("grid"))
        ]["emissions_tonnes_co2e"].sum(),
        "scope_3_by_category": df[df["scope"] == "SCOPE_3"].groupby("category")[
            "emissions_tonnes_co2e"
        ].sum().to_dict(),
        "data_quality": {
            "measured_pct": len(df[df["confidence"] == "measured"]) / len(df) * 100,
            "calculated_pct": len(df[df["confidence"] == "calculated"]) / len(df) * 100,
            "estimated_pct": len(df[df["confidence"] == "estimated"]) / len(df) * 100,
        },
    }
    return summary
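The data-quality percentages above count records, so one large estimated line item weighs the same as one small measured one, and an empty input divides by zero. The sketch below is one possible variation that handles empty input and also offers an emissions-weighted view. Both function names are hypothetical, and weighting by emissions is a suggestion rather than a disclosure requirement.

```python
from collections import Counter

TIERS = ("measured", "calculated", "estimated")


def quality_by_count(records: list[dict]) -> dict[str, float]:
    """Share of records in each confidence tier, as percentages."""
    if not records:
        return {f"{tier}_pct": 0.0 for tier in TIERS}
    counts = Counter(record["confidence"] for record in records)
    return {f"{tier}_pct": counts[tier] * 100 / len(records) for tier in TIERS}


def quality_by_emissions(records: list[dict]) -> dict[str, float]:
    """Share of total emissions in each confidence tier, as percentages."""
    totals = {tier: 0.0 for tier in TIERS}
    for record in records:
        if record["confidence"] in totals:
            totals[record["confidence"]] += record["emissions_tonnes_co2e"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {f"{tier}_pct": 0.0 for tier in TIERS}
    return {f"{tier}_pct": totals[tier] * 100 / grand_total for tier in TIERS}


# Hypothetical results: one large measured record, one small estimated record.
demo = [
    {"confidence": "measured", "emissions_tonnes_co2e": 90.0},
    {"confidence": "estimated", "emissions_tonnes_co2e": 10.0},
]
print(quality_by_count(demo))      # measured 50%, estimated 50%
print(quality_by_emissions(demo))  # measured 90%, estimated 10%
```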
Tradeoffs
| Decision | Option A | Option B |
|---|---|---|
| Factor source | Government databases (free, lower resolution) | Ecoinvent/Climatiq (paid, higher specificity) |
| Scope 3 method | Spend-based (easy, low accuracy) | Activity-based with supplier data (hard, high accuracy) |
| Temporal granularity | Annual (simple reporting) | Monthly/hourly (enables operational decisions) |
| Architecture | Spreadsheet-based (quick start) | Python pipeline (scalable, auditable, automated) |
| Verification | Self-reported | Third-party verified (required for SBTi, some regulations) |
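Moving from annual to monthly granularity usually means splitting billing periods that straddle month boundaries. The sketch below prorates a quantity by the number of days falling in each calendar month. It assumes consumption is uniform across the period and treats both endpoints as inclusive. The `prorate_by_month` name and the sample figures are hypothetical.

```python
from datetime import date, timedelta
from decimal import Decimal


def prorate_by_month(quantity: Decimal, start: date, end: date) -> dict[str, Decimal]:
    """Split a quantity across calendar months in proportion to days covered.

    Assumes uniform daily consumption; start and end are both inclusive.
    """
    if end < start:
        raise ValueError("end must not precede start")
    total_days = (end - start).days + 1

    # Count how many days of the period fall in each "YYYY-MM" month.
    days_per_month: dict[str, int] = {}
    day = start
    while day <= end:
        key = f"{day.year}-{day.month:02d}"
        days_per_month[key] = days_per_month.get(key, 0) + 1
        day += timedelta(days=1)

    return {
        month: quantity * days / total_days
        for month, days in days_per_month.items()
    }


# Hypothetical bill: 3,000 kWh covering 16 Jan to 14 Feb 2024 (30 days).
shares = prorate_by_month(Decimal("3000"), date(2024, 1, 16), date(2024, 2, 14))
# January gets 16/30 of the total (1600 kWh), February 14/30 (1400 kWh).
print(shares)
```

Where interval meter data is available, it gives a better split than day-count proration. Prorated values are arguably "calculated" rather than "measured" in the confidence field.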
Real-world scale
Microsoft publishes detailed carbon footprint methodology and uses Python-based pipelines to track emissions across 100,000+ employees, global data centers, and a supply chain spanning thousands of vendors. Their Scope 3 Category 1 (purchased goods) alone requires integrating emission data from major suppliers via the CDP Supply Chain program — a data integration challenge that Python orchestration handles well.
One thing to remember: Carbon accounting is fundamentally a data quality problem — the calculation is multiplication, but getting accurate activity data across an organization’s full value chain and matching it with the right emission factors requires robust data engineering that Python excels at.