Python for FHIR Health Data — Deep Dive

Build production FHIR integrations in Python with resource validation, bulk data export, CDS Hooks, and SMART on FHIR authorization patterns.

FHIR architecture for Python developers

FHIR R4 (the most widely deployed release, and the first to include normative content) defines a REST API over resources that are usually exchanged as JSON. Understanding its design decisions helps you build robust integrations rather than fragile screen-scrapers.

Resource identity and versioning

Every FHIR resource has a logical ID and a version ID. The server normally assigns both, and a version-specific read addresses one exact version:

GET /Patient/123/_history/4

This returns version 4 of Patient 123. The meta.versionId and meta.lastUpdated fields track changes. For optimistic concurrency, send If-Match: W/"4" with the update; a server that supports version-aware updates rejects the write with 412 Precondition Failed if the resource has changed since you read it.
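
As a minimal sketch of the optimistic-concurrency step, the helper below derives the If-Match header from a resource you have already read. The function name if_match_headers is an illustrative assumption, not part of any FHIR library.

```python
def if_match_headers(resource: dict) -> dict[str, str]:
    """Build conditional-update headers from a previously read resource."""
    version_id = resource.get("meta", {}).get("versionId")
    if version_id is None:
        raise ValueError("resource has no meta.versionId; cannot update conditionally")
    # FHIR uses weak ETags of the form W/"<versionId>"
    return {"If-Match": f'W/"{version_id}"'}


# Hypothetical resource as it might come back from GET /Patient/123
patient = {"resourceType": "Patient", "id": "123", "meta": {"versionId": "4"}}
headers = if_match_headers(patient)
print(headers)  # {'If-Match': 'W/"4"'}
```

Pass these headers along with the PUT to /Patient/123. On a 412 response, re-read the resource, reapply your change, and retry.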

References and graph traversal

Resources reference each other by URL. An Observation points to its Patient:

{
  "resourceType": "Observation",
  "subject": {"reference": "Patient/123"},
  "code": {"coding": [{"system": "http://loinc.org", "code": "85354-9"}]}
}
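
Reference strings come in several shapes: relative ("Patient/123"), absolute (with the server base in front), and version-specific (ending in /_history/N). The sketch below splits the first three into their parts; parse_reference and ParsedReference are hypothetical names, and urn:uuid: or contained (#id) references are deliberately rejected rather than guessed at.

```python
from typing import NamedTuple, Optional


class ParsedReference(NamedTuple):
    resource_type: str
    resource_id: str
    version_id: Optional[str]


def parse_reference(reference: str) -> ParsedReference:
    """Split a relative or absolute FHIR reference into type, id, and version."""
    parts = reference.rstrip("/").split("/")
    version_id = None
    # Version-specific references end in .../_history/<versionId>
    if len(parts) >= 4 and parts[-2] == "_history":
        version_id = parts[-1]
        parts = parts[:-2]
    if len(parts) < 2:
        raise ValueError(f"not a literal resource reference: {reference!r}")
    # The last two segments are always <type>/<id>, whatever base URL precedes them
    return ParsedReference(parts[-2], parts[-1], version_id)


print(parse_reference("Patient/123"))
print(parse_reference("https://fhir.example.org/r4/Patient/123/_history/4"))
```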

To resolve references efficiently, use _include and _revinclude search parameters instead of making N+1 requests:

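import httpx

base = "https://fhir.example.org/r4"  # placeholder: your server's FHIR base URL

response = httpx.get(
    f"{base}/Observation",
    params={
        "patient": "123",
        "_include": "Observation:performer",
        "_revinclude": "Provenance:target",
    },
    headers={"Accept": "application/fhir+json"},
)
response.raise_for_status()
bundle = response.json()
# bundle contains the matching Observations, their performers (for example
# Practitioners), and any Provenance records that target them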
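
A search Bundle that uses _include or _revinclude mixes primary matches with the extra resources, and each entry's search.mode tells them apart. The following is a rough sketch of separating the two; split_search_bundle and the sample data are illustrative assumptions.

```python
def split_search_bundle(bundle: dict) -> tuple[list[dict], dict[str, dict]]:
    """Return (matched resources, included resources keyed by 'Type/id')."""
    matches: list[dict] = []
    included: dict[str, dict] = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if resource is None:
            continue
        # entry.search.mode is "match", "include", or "outcome"
        mode = entry.get("search", {}).get("mode", "match")
        if mode == "include":
            included[f"{resource['resourceType']}/{resource['id']}"] = resource
        elif mode == "match":
            matches.append(resource)
    return matches, included


# Hypothetical search result with one match and one included performer
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {
            "resource": {
                "resourceType": "Observation",
                "id": "o1",
                "performer": [{"reference": "Practitioner/p1"}],
            },
            "search": {"mode": "match"},
        },
        {
            "resource": {"resourceType": "Practitioner", "id": "p1"},
            "search": {"mode": "include"},
        },
    ],
}

observations, lookup = split_search_bundle(sample_bundle)
performer = lookup.get(observations[0]["performer"][0]["reference"])
print(performer["id"])  # p1
```

Keying the included resources by "Type/id" lets you resolve relative references with a dictionary lookup instead of another request.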

Building a FHIR client with validation

Pydantic-powered resource handling

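The fhir.resources library provides Pydantic models generated from FHIR StructureDefinitions; recent releases (8.x) build on Pydantic v2, while older ones use the v1 API. This gives you runtime validation, serialization, and IDE support. Check which FHIR version your installed release targets: current releases default to R5 and ship R4B models in the fhir.resources.R4B subpackage.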

from datetime import datetime, timezone

from fhir.resources.observation import Observation
from fhir.resources.quantity import Quantity
from fhir.resources.codeableconcept import CodeableConcept
from fhir.resources.coding import Coding
from fhir.resources.reference import Reference

obs = Observation(
    status="final",
    code=CodeableConcept(coding=[
        Coding(system="http://loinc.org", code="8867-4", display="Heart rate")
    ]),
    subject=Reference(reference="Patient/123"),
    # FHIR dateTime values that include a time must carry a timezone offset
    effectiveDateTime=datetime.now(timezone.utc).isoformat(),
    valueQuantity=Quantity(value=72, unit="beats/min", system="http://unitsofmeasure.org", code="/min")
)

# Serialize to FHIR-compliant JSON (Pydantic v2 API; on releases built on
# Pydantic v1, use obs.json() instead)
fhir_json = obs.model_dump_json(exclude_none=True)

Custom validation rules

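FHIR profiles constrain base resources for specific use cases. US Core (the profile set behind US certification requirements) requires Patient resources to carry at least one identifier, a name, and a gender; the race, ethnicity, and birth sex extensions are optional but expected when the data is available. fhir.resources validates only the base resource structure, so profile rules need either hand-written checks like the ones below or an external tool such as the HL7 FHIR Validator (validator_cli.jar):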

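from fhir.resources.patient import Patient
from pydantic import ValidationError

def validate_us_core_patient(data: dict) -> list[str]:
    errors = []
    try:
        # Pydantic v2 API; on releases built on Pydantic v1, use Patient.parse_obj(data)
        patient = Patient.model_validate(data)
    except ValidationError as e:
        # Each error is a dict; report the field path and the message
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in e.errors()
        ]

    # US Core requires at least one identifier, a name, and a gender
    if not patient.identifier:
        errors.append("US Core: Patient must have at least one identifier")
    if not patient.name:
        errors.append("US Core: Patient must have at least one name")
    if not patient.gender:
        errors.append("US Core: Patient must have a gender")

    # The race extension is optional, so this is a warning rather than a hard failure
    race_url = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"
    has_race = any(
        ext.url == race_url for ext in (patient.extension or [])
    )
    if not has_race:
        errors.append("US Core: Patient should include race extension")

    return errors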

Bulk Data Export

For population health analytics, querying one patient at a time is impractical. FHIR Bulk Data Access (the “flat FHIR” spec) exports entire datasets asynchronously:

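import asyncio

import httpx
import ndjson

async def bulk_export(base_url: str, token: str) -> list[dict]:
    auth = {"Authorization": f"Bearer {token}"}

    async with httpx.AsyncClient() as client:
        # Kick off the export; the server answers 202 Accepted with a status URL
        response = await client.get(
            f"{base_url}/Patient/$export",
            params={"_type": "Patient,Observation,Condition"},
            headers={**auth, "Accept": "application/fhir+json", "Prefer": "respond-async"},
        )
        response.raise_for_status()
        poll_url = response.headers["Content-Location"]

        # Poll for completion: 202 means still running, 200 means the manifest is ready
        while True:
            status = await client.get(poll_url, headers={**auth, "Accept": "application/json"})
            if status.status_code == 200:
                break
            if status.status_code != 202:
                # Fail fast instead of polling forever on an error response
                status.raise_for_status()
                raise RuntimeError(f"Unexpected export status: {status.status_code}")
            # Assumes Retry-After is given in seconds (it may also be an HTTP date)
            await asyncio.sleep(int(status.headers.get("Retry-After", 10)))

        # Download the NDJSON files listed in the manifest
        manifest = status.json()
        all_resources = []
        for output in manifest["output"]:
            data = await client.get(
                output["url"], headers={**auth, "Accept": "application/fhir+ndjson"}
            )
            data.raise_for_status()
            all_resources.extend(ndjson.loads(data.text))

        return all_resources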

The export produces NDJSON (newline-delimited JSON) files — one resource per line — which load efficiently into pandas, DuckDB, or Spark for analysis.
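
Because each line is an independent JSON document, large exports can be processed one resource at a time instead of being collected into a single list. The sketch below uses only the standard library; iter_ndjson is a hypothetical helper, and the in-memory sample stands in for a downloaded file.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed resource per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open export file; any iterable of lines works
sample = io.StringIO(
    '{"resourceType": "Patient", "id": "1"}\n'
    '{"resourceType": "Observation", "id": "a"}\n'
    '\n'
    '{"resourceType": "Observation", "id": "b"}\n'
)

counts = Counter(resource["resourceType"] for resource in iter_ndjson(sample))
print(counts)  # Counter({'Observation': 2, 'Patient': 1})
```

The same generator accepts an open file handle, so memory use stays flat regardless of export size.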

CDS Hooks — clinical decision support

CDS Hooks let your Python service inject recommendations into a clinician’s workflow. When a doctor opens a patient chart or signs an order, the EHR calls your webhook:

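from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CDSRequest(BaseModel):
    hookInstance: str
    hook: str
    context: dict
    prefetch: dict | None = None

class Card(BaseModel):
    summary: str
    detail: str | None = None  # optional in the CDS Hooks spec
    indicator: str  # "info", "warning", "critical"
    source: dict

def check_interactions(medications: list[dict]) -> list:
    """Placeholder: replace with your interaction-checking logic."""
    return []

@app.post("/cds-services/drug-interaction-check")
async def drug_interaction(request: CDSRequest):
    # For the order-sign hook, the unsigned orders arrive as a Bundle in context.draftOrders
    draft_orders = request.context.get("draftOrders", {})
    medications = [
        entry["resource"]
        for entry in draft_orders.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "MedicationRequest"
    ]
    interactions = check_interactions(medications)

    cards = []
    for interaction in interactions:
        cards.append(Card(
            summary=f"Potential interaction: {interaction.drug_a} + {interaction.drug_b}",
            detail=interaction.description,
            indicator="warning",
            source={"label": "Drug Interaction Service"}
        ))

    # model_dump() is the Pydantic v2 replacement for .dict()
    return {"cards": [c.model_dump(exclude_none=True) for c in cards]}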

The EHR displays returned cards as inline alerts. This pattern powers real-time allergy checks, duplicate order detection, and clinical guideline reminders.
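
Before an EHR can call a service, it reads the discovery document served at GET /cds-services. The following sketch shows what that document might look like for the drug-interaction service; the title, description, and prefetch queries are illustrative assumptions, and discovery_response is a hypothetical helper name.

```python
def discovery_response() -> dict:
    """Describe the available CDS services for the EHR's discovery call."""
    return {
        "services": [
            {
                # The hook that triggers this service
                "hook": "order-sign",
                # The id becomes the last path segment: /cds-services/{id}
                "id": "drug-interaction-check",
                "title": "Drug interaction check",
                "description": "Warns about interactions among draft and active medications.",
                # Prefetch templates ask the EHR to send this data with each request
                "prefetch": {
                    "patient": "Patient/{{context.patientId}}",
                    "medications": "MedicationRequest?patient={{context.patientId}}&status=active",
                },
            }
        ]
    }


print([service["id"] for service in discovery_response()["services"]])
# ['drug-interaction-check']
```

In a FastAPI app this dictionary could be returned from a handler registered with @app.get("/cds-services").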

SMART on FHIR authorization — implementation details

Backend confidential client flow

For server-to-server access (background jobs, analytics pipelines), use the SMART Backend Services flow with asymmetric JWT authentication:

import time
import uuid

import httpx
import jwt
from cryptography.hazmat.primitives import serialization

def get_backend_token(
    token_url: str, client_id: str, private_key_path: str, key_id: str | None = None
) -> str:
    with open(private_key_path, "rb") as f:
        private_key = serialization.load_pem_private_key(f.read(), password=None)

    now = int(time.time())
    claims = {
        "iss": client_id,
        "sub": client_id,
        "aud": token_url,
        "exp": now + 300,  # SMART allows at most five minutes
        "jti": str(uuid.uuid4())  # unique per assertion to prevent replay
    }
    # SMART expects a "kid" header naming the registered public key;
    # key_id is an optional parameter added here for that purpose
    jwt_headers = {"kid": key_id} if key_id else None
    assertion = jwt.encode(claims, private_key, algorithm="RS384", headers=jwt_headers)

    response = httpx.post(token_url, data={
        "grant_type": "client_credentials",
        "scope": "system/*.read",
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion
    })
    response.raise_for_status()
    return response.json()["access_token"]

Token refresh and retry

Production FHIR clients must handle token expiration gracefully:

import httpx

class FHIRSession:
    def __init__(self, base_url: str, token_provider):
        self.base_url = base_url
        self.token_provider = token_provider
        self.client = httpx.Client()
        self._refresh_token()
    
    def _refresh_token(self):
        self.token = self.token_provider()
        self.client.headers["Authorization"] = f"Bearer {self.token}"
    
    def get(self, path: str, **kwargs) -> dict:
        response = self.client.get(f"{self.base_url}/{path}", **kwargs)
        if response.status_code == 401:
            self._refresh_token()
            response = self.client.get(f"{self.base_url}/{path}", **kwargs)
        response.raise_for_status()
        return response.json()
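
Retrying on 401 handles expiry after the fact; a complementary approach is to refresh shortly before the token expires. The sketch below wraps a token-fetching callable and caches the result using the expires_in value from the token response. CachedTokenProvider, the leeway default, and the fake fetcher are all illustrative assumptions.

```python
import time
from typing import Callable, Optional


class CachedTokenProvider:
    """Cache an access token and refetch it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        leeway: float = 30.0,
    ):
        self._fetch = fetch      # returns the token endpoint's JSON payload
        self._clock = clock      # injectable so tests can control time
        self._leeway = leeway    # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def invalidate(self) -> None:
        """Drop the cached token, for example after a 401 response."""
        self._token = None

    def __call__(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway:
            payload = self._fetch()
            self._token = payload["access_token"]
            # Assumes a 300-second lifetime when the server omits expires_in
            self._expires_at = now + float(payload.get("expires_in", 300))
        return self._token


# Demonstration with a fake fetcher and a controllable clock
calls = []

def fake_fetch() -> dict:
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 300}

now = [0.0]
provider = CachedTokenProvider(fake_fetch, clock=lambda: now[0])

first = provider()    # fetched
now[0] = 100.0
second = provider()   # still cached
now[0] = 280.0
third = provider()    # inside the 30-second leeway, so refetched
print(first, second, third)  # token-1 token-1 token-2
```

A session that retries on 401 would need to call invalidate() before asking the provider again; otherwise the cached token is simply returned a second time.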

Data pipeline: FHIR to analytics

A common pattern extracts FHIR data into a tabular format for analysis:

import pandas as pd

def observations_to_dataframe(bundle: dict) -> pd.DataFrame:
    rows = []
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        # Bundles built with _include/_revinclude also carry non-Observation resources
        if obs.get("resourceType") != "Observation":
            continue
        # "coding" may be absent or an empty list, so avoid indexing it directly
        coding = next(iter(obs.get("code", {}).get("coding") or []), {})
        quantity = obs.get("valueQuantity", {})
        row = {
            "patient_id": obs.get("subject", {}).get("reference", "").split("/")[-1],
            "code": coding.get("code"),
            "display": coding.get("display"),
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
            "date": obs.get("effectiveDateTime"),
            "status": obs.get("status"),
        }
        rows.append(row)
    return pd.DataFrame(rows)
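
The date column holds raw FHIR dateTime strings, which may be partial dates or carry different offsets. A minimal sketch of normalizing them to UTC with the standard library follows; to_utc is a hypothetical helper, and treating offset-less timestamps as UTC is an assumption you should check against your server's behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc(value: Optional[str]) -> Optional[datetime]:
    """Convert a full FHIR dateTime to an aware UTC datetime, else None."""
    if not value or "T" not in value:
        # Missing or partial (YYYY, YYYY-MM, YYYY-MM-DD): no instant to convert
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # ASSUMPTION: a timestamp without an offset is already in UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(to_utc("2024-03-01T10:30:00+02:00"))  # 2024-03-01 08:30:00+00:00
print(to_utc("2024-03"))                    # None
```

With pandas, the function can be applied column-wise, for example df["date"].map(to_utc).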

For large-scale analytics, pipe Bulk Data Export NDJSON directly into DuckDB:

import duckdb

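con = duckdb.connect()
# Each NDJSON line is a bare resource with no wrapping "resource" key, so read
# every line as one JSON value and name that column "resource"
con.execute("""
    CREATE TABLE observations AS
    SELECT
        json_extract_string(resource, '$.subject.reference') AS patient_ref,
        json_extract_string(resource, '$.code.coding[0].code') AS loinc_code,
        TRY_CAST(json_extract_string(resource, '$.valueQuantity.value') AS DOUBLE) AS value,
        -- TRY_CAST yields NULL for partial dates such as '2024-03' instead of failing
        TRY_CAST(json_extract_string(resource, '$.effectiveDateTime') AS TIMESTAMP) AS effective_date
    FROM read_ndjson_objects('Observation.ndjson') AS t(resource)
""")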

Tradeoffs and pitfalls

Challenge	Impact	Mitigation
FHIR servers vary in completeness	Search parameters, operations differ	Test against the specific server; use capability statement
Large bundles hit pagination	Default page size is often 20	Follow `bundle.link` with `relation: "next"`
Extension sprawl	US Core, mCODE, IPS all add extensions	Pin the profile version; validate strictly
FHIR != complete medical record	Narrative notes, images may be outside FHIR	Use DocumentReference to locate non-structured data
Date/time ambiguity	Some servers return dates without timezone	Normalize to UTC in your pipeline
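
The pagination mitigation in the table can be sketched as a generator that keeps following the "next" link until none is left. The fetch callable is an assumption standing in for whatever HTTP client you use (for example, a function that GETs a URL and returns the parsed JSON); next_link and iter_resources are hypothetical names.

```python
from typing import Callable, Iterator, Optional


def next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the next page, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_resources(
    fetch: Callable[[str], dict], first_url: str, max_pages: int = 1000
) -> Iterator[dict]:
    """Yield every resource across all pages of a search result."""
    url: Optional[str] = first_url
    # max_pages guards against a server that keeps returning a next link
    for _ in range(max_pages):
        if url is None:
            return
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            if "resource" in entry:
                yield entry["resource"]
        url = next_link(bundle)


# Fake two-page server keyed by URL, standing in for real HTTP calls
pages = {
    "page1": {
        "entry": [{"resource": {"id": "a"}}, {"resource": {"id": "b"}}],
        "link": [{"relation": "self", "url": "page1"}, {"relation": "next", "url": "page2"}],
    },
    "page2": {
        "entry": [{"resource": {"id": "c"}}],
        "link": [{"relation": "self", "url": "page2"}],
    },
}

ids = [resource["id"] for resource in iter_resources(pages.__getitem__, "page1")]
print(ids)  # ['a', 'b', 'c']
```

Treat the next URL as opaque and request it exactly as given, since servers often encode paging state in it.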

The one thing to remember: Production FHIR integrations in Python combine Pydantic-validated resource models, SMART on FHIR authorization, and bulk export pipelines — but success depends on understanding that every FHIR server implements the spec slightly differently, so test against your actual deployment target early and often.
