Python for FHIR Health Data — Deep Dive
FHIR architecture for Python developers
FHIR R4 (the first release with normative content, and still the most widely deployed version) defines a REST API over JSON resources. Understanding its design decisions helps you build robust integrations rather than fragile screen-scrapers.
Resource identity and versioning
Every FHIR resource has a logical ID and, on servers that support versioning, a version ID. The server normally assigns both, and you can request a specific version directly:
GET /Patient/123/_history/4
This returns version 4 of Patient 123. The meta.versionId and meta.lastUpdated fields track changes. For optimistic concurrency, send the version you read in an If-Match header (If-Match: W/"4") when you update. If someone else has modified the resource since you read it, the server rejects the write with 412 Precondition Failed.
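A minimal sketch of the read-modify-write cycle using only the Python standard library. It assumes the resource you read carries meta.versionId and that `base_url` is your server's R4 base URL. The function name `build_conditional_update` is hypothetical.

```python
import json
import urllib.request


def build_conditional_update(base_url: str, resource: dict) -> urllib.request.Request:
    """Build a PUT that only succeeds if the resource is still at the version we read."""
    version = resource.get("meta", {}).get("versionId")
    if version is None:
        raise ValueError("resource has no meta.versionId; re-read it before updating")
    url = f"{base_url}/{resource['resourceType']}/{resource['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(resource).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
            # Weak ETag built from the version we read; the server answers
            # 412 Precondition Failed if the stored version has moved on.
            "If-Match": f'W/"{version}"',
        },
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` with `code == 412` on a conflict. The usual recovery is to re-read the resource, reapply your change, and retry.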
References and graph traversal
Resources reference each other by URL. An Observation points to its Patient:
{
  "resourceType": "Observation",
  "subject": {"reference": "Patient/123"},
  "code": {"coding": [{"system": "http://loinc.org", "code": "85354-9"}]}
}
To resolve references efficiently, use _include and _revinclude search parameters instead of making N+1 requests:
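import httpx

base = "https://fhir.example.org/R4"  # hypothetical server base URL

response = httpx.get(
    f"{base}/Observation",
    params={
        "patient": "123",
        "_include": "Observation:performer",
        "_revinclude": "Provenance:target",
    },
    headers={"Accept": "application/fhir+json"},
)
response.raise_for_status()
bundle = response.json()
# bundle contains the matching Observations plus their referenced performers
# (e.g. Practitioners) and any Provenance records that target them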
Building a FHIR client with validation
Pydantic-powered resource handling
The fhir.resources library ships Pydantic models generated from FHIR StructureDefinitions (recent releases are built on Pydantic v2). This gives you runtime validation, serialization, and IDE support:
from datetime import datetime, timezone

from fhir.resources.codeableconcept import CodeableConcept
from fhir.resources.coding import Coding
from fhir.resources.observation import Observation
from fhir.resources.quantity import Quantity
from fhir.resources.reference import Reference

# Recent fhir.resources releases may default to a FHIR version other than R4;
# check which release your installed version targets.
obs = Observation(
    status="final",
    code=CodeableConcept(coding=[
        Coding(system="http://loinc.org", code="8867-4", display="Heart rate")
    ]),
    subject=Reference(reference="Patient/123"),
    # A FHIR dateTime that includes a time must carry a timezone offset
    effectiveDateTime=datetime.now(timezone.utc).isoformat(),
    valueQuantity=Quantity(
        value=72, unit="beats/min", system="http://unitsofmeasure.org", code="/min"
    ),
)
# Serialize to FHIR-compliant JSON (Pydantic v2 API; releases built on
# Pydantic v1 expose obs.json() instead)
fhir_json = obs.model_dump_json(exclude_none=True)
Custom validation rules
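FHIR profiles constrain base resources for specific use cases. US Core, which ONC-certified health IT in the US must support, requires at least one Patient identifier. It also lists the race, ethnicity, and birth sex extensions among the elements a Patient should support; the exact Must Support list varies between US Core versions. fhir.resources validates only the base resource structure. For profile rules, either write checks like the ones below or run the official HL7 FHIR Validator, a Java command-line tool that validates against published profiles: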
from fhir.resources.patient import Patient
from pydantic import ValidationError

US_CORE_RACE_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"


def validate_us_core_patient(data: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    try:
        # Base-resource validation (Pydantic v2 API)
        patient = Patient.model_validate(data)
    except ValidationError as e:
        return [f"{'.'.join(map(str, err['loc']))}: {err['msg']}" for err in e.errors()]

    # US Core requires at least one identifier
    if not patient.identifier:
        errors.append("US Core: Patient must have at least one identifier")

    # Race is a Must Support extension, so report its absence as a "should"
    has_race = any(ext.url == US_CORE_RACE_URL for ext in (patient.extension or []))
    if not has_race:
        errors.append("US Core: Patient should include race extension")

    return errors
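Checking that the race extension exists is only half the job. US Core defines it as a complex extension whose coded values sit in nested sub-extensions ("ombCategory", "detailed", "text"). This minimal sketch reads the OMB category codes from the raw JSON dict, so it runs without fhir.resources. The function name is hypothetical.

```python
US_CORE_RACE_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"


def race_omb_codes(patient: dict) -> list[str]:
    """Collect OMB race category codes from a US Core Patient in plain dict form."""
    codes = []
    for ext in patient.get("extension", []):
        if ext.get("url") != US_CORE_RACE_URL:
            continue
        # The outer extension has no value of its own; the values live in sub-extensions
        for sub in ext.get("extension", []):
            if sub.get("url") == "ombCategory":
                coding = sub.get("valueCoding", {})
                if "code" in coding:
                    codes.append(coding["code"])
    return codes
```

The same nesting pattern applies to the US Core ethnicity extension, which uses the same sub-extension names.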
Bulk Data Export
For population health analytics, querying one patient at a time is impractical. FHIR Bulk Data Access (the “flat FHIR” spec) exports entire datasets asynchronously:
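import asyncio
import json

import httpx


async def bulk_export(base_url: str, token: str) -> list[dict]:
    auth = {"Authorization": f"Bearer {token}"}
    kickoff_headers = {**auth, "Accept": "application/fhir+json", "Prefer": "respond-async"}

    async with httpx.AsyncClient() as client:
        # Kick off the export; the server answers 202 Accepted with a status URL
        response = await client.get(
            f"{base_url}/Patient/$export",
            params={"_type": "Patient,Observation,Condition"},
            headers=kickoff_headers,
        )
        response.raise_for_status()
        poll_url = response.headers["Content-Location"]

        # Poll: 202 means still running, 200 means the manifest is ready
        while True:
            status = await client.get(poll_url, headers=auth)
            if status.status_code == 200:
                break
            if status.status_code != 202:
                status.raise_for_status()  # surface errors instead of looping forever
                raise RuntimeError(f"unexpected export status {status.status_code}")
            # Retry-After may also be an HTTP date; fall back to 10 s in that case
            retry_after = status.headers.get("Retry-After", "10")
            await asyncio.sleep(int(retry_after) if retry_after.isdigit() else 10)

        # Download each NDJSON file listed in the manifest
        manifest = status.json()
        file_headers = {"Accept": "application/fhir+ndjson"}
        if manifest.get("requiresAccessToken"):
            file_headers.update(auth)
        all_resources = []
        for output in manifest["output"]:
            data = await client.get(output["url"], headers=file_headers)
            data.raise_for_status()
            all_resources.extend(
                json.loads(line) for line in data.text.splitlines() if line.strip()
            )
    return all_resources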
The export produces NDJSON (newline-delimited JSON) files — one resource per line — which load efficiently into pandas, DuckDB, or Spark for analysis.
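The bulk_export function keeps every resource in memory, which gets expensive for large exports. This minimal standard-library sketch streams an NDJSON file line by line instead. `iter_ndjson` and `count_by_type` are hypothetical helper names.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one FHIR resource per non-blank line; an open file works as `lines`."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


def count_by_type(resources: Iterable[dict]) -> dict[str, int]:
    """Tally resources by resourceType while consuming the stream once."""
    counts: dict[str, int] = defaultdict(int)
    for resource in resources:
        counts[resource.get("resourceType", "unknown")] += 1
    return dict(counts)
```

For example, `with open("Observation.ndjson", encoding="utf-8") as f: counts = count_by_type(iter_ndjson(f))` never holds more than one line in memory at a time.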
CDS Hooks — clinical decision support
CDS Hooks let your Python service inject recommendations into a clinician’s workflow. When a clinician opens a patient chart (the patient-view hook) or signs an order (the order-sign hook), the EHR POSTs the hook context to your service endpoint:
from typing import Literal

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CDSRequest(BaseModel):
    hookInstance: str
    hook: str
    context: dict
    prefetch: dict | None = None


class Card(BaseModel):
    summary: str  # the spec asks for fewer than 140 characters
    detail: str | None = None  # optional, may contain markdown
    indicator: Literal["info", "warning", "critical"]
    source: dict  # must contain at least a "label"


@app.post("/cds-services/drug-interaction-check")
async def drug_interaction(request: CDSRequest):
    # order-select and order-sign send pending orders as a Bundle in context.draftOrders
    draft_orders = request.context.get("draftOrders") or {}
    medications = [
        entry["resource"]
        for entry in draft_orders.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "MedicationRequest"
    ]
    interactions = check_interactions(medications)  # your logic (hypothetical helper)
    cards = [
        Card(
            summary=f"Potential interaction: {i.drug_a} + {i.drug_b}",
            detail=i.description,
            indicator="warning",
            source={"label": "Drug Interaction Service"},
        )
        for i in interactions
    ]
    return {"cards": [c.model_dump(exclude_none=True) for c in cards]}
The EHR displays returned cards as inline alerts. This pattern powers real-time allergy checks, duplicate order detection, and clinical guideline reminders.
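EHRs find CDS services through a discovery endpoint: they call GET {baseURL}/cds-services and expect a JSON list of services. The sketch below builds that response for the drug-interaction service, assuming it is registered for the order-sign hook. The title, description text, and prefetch keys are illustrative choices, not required names. In the FastAPI app you would return this dict from an `@app.get("/cds-services")` route.

```python
def discovery_document() -> dict:
    """Response body for GET /cds-services, which the EHR calls to find your services."""
    return {
        "services": [
            {
                "id": "drug-interaction-check",  # matches the /cds-services/{id} POST route
                "hook": "order-sign",
                "title": "Drug interaction check",
                "description": "Warns about interactions among medications being signed.",
                # Prefetch templates: the EHR fills in {{context.*}} tokens and sends the
                # results in the request's "prefetch" field, saving FHIR round trips.
                "prefetch": {
                    "patient": "Patient/{{context.patientId}}",
                    "activeMeds": "MedicationRequest?patient={{context.patientId}}&status=active",
                },
            }
        ]
    }
```

Prefetch is a hint, not a guarantee. A service should still cope with a missing prefetch entry, for example by querying the FHIR server named in the request.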
SMART on FHIR authorization — implementation details
Backend confidential client flow
For server-to-server access (background jobs, analytics pipelines), use the SMART Backend Services flow with asymmetric JWT authentication:
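import time
import uuid

import httpx
import jwt  # PyJWT; RS384 signing also needs the "cryptography" package
from cryptography.hazmat.primitives import serialization


def get_backend_token(token_url: str, client_id: str, key_id: str, private_key_path: str) -> str:
    with open(private_key_path, "rb") as f:
        private_key = serialization.load_pem_private_key(f.read(), password=None)
    now = int(time.time())
    claims = {
        "iss": client_id,
        "sub": client_id,
        "aud": token_url,
        "exp": now + 300,  # SMART caps the assertion lifetime at five minutes
        "jti": str(uuid.uuid4()),  # unique per assertion to prevent replay
    }
    # SMART Backend Services requires a "kid" header naming a key in your registered JWKS
    assertion = jwt.encode(claims, private_key, algorithm="RS384", headers={"kid": key_id})
    response = httpx.post(token_url, data={
        "grant_type": "client_credentials",
        "scope": "system/*.read",  # SMART v1 syntax; SMART v2 servers use e.g. system/*.rs
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
    })
    response.raise_for_status()
    return response.json()["access_token"]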
Token refresh and retry
Production FHIR clients must handle token expiration gracefully:
import httpx


class FHIRSession:
    """Wraps an httpx client and fetches a new token once when the server returns 401."""

    def __init__(self, base_url: str, token_provider):
        self.base_url = base_url
        self.token_provider = token_provider  # callable returning a bearer token
        self.client = httpx.Client(headers={"Accept": "application/fhir+json"})
        self._refresh_token()

    def _refresh_token(self):
        self.token = self.token_provider()
        self.client.headers["Authorization"] = f"Bearer {self.token}"

    def get(self, path: str, **kwargs) -> dict:
        response = self.client.get(f"{self.base_url}/{path}", **kwargs)
        if response.status_code == 401:
            # Token expired or was revoked: refresh once and retry
            self._refresh_token()
            response = self.client.get(f"{self.base_url}/{path}", **kwargs)
        response.raise_for_status()
        return response.json()
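FHIRSession only reacts after a 401. A complementary approach is to refresh proactively using the expires_in field (in seconds) of the OAuth token response. This minimal sketch assumes a `fetch` callable that returns the parsed token response as a dict. get_backend_token returns only the access_token string, so it would need a small change to pass expires_in along. `CachedToken` and `invalidate()` are hypothetical names, and the injectable clock exists only to make the logic testable.

```python
import time
from typing import Callable


class CachedToken:
    """Token provider that reuses an access token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], dict], skew_seconds: int = 30,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock
        self._token: str | None = None
        self._expires_at = 0.0

    def invalidate(self) -> None:
        """Forget the cached token, e.g. after the server rejects it with 401."""
        self._token = None

    def __call__(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            body = self._fetch()
            self._token = body["access_token"]
            # Fall back to five minutes if the server omits expires_in
            self._expires_at = self._clock() + float(body.get("expires_in", 300))
        return self._token
```

An instance can serve as FHIRSession's `token_provider`. Because FHIRSession calls the provider again after a 401, call `invalidate()` before that retry so a revoked token is not returned from the cache.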
Data pipeline: FHIR to analytics
A common pattern extracts FHIR data into a tabular format for analysis:
import pandas as pd


def observations_to_dataframe(bundle: dict) -> pd.DataFrame:
    rows = []
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        if obs.get("resourceType") != "Observation":
            continue  # skip included resources and OperationOutcomes
        # "or [{}]" guards against a missing or empty coding array
        coding = (obs.get("code", {}).get("coding") or [{}])[0]
        quantity = obs.get("valueQuantity", {})
        rows.append({
            "patient_id": obs.get("subject", {}).get("reference", "").split("/")[-1],
            "code": coding.get("code"),
            "display": coding.get("display"),
            "value": quantity.get("value"),
            "unit": quantity.get("unit"),
            "date": obs.get("effectiveDateTime"),
            "status": obs.get("status"),
        })
    return pd.DataFrame(rows)
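A single search Bundle holds only one page of results. A minimal sketch of following the next links, assuming a hypothetical `fetch(url)` callable that GETs an absolute URL and returns parsed JSON (for example, a thin wrapper around an authenticated httpx client). The next URL is treated as opaque and used exactly as the server returned it.

```python
from typing import Callable, Iterator


def iter_pages(first_page: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each searchset Bundle page, following link[relation="next"] until none remains."""
    page: dict | None = first_page
    while page is not None:
        yield page
        next_url = next(
            (link["url"] for link in page.get("link", []) if link.get("relation") == "next"),
            None,
        )
        page = fetch(next_url) if next_url else None
```

Combined with observations_to_dataframe: `pd.concat([observations_to_dataframe(p) for p in iter_pages(first, fetch)], ignore_index=True)`.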
For large-scale analytics, pipe Bulk Data Export NDJSON directly into DuckDB:
import duckdb

con = duckdb.connect()
# Bulk Data files hold one bare resource per line, so read each line as a single
# JSON value: read_ndjson_objects returns it in a column named "json"
con.execute("""
    CREATE TABLE observations AS
    SELECT
        json_extract_string(json, '$.subject.reference') AS patient_ref,
        json_extract_string(json, '$.code.coding[0].code') AS loinc_code,
        TRY_CAST(json_extract_string(json, '$.valueQuantity.value') AS DOUBLE) AS value,
        -- TRY_CAST yields NULL instead of an error for values DuckDB cannot parse
        TRY_CAST(json_extract_string(json, '$.effectiveDateTime') AS TIMESTAMPTZ) AS effective_date
    FROM read_ndjson_objects('Observation.ndjson')
""")
Tradeoffs and pitfalls
| Challenge | Impact | Mitigation |
|---|---|---|
| FHIR servers vary in completeness | Supported search parameters and operations differ | Test against the specific server; read its CapabilityStatement (GET [base]/metadata) |
| Large bundles hit pagination | Default page size is often 20 | Follow the Bundle.link entry whose relation is "next" until none remains |
| Extension sprawl | US Core, mCODE, IPS all add extensions | Pin the profile version; validate strictly |
| FHIR != complete medical record | Narrative notes, images may be outside FHIR | Use DocumentReference to locate non-structured data |
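| Date/time ambiguity | Partial dates ("2020-05") and date-only values carry no timezone; some servers also omit offsets on full timestamps | Keep the original precision; normalize full timestamps to UTC |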
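To check a server's completeness before relying on a search parameter, read its CapabilityStatement. This minimal sketch assumes `capability` is the parsed JSON from GET [base]/metadata, and the function name is hypothetical. It reads only the searchParam list. Supported _include and _revinclude values appear separately, in searchInclude and searchRevInclude on the same resource entry.

```python
def supported_search_params(capability: dict, resource_type: str) -> set[str]:
    """Search parameter names a server advertises for one resource type."""
    names: set[str] = set()
    for rest in capability.get("rest", []):
        if rest.get("mode") != "server":
            continue  # client-mode entries describe what the system calls, not what it serves
        for resource in rest.get("resource", []):
            if resource.get("type") == resource_type:
                names.update(p["name"] for p in resource.get("searchParam", []) if "name" in p)
    return names
```

A missing name is a strong hint, not proof, that the parameter is unsupported. Servers typically ignore unknown search parameters unless the client sends `Prefer: handling=strict`, so an unsupported filter can silently widen your results.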
The one thing to remember: Production FHIR integrations in Python combine Pydantic-validated resource models, SMART on FHIR authorization, and bulk export pipelines — but success depends on understanding that every FHIR server implements the spec slightly differently, so test against your actual deployment target early and often.
See Also
- Python Electronic Health Records: how Python helps hospitals organize and learn from millions of patient records to improve healthcare for everyone.