Python for Clinical Trial Analysis — Deep Dive
Architecture of a clinical trial analysis pipeline
A production clinical trial analysis system manages data from collection through regulatory submission:
EDC System → Raw Data (CDISC SDTM) → Analysis Datasets (ADaM) →
Statistical Analysis → TLFs → Clinical Study Report
Python participates at every stage, from SDTM data validation to final figure generation.
CDISC data standards
Regulatory submissions require data in CDISC (Clinical Data Interchange Standards Consortium) formats:
- SDTM (Study Data Tabulation Model) — raw collected data organized into domains (DM for demographics, AE for adverse events, LB for lab results)
- ADaM (Analysis Data Model) — derived datasets optimized for statistical analysis (ADSL for subject-level, ADTTE for time-to-event)
Reading and validating SDTM with Python
import pandas as pd
from pathlib import Path
def load_sdtm_domain(study_dir: Path, domain: str) -> pd.DataFrame:
"""Load an SDTM domain from SAS transport (.xpt) file."""
filepath = study_dir / f"{domain.lower()}.xpt"
return pd.read_sas(filepath, format="xport", encoding="utf-8")
# Load key domains
dm = load_sdtm_domain(Path("study_data"), "DM") # Demographics
ae = load_sdtm_domain(Path("study_data"), "AE") # Adverse events
lb = load_sdtm_domain(Path("study_data"), "LB") # Lab results
ex = load_sdtm_domain(Path("study_data"), "EX") # Exposure (drug admin)
# Validate: every subject in AE must exist in DM
missing = set(ae["USUBJID"]) - set(dm["USUBJID"])
assert len(missing) == 0, f"Subjects in AE not in DM: {missing}"
Building ADaM datasets
The ADSL (subject-level analysis dataset) is the foundation:
def build_adsl(dm: pd.DataFrame, ex: pd.DataFrame, ds: pd.DataFrame) -> pd.DataFrame:
adsl = dm[["USUBJID", "ARM", "ARMCD", "AGE", "SEX", "RACE", "RFSTDTC"]].copy()
# Treatment duration
ex_summary = ex.groupby("USUBJID").agg(
first_dose=("EXSTDTC", "min"),
last_dose=("EXENDTC", "max"),
total_doses=("EXSEQ", "count")
).reset_index()
adsl = adsl.merge(ex_summary, on="USUBJID", how="left")
# Analysis populations
adsl["ITTFL"] = "Y" # Intent-to-treat: all randomized
adsl["SAFFL"] = adsl["total_doses"].gt(0).map({True: "Y", False: "N"})
# Disposition
completed = ds[ds["DSDECOD"] == "COMPLETED"]["USUBJID"]
adsl["COMPLFL"] = adsl["USUBJID"].isin(completed).map({True: "Y", False: "N"})
return adsl
Survival analysis — advanced techniques
Cox proportional hazards with covariates
from lifelines import CoxPHFitter
adtte = pd.read_csv("adtte.csv") # Time-to-event analysis dataset
cph = CoxPHFitter()
cph.fit(
adtte[["AVAL", "CNSR", "TRT01P_N", "AGE", "SEX_N", "ECOG_BL"]],
duration_col="AVAL",
event_col="CNSR", # Note: lifelines expects 1=event, opposite of ADaM convention
)
cph.print_summary()
# Outputs hazard ratios, confidence intervals, p-values for each covariate
Checking the proportional hazards assumption
cph.check_assumptions(adtte, p_value_threshold=0.05, show_plots=True)
If the assumption fails (the treatment effect changes over time), alternatives include:
- Time-varying coefficients —
CoxTimeVaryingFitterin lifelines - Restricted mean survival time (RMST) — compares area under survival curves up to a time horizon, assumption-free
- Piecewise Cox models — different hazard ratios for different time intervals
Competing risks
When patients can experience multiple types of events (death from disease vs death from other causes), standard Kaplan-Meier overestimates individual event probabilities. The Aalen-Johansen estimator handles competing risks:
from lifelines import AalenJohansenFitter
ajf = AalenJohansenFitter()
# event_of_interest: 1=disease death, 2=other death, 0=censored
ajf.fit(durations=adtte["AVAL"], event_observed=adtte["EVENT_TYPE"], event_of_interest=1)
ajf.plot()
Bayesian adaptive trial design
Traditional trials fix the sample size upfront. Bayesian adaptive designs allow interim analyses that can stop the trial early (for efficacy or futility) or reallocate patients to better-performing arms:
import numpy as np
from scipy.stats import beta
class BayesianAdaptiveTrial:
def __init__(self, n_arms: int, prior_alpha: float = 1, prior_beta: float = 1):
self.alphas = np.full(n_arms, prior_alpha)
self.betas = np.full(n_arms, prior_beta)
def update(self, arm: int, successes: int, failures: int):
self.alphas[arm] += successes
self.betas[arm] += failures
def posterior_prob_best(self, arm: int, n_samples: int = 10000) -> float:
"""Probability that this arm has the highest response rate."""
samples = np.array([
beta.rvs(a, b, size=n_samples)
for a, b in zip(self.alphas, self.betas)
])
return (samples[arm] == samples.max(axis=0)).mean()
def allocation_probs(self, n_samples: int = 10000) -> np.ndarray:
"""Thompson sampling: allocate proportional to posterior probability of being best."""
probs = np.array([
self.posterior_prob_best(i, n_samples)
for i in range(len(self.alphas))
])
return probs / probs.sum()
def should_stop(self, efficacy_threshold: float = 0.99,
futility_threshold: float = 0.01) -> dict:
probs = [self.posterior_prob_best(i) for i in range(len(self.alphas))]
best_arm = np.argmax(probs)
if probs[best_arm] > efficacy_threshold:
return {"stop": True, "reason": "efficacy", "best_arm": best_arm}
if max(probs) < futility_threshold:
return {"stop": True, "reason": "futility", "best_arm": None}
return {"stop": False}
# Simulate
trial = BayesianAdaptiveTrial(n_arms=3)
trial.update(0, successes=15, failures=35) # Arm 0: 30% response
trial.update(1, successes=25, failures=25) # Arm 1: 50% response
trial.update(2, successes=10, failures=40) # Arm 2: 20% response
print(trial.allocation_probs()) # Most patients allocated to arm 1
print(trial.should_stop())
The I-SPY 2 breast cancer trial used this approach to evaluate 20+ experimental therapies simultaneously, graduating effective drugs to phase 3 in a fraction of the usual time.
Generating regulatory tables
Demographics table with tableone
from tableone import TableOne
columns = ["AGE", "SEX", "RACE", "WEIGHT", "HEIGHT", "ECOG_BL"]
categorical = ["SEX", "RACE", "ECOG_BL"]
groupby = "TRT01P"
table = TableOne(
adsl, columns=columns, categorical=categorical,
groupby=groupby, pval=True, htest_name=True
)
print(table.tabulate(tablefmt="grid"))
table.to_excel("demographics_table.xlsx")
Adverse events summary
def adverse_events_table(ae: pd.DataFrame, adsl: pd.DataFrame) -> pd.DataFrame:
"""Adverse events by system organ class and preferred term."""
ae_safety = ae.merge(
adsl[adsl["SAFFL"] == "Y"][["USUBJID", "TRT01P"]],
on="USUBJID"
)
n_per_arm = adsl[adsl["SAFFL"] == "Y"].groupby("TRT01P")["USUBJID"].nunique()
# Count subjects with each AE (not event count)
ae_counts = (
ae_safety.groupby(["TRT01P", "AEBODSYS", "AEDECOD"])["USUBJID"]
.nunique()
.reset_index(name="n_subjects")
)
# Calculate percentages
ae_counts["pct"] = ae_counts.apply(
lambda row: 100 * row["n_subjects"] / n_per_arm[row["TRT01P"]], axis=1
)
ae_counts["display"] = ae_counts.apply(
lambda row: f"{row['n_subjects']} ({row['pct']:.1f}%)", axis=1
)
return ae_counts.pivot_table(
index=["AEBODSYS", "AEDECOD"], columns="TRT01P",
values="display", aggfunc="first"
).fillna("0 (0.0%)")
Reproducibility requirements
Regulatory agencies demand that analyses be reproducible. Best practices:
project/
├── environment.yml # Conda environment with pinned versions
├── Makefile # Reproducible build: make analysis
├── src/
│ ├── data/
│ │ └── build_adam.py # SDTM → ADaM transformation
│ ├── analysis/
│ │ ├── primary.py # Primary endpoint analysis
│ │ ├── survival.py # Time-to-event analyses
│ │ └── safety.py # Safety analyses
│ └── tables/
│ ├── demographics.py
│ └── adverse_events.py
├── outputs/
│ ├── tables/
│ ├── figures/
│ └── listings/
└── validation/
├── double_programming/ # Independent replication of key results
└── test_adam.py # Unit tests for derived variables
Double programming — having two analysts independently code the same analysis and comparing results — is standard practice. Python’s pytest framework automates the comparison:
def test_primary_endpoint_matches():
result_analyst_1 = run_primary_analysis("analyst_1_code.py")
result_analyst_2 = run_primary_analysis("analyst_2_code.py")
assert abs(result_analyst_1["hazard_ratio"] - result_analyst_2["hazard_ratio"]) < 1e-6
assert abs(result_analyst_1["p_value"] - result_analyst_2["p_value"]) < 1e-6
Tradeoffs
| Approach | Strength | Weakness |
|---|---|---|
| Python (lifelines, scipy) | Open source, reproducible, flexible | Less regulatory track record than SAS |
| SAS | Gold standard for FDA submissions | Expensive, less flexible |
| R (survival, lme4) | Strong statistical libraries | Package management can be fragile |
| Bayesian adaptive (PyMC) | Faster, more ethical trials | Requires specialized expertise |
The one thing to remember: Python can handle every stage of clinical trial analysis — from CDISC data management through survival modeling to regulatory table generation — but production use requires rigorous version control, double programming, and environment pinning to meet the reproducibility standards that regulatory agencies demand.
See Also
- Python Biopython Bioinformatics How Python helps scientists read the instruction manual hidden inside every living thing's DNA.
- Python Drug Interaction Modeling How Python helps scientists figure out which medicines are safe to take together and which combinations could be dangerous.
- Python Genomics Sequencing How Python helps scientists read and understand the instruction manual written inside every cell of your body.
- Python Medical Image Analysis How Python helps doctors see inside your body more clearly by teaching computers to read X-rays, MRIs, and CT scans.
- Python Pandemic Modeling How Python helps scientists predict the spread of diseases like COVID-19 and plan the best ways to slow them down.