Python for Medical Image Analysis — Deep Dive

Architecture of a medical imaging pipeline

A production medical imaging system moves data through four stages: ingestion, preprocessing, inference, and reporting. Each stage has Python-specific tooling and engineering considerations that differ significantly from general computer vision work.

PACS/Scanner → DICOM Ingestion → Preprocessing → Model Inference → Structured Report
                (pydicom)        (SimpleITK)      (MONAI/PyTorch)   (highdicom)

DICOM ingestion with pydicom

Reading and organizing series

A single CT study may contain hundreds of DICOM files — one per slice. Organizing them into coherent 3D volumes requires sorting by SeriesInstanceUID and InstanceNumber:

import pydicom
from pathlib import Path
from collections import defaultdict

def load_series(dicom_dir: Path) -> dict[str, list[pydicom.Dataset]]:
    series = defaultdict(list)
    for f in dicom_dir.glob("*.dcm"):
        ds = pydicom.dcmread(f, stop_before_pixels=True)
        series[ds.SeriesInstanceUID].append(f)
    
    volumes = {}
    for uid, files in series.items():
        slices = [pydicom.dcmread(f) for f in files]
        slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
        volumes[uid] = slices
    return volumes

The stop_before_pixels=True trick reads only metadata during the sorting pass, cutting I/O time by 80%+ on large studies.

Hounsfield Unit conversion

CT pixel values stored in DICOM are raw detector values. Converting to Hounsfield Units requires the RescaleSlope and RescaleIntercept tags:

import numpy as np

def to_hounsfield(ds: pydicom.Dataset) -> np.ndarray:
    pixels = ds.pixel_array.astype(np.float32)
    return pixels * ds.RescaleSlope + ds.RescaleIntercept

def apply_window(hu_array: np.ndarray, center: float, width: float) -> np.ndarray:
    lower = center - width / 2
    upper = center + width / 2
    return np.clip((hu_array - lower) / (upper - lower), 0, 1)

Standard windows: lung (center −600, width 1500), soft tissue (center 40, width 400), bone (center 400, width 1800).

3D preprocessing with SimpleITK

SimpleITK wraps the Insight Toolkit (ITK) and handles operations that are awkward with raw NumPy — especially resampling and registration in physical coordinate space.

Resampling to isotropic voxels

import SimpleITK as sitk

def resample_isotropic(image: sitk.Image, new_spacing=(1.0, 1.0, 1.0)):
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    
    new_size = [
        int(round(osz * ospc / nspc))
        for osz, ospc, nspc in zip(original_size, original_spacing, new_spacing)
    ]
    
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(new_spacing)
    resampler.SetSize(new_size)
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetOutputOrigin(image.GetOrigin())
    resampler.SetInterpolator(sitk.sitkLinear)
    return resampler.Execute(image)

For label maps (segmentation masks), use sitk.sitkNearestNeighbor interpolation to avoid creating fractional labels.

Registration

When comparing pre- and post-treatment scans, registration aligns them spatially:

def register_images(fixed: sitk.Image, moving: sitk.Image) -> sitk.Image:
    registration = sitk.ImageRegistrationMethod()
    registration.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    registration.SetOptimizerAsGradientDescent(
        learningRate=1.0, numberOfIterations=200
    )
    registration.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.AffineTransform(3)
    ))
    registration.SetInterpolator(sitk.sitkLinear)
    
    final_transform = registration.Execute(fixed, moving)
    return sitk.Resample(moving, fixed, final_transform)

Building segmentation models with MONAI

MONAI extends PyTorch with medical-imaging-specific components. Here is a complete training pipeline for 3D organ segmentation.

Data loading with medical-aware transforms

from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, Spacingd,
    ScaleIntensityRanged, CropForegroundd, RandCropByPosNegLabeld,
    RandFlipd, RandRotate90d, EnsureTyped
)

train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    Spacingd(keys=["image", "label"], pixdim=(1.5, 1.5, 2.0),
             mode=("bilinear", "nearest")),
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250,
                         b_min=0.0, b_max=1.0, clip=True),
    CropForegroundd(keys=["image", "label"], source_key="image"),
    RandCropByPosNegLabeld(
        keys=["image", "label"], label_key="label",
        spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4
    ),
    RandFlipd(keys=["image", "label"], prob=0.2, spatial_axis=0),
    RandRotate90d(keys=["image", "label"], prob=0.2, max_k=3),
    EnsureTyped(keys=["image", "label"]),
])

The d suffix on transform names means “dictionary-based” — they operate on dicts with named keys, keeping images and labels synchronized through the pipeline.

Model architecture — Swin UNETR

While vanilla U-Net works, modern medical segmentation uses transformer-based architectures:

from monai.networks.nets import SwinUNETR

model = SwinUNETR(
    img_size=(96, 96, 96),
    in_channels=1,
    out_channels=14,  # 13 organs + background
    feature_size=48,
    use_checkpoint=True,  # gradient checkpointing for memory
)

Swin UNETR combines Swin Transformer’s shifted window attention (for global context) with U-Net’s decoder (for precise localization). It won first place in the BTCV multi-organ segmentation challenge.

Loss function — Dice + Cross-Entropy

from monai.losses import DiceCELoss

loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)

Dice loss directly optimizes the overlap metric used for evaluation, while cross-entropy provides stable gradients early in training. Their combination is the standard choice for medical segmentation.

Training loop with sliding window inference

Medical volumes are too large for single forward passes. MONAI’s sliding window inference processes overlapping patches and blends predictions:

from monai.inferers import sliding_window_inference

def validation_step(model, val_loader, device):
    model.eval()
    dice_scores = []
    with torch.no_grad():
        for batch in val_loader:
            images = batch["image"].to(device)
            labels = batch["label"].to(device)
            
            outputs = sliding_window_inference(
                images, roi_size=(96, 96, 96),
                sw_batch_size=4, predictor=model,
                overlap=0.5  # 50% overlap for smooth boundaries
            )
            dice = compute_dice(outputs.argmax(dim=1), labels)
            dice_scores.append(dice)
    return torch.stack(dice_scores).mean()

Data augmentation for medical images

Medical datasets are small (hundreds to low thousands of cases). Augmentation is critical, but must be physically plausible:

AugmentationWhy it worksCaveat
Random elastic deformationMimics anatomical variationKeep deformation magnitude small
Intensity shifting/scalingSimulates scanner differencesStay within modality’s physical range
Random croppingForces model to learn from partial viewsEnsure positive samples in crop
Affine rotationPatient positioning variesLimit to ±15° for most anatomies
Gaussian noiseSimulates acquisition noiseMatch the noise profile of the modality

TorchIO provides 3D-native augmentation with physics-informed defaults:

import torchio as tio

augment = tio.Compose([
    tio.RandomElasticDeformation(max_displacement=7),
    tio.RandomAffine(degrees=10, scales=(0.9, 1.1)),
    tio.RandomNoise(std=(0, 0.1)),
    tio.RandomGamma(log_gamma=(-0.3, 0.3)),
])

Regulatory and deployment considerations

FDA clearance pathways

Medical AI software in the US requires FDA clearance. The two main pathways:

  • 510(k) — demonstrate substantial equivalence to an existing cleared device. Most CADe/CADt systems use this path.
  • De Novo — for novel devices without a predicate. More expensive but necessary for first-in-class algorithms.

Python code itself is not regulated, but the entire software lifecycle is. You need version-controlled training data, reproducible model builds, documented validation on a held-out clinical dataset, and continuous performance monitoring post-deployment.

Model serving architecture

PACS → DICOM listener (pynetdicom) → Preprocessing worker → 
  Model inference (Triton/TorchServe) → Structured Report → PACS

pynetdicom implements the DICOM networking protocol, allowing your Python service to receive images directly from hospital scanners via C-STORE operations. Results are sent back as DICOM Structured Reports using highdicom.

Performance benchmarks

On the Medical Segmentation Decathlon (10 tasks, diverse organs and pathologies):

  • nnU-Net achieves mean Dice scores of 0.80+ across all tasks with zero manual configuration
  • Swin UNETR reaches 0.82+ on the BTCV benchmark with self-supervised pretraining
  • Inference time for a full CT volume (512 × 512 × 300): ~15 seconds on an A100 GPU with sliding window

Bias and fairness

Medical imaging models trained predominantly on one demographic may underperform on others. A dermatology classifier trained mostly on light skin tones will miss conditions that present differently on darker skin. Responsible development requires:

  • Stratified evaluation across demographics
  • Reporting performance per subgroup, not just aggregate metrics
  • Active collection of diverse training data

Tradeoffs

ApproachStrengthWeakness
nnU-Net (self-configuring)Zero tuning needed, strong baselinesSlow training (searches architecture space)
MONAI + Swin UNETRState-of-the-art accuracyRequires GPU memory, manual config
Classical (SimpleITK)No training data needed, interpretableFails on subtle or variable anatomy
Transfer from natural imagesFast to train3D medical volumes differ from 2D photos

The one thing to remember: A production medical imaging pipeline in Python chains pydicom for ingestion, SimpleITK for spatial preprocessing, MONAI for deep learning, and pynetdicom for hospital integration — but the hardest part is not the code, it is the clinical validation and regulatory compliance that make it safe to use on real patients.

pythonmedical-imaginghealthcare

See Also