PyTorch ONNX Export — Core Concepts
What ONNX Actually Is
ONNX (Open Neural Network Exchange) is an open specification for representing machine learning models. It defines a set of standard operators (convolution, matrix multiplication, activation functions, etc.) and a graph format that connects them. Any tool that reads ONNX can understand the model, regardless of which framework created it.
The specification is maintained by a community including Microsoft, Meta, AWS, and NVIDIA. It supports over 180 operators covering most operations needed by modern neural networks.
Why Export From PyTorch
PyTorch is optimized for research and development — dynamic computation graphs, Python-level debugging, eager execution. But production environments need different things:
- Speed: ONNX Runtime applies graph optimizations (operator fusion, constant folding) that PyTorch’s eager mode doesn’t
- Portability: Not every deployment target supports PyTorch. Edge devices, mobile apps, and web browsers often need ONNX or a format derived from it
- Independence: Decoupling the model from the training framework means your inference pipeline doesn’t depend on PyTorch versions or Python at all
How Export Works
PyTorch’s ONNX exporter traces your model by running it with sample input and recording every operation. The result is a static computation graph — a fixed sequence of operations without Python control flow.
The process:
- You provide sample input tensors (same shape and type as real data)
- PyTorch executes the model with those inputs
- Every tensor operation is recorded as an ONNX node
- The complete graph is saved to a .onnx file
This means the exporter captures one specific execution path. If your model has if statements that depend on input values, only the branch taken during tracing gets exported.
Two Export Modes
Trace-Based (torch.onnx.export with tracing)
The default mode. Runs the model once and records operations. Fast, simple, works for most models.
Limitation: Cannot capture data-dependent control flow (loops with variable iteration, conditional branches based on tensor values).
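A small experiment makes this limitation concrete. The sketch below uses torch.jit.trace as a stand-in for the tracing step, on the assumption that the trace-based exporter records operations in the same way; the Gate module and its inputs are hypothetical.

```python
import warnings

import torch
import torch.nn as nn


class Gate(nn.Module):
    """Hypothetical module whose branch depends on the input values."""

    def forward(self, x):
        if x.sum() > 0:       # data-dependent Python control flow
            return x * 2
        return x - 1


positive = torch.ones(3)
negative = torch.full((3,), -3.0)

with warnings.catch_warnings():
    # The tracer warns that the tensor-to-bool conversion is baked in.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(Gate(), positive)  # records only the `x * 2` path

eager_out = Gate()(negative)   # eager mode takes the `x - 1` branch
traced_out = traced(negative)  # the trace replays `x * 2` regardless of input
```

Here eager_out holds -4 in every element while traced_out holds -6: the traced graph silently applies the branch that was taken for the sample input.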
TorchDynamo-Based (torch.onnx.dynamo_export)
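The newer approach using PyTorch 2.0’s compiler infrastructure. Analyzes the Python bytecode to capture control flow and dynamic shapes more faithfully. Recent PyTorch releases expose this path through torch.onnx.export(..., dynamo=True) and deprecate the standalone dynamo_export function.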
Advantage: Handles dynamic shapes and some control flow that tracing misses.
| Feature | Trace-Based | Dynamo-Based |
|---|---|---|
| Dynamic shapes | Limited (fixed at export time unless declared via dynamic_axes) | Supported natively |
| Control flow | Only traced path | Partial support |
| Maturity | Stable, widely tested | Newer, evolving |
| Speed of export | Fast | Slower (compiler overhead) |
What Gets Exported (and What Doesn’t)
Exported:
- Model weights (all learned parameters)
- Computation graph (every tensor operation)
- Input/output shapes and types
Not exported:
- Python code (the .py files are not needed at inference)
- Training-specific operations (dropout in eval mode, gradient tracking)
- Custom Python logic that doesn’t operate on tensors
This is why you must call model.eval() before export — it disables dropout, fixes batch normalization to use running statistics, and removes other training-only behavior.
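The effect of model.eval() on dropout can be checked directly in PyTorch before any export happens. The sketch below uses a hypothetical two-layer model; the layer sizes and dropout rate are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.randn(4, 8)

model.train()                         # dropout active: random mask per call
train_a, train_b = model(x), model(x)

model.eval()                          # dropout becomes a no-op
with torch.no_grad():
    eval_a, eval_b = model(x), model(x)

train_runs_differ = not torch.equal(train_a, train_b)
eval_runs_match = torch.equal(eval_a, eval_b)
```

A model traced in training mode would record the random masking as part of the graph, which is why the deterministic eval-mode behavior is the one worth exporting.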
ONNX Runtime: The Main Consumer
Most ONNX models are run with ONNX Runtime (ORT), which applies automatic optimizations:
- Operator fusion: Combines consecutive operations (Conv + BatchNorm + ReLU) into a single kernel
- Constant folding: Pre-computes operations that depend only on weights
- Memory planning: Reuses memory buffers across operations that don’t overlap in lifetime
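These optimizations typically deliver a 1.5-3× speedup over running the same model in PyTorch’s eager mode, though the gain depends on the model and hardware. Outputs match to within floating-point tolerance rather than bit for bit.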
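The Conv + BatchNorm fusion listed above can be reproduced by hand to see why it preserves the result. This is a sketch of the underlying arithmetic in plain PyTorch, not ONNX Runtime’s actual implementation; the layer sizes and the randomized statistics are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
bn = nn.BatchNorm2d(8).eval()

with torch.no_grad():
    # Give BatchNorm non-trivial statistics so the check is meaningful.
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)

    # In eval mode BatchNorm is a per-channel affine map, so it can be
    # folded into the convolution's weights and bias ahead of time.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    fused.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)

    x = torch.randn(2, 3, 16, 16)
    reference = torch.relu(bn(conv(x)))   # three separate operations
    folded = torch.relu(fused(x))         # one convolution plus ReLU

max_abs_diff = (reference - folded).abs().max().item()
```

The two results agree up to floating-point rounding, and the folded version does the same work with one fewer pass over the activations.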
Common Misconception
People assume ONNX export always produces a faster model. The export itself doesn’t speed anything up — the .onnx file is just a description. The speedup comes from the runtime (ONNX Runtime, TensorRT) that executes it with optimizations PyTorch’s eager mode doesn’t apply. If you run the ONNX model with a naive interpreter, it won’t be faster.
When ONNX Export Isn’t the Right Choice
- Rapid prototyping: If the model changes daily, re-exporting constantly adds friction
- Complex dynamic behavior: Models with heavy Python logic, variable-length generation loops, or custom CUDA kernels may not export cleanly
- PyTorch-specific features: Some operations (certain custom autograd functions, complex number support) have no ONNX equivalent
The one thing to remember: ONNX export freezes your PyTorch model into a portable, optimizable graph — the model works the same, but it can now run on any platform that speaks ONNX, often faster than in PyTorch itself.