Python PyOxidizer Distribution — Deep Dive

Python distribution mechanics

PyOxidizer does not use your system Python. It downloads (or builds) a special standalone Python distribution optimized for embedding. These distributions come from the python-build-standalone project (also by Gregory Szorc) and differ from regular Python in key ways:

  • Statically linked — libpython and many dependencies (OpenSSL, zlib, etc.) are compiled as static libraries.
  • Relocatable — no hardcoded paths, works from any filesystem location.
  • Stripped of unnecessary components — no tkinter, no idle, reduced standard library.

# pyoxidizer.bzl — selecting a distribution
def make_exe():
    # Default: the CPython distribution this PyOxidizer release pins for the build target
    dist = default_python_distribution()

    # Or pin an exact distribution instead (this overrides the default above)
    dist = PythonDistribution(
        url="https://github.com/indygreg/python-build-standalone/releases/...",
        sha256="abc123...",
    )

    return dist.to_python_executable(name="myapp")

The distribution is cached locally after first download. Subsequent builds reuse it.

Oxidized importer deep dive

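The oxidized importer (the oxidized_importer module, whose finder class is OxidizedFinder) is a Rust-based Python meta path finder. It is installed on sys.meta_path ahead of the standard PathFinder, which can either be kept as a fallback or disabled entirely. It stores module data in a custom packed format, shown here in simplified form: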

Data storage format

[Module Index]
  module_name → (bytecode_offset, bytecode_length, source_offset, source_length)
  
[Bytecode Section]
  Concatenated .pyc data for all modules

[Source Section]  (optional)
  Concatenated .py source for all modules

All sections are stored as raw bytes in the Rust binary’s .data or .rodata segment. At startup, the importer builds an in-memory hash map from the index.
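The layout above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PyOxidizer's actual packed-resources format: the header (two little-endian 32-bit lengths), the JSON-encoded index, and the helper names pack_modules and load_index are all hypothetical. Offsets are stored relative to the start of their section, and the loader hands out memoryview slices, so building the lookup table copies no module data.

```python
import json
import marshal
import struct


def pack_modules(sources: dict[str, str]) -> bytes:
    """Pack modules into one blob: [header][index][bytecode section][source section]."""
    bytecode_section = bytearray()
    source_section = bytearray()
    index = {}
    for name, text in sources.items():
        code = marshal.dumps(compile(text, f"<packed {name}>", "exec"))
        encoded = text.encode("utf-8")
        # module_name -> (bytecode_offset, bytecode_length, source_offset, source_length)
        index[name] = (len(bytecode_section), len(code), len(source_section), len(encoded))
        bytecode_section += code
        source_section += encoded
    index_bytes = json.dumps(index).encode("utf-8")
    # Header records the index length and bytecode section length so the
    # loader can find where each section starts.
    header = struct.pack("<II", len(index_bytes), len(bytecode_section))
    return header + index_bytes + bytes(bytecode_section) + bytes(source_section)


def load_index(blob: bytes) -> dict[str, tuple[memoryview, memoryview]]:
    """Parse the index once at startup into a name -> (bytecode, source) map."""
    index_len, bytecode_len = struct.unpack_from("<II", blob, 0)
    index_start = 8
    bytecode_base = index_start + index_len
    source_base = bytecode_base + bytecode_len
    raw_index = json.loads(blob[index_start:bytecode_base])
    view = memoryview(blob)  # zero-copy slices into the blob
    table = {}
    for name, (b_off, b_len, s_off, s_len) in raw_index.items():
        table[name] = (
            view[bytecode_base + b_off : bytecode_base + b_off + b_len],
            view[source_base + s_off : source_base + s_off + s_len],
        )
    return table


if __name__ == "__main__":
    table = load_index(pack_modules({"greet": "MESSAGE = 'hello'\n"}))
    namespace = {}
    exec(marshal.loads(table["greet"][0]), namespace)
    print(namespace["MESSAGE"])
```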

Import resolution

# When Python encounters: import myapp.utils
# The oxidized importer:
# 1. Looks up "myapp.utils" in the hash map (O(1))
# 2. Gets (bytecode_offset, bytecode_length)
# 3. Creates a memoryview into the binary's data section
# 4. Calls marshal.loads() on the bytecode
# 5. Returns the code object — no disk I/O at any point
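The same resolution steps can be reproduced with the standard importlib machinery. The following is a pure-Python sketch, not the Rust OxidizedFinder: InMemoryFinder and compile_to_bytecode are hypothetical names, and modules are served from a plain dict of marshalled bytecode rather than from a binary's data section.

```python
import importlib.abc
import importlib.util
import marshal
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a name -> marshalled-bytecode mapping (no disk I/O)."""

    def __init__(self, modules: dict[str, bytes], packages: frozenset[str] = frozenset()):
        self._modules = modules    # step 1: O(1) dict lookup by full module name
        self._packages = packages  # names that should behave as packages

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._modules:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(
            fullname, self, is_package=fullname in self._packages
        )

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module):
        data = memoryview(self._modules[module.__spec__.name])  # step 3: a view, not a copy
        code = marshal.loads(data)                               # step 4: unmarshal
        exec(code, module.__dict__)                              # run the module body


def compile_to_bytecode(source: str, name: str) -> bytes:
    """Produce the marshalled code object the finder expects."""
    return marshal.dumps(compile(source, f"<memory:{name}>", "exec"))


if __name__ == "__main__":
    finder = InMemoryFinder(
        {
            "myapp": compile_to_bytecode("", "myapp"),
            "myapp.utils": compile_to_bytecode("def double(x):\n    return 2 * x\n", "myapp.utils"),
        },
        packages=frozenset({"myapp"}),
    )
    sys.meta_path.insert(0, finder)
    import myapp.utils
    print(myapp.utils.double(21))
```

Because the spec has no origin, modules imported this way have no __file__ attribute, which is the same behavior that trips up some packages under embedding.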

For extension modules (.so/.pyd), PyOxidizer has three options:

  1. Compiled into the binary: with statically linked distributions, extension modules that ship with Python itself are built in, so there is nothing to load.
  2. In-memory loading: the shared library is loaded directly from memory without being written to disk. This is supported only on Windows.
  3. Filesystem relative: the library is stored alongside the binary and loaded normally.

# Configure extension module handling
policy = dist.make_python_packaging_policy()
policy.extension_module_filter = "minimal"  # Only essential extensions
policy.resources_location = "in-memory"     # Prefer in-memory loading
policy.resources_location_fallback = "filesystem-relative:lib"  # Fallback: lib/ next to the binary
policy.allow_in_memory_shared_library_loading = True  # Supported on Windows only
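How the primary location and the fallback interact can be modeled as a small decision function. This is a hypothetical Python model for illustration, not PyOxidizer's code; it assumes that pure-Python resources can always live in memory and that extension modules can be loaded from memory only on Windows (sys.platform == "win32").

```python
from typing import Optional


def resolve_location(
    is_extension: bool,
    platform: str,
    location: str = "in-memory",
    fallback: Optional[str] = "filesystem-relative:lib",
) -> str:
    """Pick where a resource ends up: the primary location if usable, else the fallback."""

    def supported(loc: str) -> bool:
        if loc == "in-memory" and is_extension:
            # Assumption: shared libraries load from memory only on Windows.
            return platform == "win32"
        return True

    if supported(location):
        return location
    if fallback is not None and supported(fallback):
        return fallback
    raise ValueError(f"no valid location for this resource on {platform}")
```

The model makes one consequence visible: on Linux and macOS, every third-party extension module lands in the fallback location, so leaving the fallback unset turns those packages into build errors.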

Starlark configuration in depth

The pyoxidizer.bzl configuration file uses Starlark, a Python-like language. Key patterns:

Dependency management

def make_exe():
    dist = default_python_distribution()
    policy = dist.make_python_packaging_policy()
    
    python_config = dist.make_python_interpreter_config()
    python_config.run_module = "myapp"
    python_config.allocator_backend = "rust"  # Use Rust allocator
    python_config.oxidized_importer = True    # Enable oxidized importer
    
    exe = dist.to_python_executable(
        name="myapp",
        packaging_policy=policy,
        config=python_config,
    )
    
    # Install from PyPI
    exe.add_python_resources(exe.pip_install(["flask==3.0.0", "gunicorn"]))
    
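    # Install the local project; CWD is the directory holding pyoxidizer.bzl.
    # (An editable "-e" install only leaves a pointer to the source tree,
    # so the package itself would not be embedded.)
    exe.add_python_resources(exe.pip_install([CWD]))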
    
    # Install from requirements file
    exe.add_python_resources(exe.pip_install(["-r", "requirements.txt"]))
    
    # Add packages directly from a local source tree (src/myapp)
    exe.add_python_resources(exe.read_package_root(
        path="src",
        packages=["myapp"],
    ))
    
    return exe

Resource filtering

def resource_filter(policy, resource):
    # Drop test packages entirely
    if resource.name == "tests" or resource.name.startswith("tests."):
        resource.add_include = False
    # Keep matplotlib and all of its submodules on the filesystem
    if resource.name == "matplotlib" or resource.name.startswith("matplotlib."):
        resource.add_location = "filesystem-relative:lib"

def make_exe():
    dist = default_python_distribution()
    policy = dist.make_python_packaging_policy()
    
    # Exclude test modules to reduce size
    policy.include_test = False
    
    # Set where resources go
    policy.resources_location = "in-memory"
    policy.resources_location_fallback = "filesystem-relative:lib"
    
    # Run the custom filter on every resource as it is added
    policy.register_resource_callback(resource_filter)
    
    # ... create and return the executable with this policy, as above
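The callback's matching logic can be checked outside a build with a small Python model. Resource here is a hypothetical stand-in for PyOxidizer's resource objects, keeping only the three attributes the callback touches. It illustrates why the match tests both the exact name and the "package." prefix: module names are dotted, so an equality test alone would miss matplotlib.pyplot, and a bare startswith("matplotlib") would also catch an unrelated package such as matplotlibx.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a PyOxidizer resource object."""
    name: str
    add_include: bool = True
    add_location: str = "in-memory"


def is_in_package(name: str, package: str) -> bool:
    """True for the package itself and for any dotted submodule of it."""
    return name == package or name.startswith(package + ".")


def resource_filter(resource: Resource) -> None:
    # Same decisions as the Starlark callback
    if is_in_package(resource.name, "tests"):
        resource.add_include = False
    if is_in_package(resource.name, "matplotlib"):
        resource.add_location = "filesystem-relative:lib"
```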

Multi-target builds

def make_exe_linux():
    dist = default_python_distribution()
    exe = dist.to_python_executable(name="myapp-linux", ...)
    return exe

def make_exe_macos():
    dist = PythonDistribution(url="...", sha256="...")
    exe = dist.to_python_executable(name="myapp-macos", ...)
    return exe

register_target("linux", make_exe_linux)
register_target("macos", make_exe_macos)
resolve_targets()

Rust integration patterns

PyOxidizer’s Rust underpinnings enable deeper integration:

Rust + Python hybrid application

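// src/main.rs (in a Rust project generated by PyOxidizer)
use pyembed::{MainPythonInterpreter, OxidizedPythonInterpreterConfig};

// Defines default_python_config(), generated from pyoxidizer.bzl at build time
include!(env!("DEFAULT_PYTHON_CONFIG_RS"));

fn main() {
    // Do Rust-native work (load_app_config is a hypothetical application function)
    let app_config = load_app_config();
    
    // Initialize embedded Python
    let python_config: OxidizedPythonInterpreterConfig = default_python_config();
    let interp = MainPythonInterpreter::new(python_config)
        .expect("failed to initialize embedded Python");
    
    interp.with_gil(|py| {
        // Call Python code from Rust (through pyo3)
        let myapp = py.import("myapp").unwrap();
        let result = myapp
            .call_method1("process", (app_config.input_path,))
            .unwrap();
        
        println!("Python returned: {}", result);
    });
}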

This pattern lets you write performance-critical startup code in Rust while keeping business logic in Python.

Custom memory allocator

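# pyoxidizer.bzl
python_config.allocator_backend = "rust"      # Rust's global allocator (system malloc unless overridden)
python_config.allocator_raw = True            # Install it for the raw memory domain (PyMem_Raw*)
python_config.allocator_debug = False         # No debug hooks on top of the allocator

Swapping the allocator backend changes how memory is obtained and recycled, which can affect speed and fragmentation, especially in long-running applications. Dedicated backends such as "jemalloc" and "mimalloc" are also available. pymalloc keeps serving Python's object and memory domains unless those are configured to use the custom allocator too, and the effect is workload-dependent, so measure before committing.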
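Since the effect depends on the workload, one way to decide is to run the same harness inside builds with different allocator_backend settings and compare the numbers. A minimal sketch, assuming a Unix platform (the resource module is unavailable on Windows); churn is a hypothetical stand-in for your application's hot path. tracemalloc would not help here, because it records requested sizes, which do not change with the allocator; peak RSS and wall time do.

```python
import resource
import sys
import time
from typing import Callable


def measure(workload: Callable[[], object]) -> dict[str, float]:
    """Time a workload and report the process's peak RSS afterwards (Unix only)."""
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    peak_bytes = max_rss if sys.platform == "darwin" else max_rss * 1024
    return {"seconds": elapsed, "peak_rss_bytes": float(peak_bytes)}


def churn(n: int = 200_000) -> int:
    """Hypothetical allocation-heavy workload: build and discard many small dicts."""
    total = 0
    for i in range(n):
        record = {"i": i, "s": str(i)}
        total += len(record["s"])
    return total
```

ru_maxrss is a process-wide high-water mark, so take one measurement per process run rather than comparing several workloads in the same process.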

Handling problematic packages

Some Python packages resist embedding:

Packages that inspect __file__

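# Many packages do this:
import os
data_dir = os.path.dirname(__file__)  # Fails for in-memory modules: no __file__ is set

# Fix: place that package on the filesystem with a resource callback
# ("mypkg" is a placeholder). resources_location_fallback alone is not
# enough, since it only applies when in-memory placement is unsupported.
def resource_filter(policy, resource):
    if resource.name == "mypkg" or resource.name.startswith("mypkg."):
        resource.add_location = "filesystem-relative:lib"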
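When you control the code, a sturdier fix is to stop deriving paths from __file__ and read package data through importlib.resources, which asks the module's loader for the data instead of assuming a directory on disk. A minimal sketch; read_package_text is a hypothetical helper. It relies on the finder implementing importlib's resource-reader protocol, which the standard filesystem importer does; check that the finder in your build does too.

```python
from importlib import resources


def read_package_text(package: str, resource_name: str) -> str:
    """Read a data file shipped inside a package without touching __file__."""
    return resources.files(package).joinpath(resource_name).read_text(encoding="utf-8")
```

For example, read_package_text("mypkg", "config.txt") replaces open(os.path.join(os.path.dirname(__file__), "config.txt")), with "mypkg" again a placeholder.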

Packages with native extensions

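# Compiled extensions can't be loaded from memory outside Windows, and their
# packages often expect files on disk, so place them on the filesystem.
# Match on the top-level name so submodules such as
# numpy.core._multiarray_umath are covered too.
def resource_filter(policy, resource):
    if resource.name.split(".")[0] in ("numpy", "pandas", "scipy"):
        resource.add_location = "filesystem-relative:lib"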
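To decide which packages belong on that list, you can scan the environment you install from for distributions that ship compiled code. A minimal sketch using only the standard library; packages_with_extensions and has_extension_modules are hypothetical helper names. It matches the suffixes in importlib.machinery.EXTENSION_SUFFIXES, so on Linux it also flags bundled shared libraries ending in .so, which usually belong on the filesystem as well.

```python
import importlib.machinery
import importlib.metadata
from typing import Iterable

EXTENSION_SUFFIXES = tuple(importlib.machinery.EXTENSION_SUFFIXES)


def has_extension_modules(relative_paths: Iterable[object]) -> bool:
    """True if any path ends with a suffix this interpreter loads as an extension."""
    return any(str(p).endswith(EXTENSION_SUFFIXES) for p in relative_paths)


def packages_with_extensions() -> list[str]:
    """Names of installed distributions that ship compiled extension modules."""
    names = set()
    for dist in importlib.metadata.distributions():
        files = dist.files or []  # files is None when RECORD metadata is missing
        name = dist.metadata.get("Name")
        if name and has_extension_modules(files):
            names.add(name)
    return sorted(names)
```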

Packages using pkg_resources or importlib.metadata

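# These need the package's .dist-info metadata, which pip_install collects
# as distribution resources; make sure no filter or callback drops it
exe.add_python_resources(exe.pip_install(["my-package"]))

# pkg_resources is more filesystem-bound; if it still fails, place the
# package filesystem-relative with a resource callback as shown above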
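A quick way to confirm that metadata survived packaging is to query it from inside the built binary, for example from a startup self-check. A minimal sketch; installed_versions is a hypothetical helper, and "my-package" above is a placeholder for your real distribution names.

```python
from importlib import metadata
from typing import Iterable, Optional


def installed_versions(names: Iterable[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its version, or None if its metadata is missing."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found
```

A None in the result inside the binary, for a package that reports a version in your development environment, points at metadata that was filtered out during packaging.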

Build optimization

Reducing binary size

# Build the executable as usual (pyoxidizer.bzl); nothing here strips symbols
exe = dist.to_python_executable(
    name="myapp",
    packaging_policy=policy,
    config=python_config,
)

# Strip debug symbols after the build
# Linux
# strip build/x86_64-unknown-linux-musl/release/install/myapp

# Trim the parts of the standard library that ship with the binary
policy.include_distribution_sources = False    # Stdlib as bytecode only, without .py sources
policy.include_distribution_resources = False  # Drop stdlib non-module data files
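Before and after trimming, it helps to see where the bytes actually go in the install directory. A minimal sketch; largest_files is a hypothetical helper. Resources embedded in memory live inside the executable itself, so expect the binary at the top, followed by anything placed filesystem-relative (such as a lib/ directory).

```python
from pathlib import Path


def largest_files(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return the `top` largest files under root as (size_bytes, relative_path)."""
    base = Path(root)
    sizes = [
        (path.stat().st_size, str(path.relative_to(base)))
        for path in base.rglob("*")
        if path.is_file()
    ]
    sizes.sort(reverse=True)  # biggest first
    return sizes[:top]
```

For example, largest_files("build/x86_64-unknown-linux-musl/release/install") lists the heaviest items of a musl release build.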

Typical binary sizes:

Content                          Approximate size
Python interpreter only          15-20 MB
+ standard library               25-35 MB
+ small app (Flask)              35-50 MB
+ heavy deps (numpy, pandas)     80-150 MB

Build caching

PyOxidizer caches the downloaded Python distribution and reuses Rust build artifacts between builds. First builds take roughly 2-5 minutes; incremental builds (code changes only) take roughly 30-60 seconds, depending on hardware and dependencies.

# CI caching strategy
# Cache these directories:
# - ~/.cache/pyoxidizer/  (Python distributions; Linux path, other OSes use their cache dir)
# - build/                (PyOxidizer's build directory, including Rust compilation artifacts)
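A cache is only useful if its key changes when the inputs do. GitHub Actions' actions/cache usually takes a key built with hashFiles(); the Python sketch below shows the same idea for other CI systems. cache_key is a hypothetical helper, and which files to hash (for example pyoxidizer.bzl and requirements.txt) is an assumption about what invalidates your build.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, inputs: Iterable[str]) -> str:
    """Derive a short cache key that changes whenever any input file changes."""
    digest = hashlib.sha256()
    for name in sorted(inputs):  # sort so argument order does not change the key
        data = Path(name).read_bytes()
        # Length-prefix each field so different inputs can't produce the same stream
        for field in (name.encode("utf-8"), data):
            digest.update(len(field).to_bytes(8, "little"))
            digest.update(field)
    return f"{prefix}-{digest.hexdigest()[:16]}"
```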

Production CI/CD pipeline

# GitHub Actions
name: Build PyOxidizer
on: [push, pull_request]

jobs:
  build:
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            target: x86_64-unknown-linux-musl
          - os: macos-latest
            target: x86_64-apple-darwin
          - os: windows-latest
            target: x86_64-pc-windows-msvc
    
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      
      - name: Install PyOxidizer
        run: pip install pyoxidizer
      
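      - name: Build
        run: pyoxidizer build --release --target-triple ${{ matrix.target }}
      
      - name: Test binary
        shell: bash
        run: |
          # --self-test is an application-defined flag, not a PyOxidizer option
          ./build/${{ matrix.target }}/release/install/myapp --version
          ./build/${{ matrix.target }}/release/install/myapp --self-test
      
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: myapp-${{ matrix.target }}
          path: build/${{ matrix.target }}/release/install/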
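The workflow's --self-test step assumes the application implements such a flag; PyOxidizer does not provide one. One minimal way to do it, sketched in Python with hypothetical names (__version__, CRITICAL_MODULES, self_test, main): import the modules that packaging most often drops and exit non-zero if any fail, so CI catches a missing ssl or sqlite3 before users do.

```python
import argparse
import importlib
import sys
from typing import Optional, Sequence

__version__ = "0.1.0"  # hypothetical version string

# Hypothetical list: modules whose absence would otherwise surface only at runtime
CRITICAL_MODULES = ("json", "sqlite3", "ssl")


def self_test(modules: Sequence[str] = CRITICAL_MODULES) -> list[str]:
    """Import each module and return the names that failed to import."""
    failures = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            failures.append(name)
    return failures


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="myapp")
    parser.add_argument("--version", action="version", version=f"myapp {__version__}")
    parser.add_argument("--self-test", action="store_true", help="check bundled imports")
    args = parser.parse_args(argv)
    if args.self_test:
        failures = self_test()
        for name in failures:
            print(f"FAILED: import {name}", file=sys.stderr)
        return 1 if failures else 0
    return 0  # normal application startup would go here


if __name__ == "__main__":
    sys.exit(main())
```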

Alternatives landscape

PyOxidizer pioneered the “embed Python in a native binary” approach. The ecosystem has evolved:

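  • PyApp — a simpler Rust-based wrapper inspired by PyOxidizer, with easier configuration; instead of importing from memory, it bootstraps itself at runtime by unpacking a standalone Python distribution on first run.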
  • Nuitka — compiles Python to C, producing similar single-file binaries.
  • cx_Freeze — mature packaging tool, less innovative but very reliable.
  • Briefcase — BeeWare’s packaging tool, focused on mobile and desktop apps.

If PyOxidizer’s complexity is a barrier, PyApp is worth evaluating as a lighter Rust-based alternative, at the cost of a first-run bootstrap step instead of in-memory imports.

One thing to remember: PyOxidizer’s architecture — embedding Python in a Rust binary with an in-memory importer — represents the most technically ambitious approach to Python distribution, trading build complexity for the fastest possible startup and the cleanest possible deployment artifact.

