Python NFT Metadata Generation — Deep Dive

Build production NFT generation pipelines: trait conflict resolution, provenance hashing, on-chain reveals, and large-scale IPFS deployment with Python.

Pipeline architecture

A production NFT generation pipeline has distinct stages, each producing artifacts that feed the next:

Trait Config → Combination Generator → Image Compositor → Metadata Builder
    → Rarity Calculator → Provenance Hasher → IPFS Uploader → Contract Deployment

Each stage should be independently runnable and produce checksummed outputs. If the image compositor crashes at item 7,432, you don’t want to regenerate the first 7,431 images.
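One way to get both properties is to have each stage record a checksum manifest of its outputs and skip any item whose output already exists and matches. The sketch below assumes each stage writes one output file per token ID. `run_stage`, `save_manifest`, `manifest.json`, and the `produce` callback are hypothetical names chosen for illustration, not part of any library:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large outputs never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_manifest(manifest_path: Path, manifest: dict[str, str]) -> None:
    """Write via a temp file and rename, so a crash never leaves a half-written manifest."""
    tmp = manifest_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    tmp.replace(manifest_path)


def run_stage(
    out_dir: Path,
    count: int,
    suffix: str,
    produce: Callable[[int, Path], None],
    flush_every: int = 100,
) -> dict[str, str]:
    """Run one stage over token IDs 0..count-1, skipping outputs already recorded.

    produce(token_id, out_path) does the stage-specific work (hypothetical callback).
    An output is skipped only if it exists and matches its checksum in manifest.json.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}

    since_save = 0
    for token_id in range(count):
        name = f"{token_id}{suffix}"
        out_path = out_dir / name
        if out_path.exists() and manifest.get(name) == sha256_file(out_path):
            continue  # finished in an earlier run and unchanged since
        produce(token_id, out_path)
        manifest[name] = sha256_file(out_path)
        since_save += 1
        if since_save >= flush_every:  # checkpoint periodically, not after every file
            save_manifest(manifest_path, manifest)
            since_save = 0

    save_manifest(manifest_path, manifest)
    return manifest
```

After a crash, rerunning the same call regenerates only the items after the last checkpoint. The manifest doubles as the checksummed artifact that the next stage can verify before it starts.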

Advanced trait system design

Conditional traits and conflicts

Real collections have trait dependencies. A “Scuba Mask” can’t appear with a “Top Hat.” A “Wings” accessory requires a sky-like background (“Sky,” “Space,” or “Clouds”) to make visual sense.

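TRAIT_RULES = {
    "conflicts": [
        (("Hat", "Top Hat"), ("Accessory", "Scuba Mask")),
        (("Hat", "Astronaut Helmet"), ("Eyes", "Sunglasses")),
    ],
    "requires": [
        (("Accessory", "Wings"), ("Background", ["Sky", "Space", "Clouds"])),
    ],
    "exclusive": [
        # Only one of these can appear per token
        ["Golden Crown", "Diamond Tiara", "Platinum Helm"],
    ],
}

def validate_combination(traits: dict) -> bool:
    """Return True if a {category: value} dict satisfies every rule in TRAIT_RULES."""
    # Conflicts: the two halves of a pair may not appear together
    for (cat_a, val_a), (cat_b, val_b) in TRAIT_RULES["conflicts"]:
        if traits.get(cat_a) == val_a and traits.get(cat_b) == val_b:
            return False

    # Requires: if the trigger trait is present, the dependent category must hold an allowed value
    for (cat_req, val_req), (cat_dep, valid_deps) in TRAIT_RULES["requires"]:
        if traits.get(cat_req) == val_req:
            if isinstance(valid_deps, list):
                if traits.get(cat_dep) not in valid_deps:
                    return False
            elif traits.get(cat_dep) != valid_deps:
                return False

    # Exclusive: at most one value from each group, whichever categories the values live in
    present = set(traits.values())
    for group in TRAIT_RULES["exclusive"]:
        if len(present.intersection(group)) > 1:
            return False

    return True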

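With conflict rules, naive rejection sampling (generate-and-discard) becomes inefficient for tightly constrained collections, and it never terminates if fewer valid combinations exist than the number of tokens you need. When the combination space is small enough to enumerate, a better approach is to generate the valid combinations with a constraint-satisfaction search and then sample from that set.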
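A minimal sketch of that approach, using plain backtracking search (a basic form of constraint satisfaction) rather than a dedicated solver library. The trait config and rule lists below are hypothetical and cover only the “conflicts” and “requires” rule shapes from `TRAIT_RULES`. Because it materializes every valid combination, this works only when that set fits in memory; something like 10 categories with 20 values each is far too large to enumerate.

```python
import random
from typing import Iterator

# Hypothetical trait config: category -> possible values (dict order = search order).
CATEGORIES = {
    "Background": ["Sky", "Space", "Forest"],
    "Hat": ["Top Hat", "Astronaut Helmet", "None"],
    "Eyes": ["Sunglasses", "Normal"],
    "Accessory": ["Wings", "Scuba Mask", "None"],
}
# Same shapes as the "conflicts" and "requires" entries in TRAIT_RULES.
CONFLICTS = [
    (("Hat", "Top Hat"), ("Accessory", "Scuba Mask")),
    (("Hat", "Astronaut Helmet"), ("Eyes", "Sunglasses")),
]
REQUIRES = [
    (("Accessory", "Wings"), ("Background", ["Sky", "Space"])),
]


def partial_ok(traits: dict) -> bool:
    """Check every rule that can already be decided from the categories assigned so far."""
    for (cat_a, val_a), (cat_b, val_b) in CONFLICTS:
        if traits.get(cat_a) == val_a and traits.get(cat_b) == val_b:
            return False
    for (cat_req, val_req), (cat_dep, allowed) in REQUIRES:
        if traits.get(cat_req) == val_req and cat_dep in traits and traits[cat_dep] not in allowed:
            return False
    return True


def valid_combinations(categories: dict) -> Iterator[dict]:
    """Backtracking: assign one category at a time and prune as soon as a rule fails."""
    names = list(categories)

    def extend(i: int, assigned: dict) -> Iterator[dict]:
        if i == len(names):
            yield dict(assigned)  # copy, because `assigned` keeps being mutated
            return
        for value in categories[names[i]]:
            assigned[names[i]] = value
            if partial_ok(assigned):
                yield from extend(i + 1, assigned)
            del assigned[names[i]]

    yield from extend(0, {})


def sample_collection(categories: dict, count: int, seed: int) -> list:
    """Pick `count` distinct valid combinations uniformly at random."""
    valid = list(valid_combinations(categories))
    if count > len(valid):
        raise ValueError(f"only {len(valid)} valid combinations for {count} tokens")
    return random.Random(seed).sample(valid, count)
```

Sampling from the valid set guarantees uniqueness and rule compliance in one step, and the `ValueError` surfaces an over-constrained config immediately instead of looping forever. This samples uniformly, so it ignores rarity weights; target counts need the allocation approach below.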

Rarity tiers with guarantees

Some collections guarantee exact counts: “exactly 10 Legendary items” or “at most 100 items with Gold Hat.” This requires switching from weighted random to allocation-based generation:

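def allocate_traits(trait_config, total_count):
    """Convert each option's target_pct (plus optional min/max) into exact counts per category."""
    allocations = {}
    for category, options in trait_config.items():
        remaining = total_count
        category_alloc = []
        for i, option in enumerate(options):
            if i == len(options) - 1:
                # The last option absorbs rounding leftovers so the category sums to total_count
                count = remaining
                if not option.get("min_count", 0) <= count <= option.get("max_count", total_count):
                    raise ValueError(
                        f"{category}: {count} left for {option['value']!r} violates its bounds"
                    )
            else:
                count = round(total_count * option["target_pct"])
                count = max(option.get("min_count", 0), count)
                count = min(option.get("max_count", total_count), count)
                # Never hand out more tokens than remain in this category
                count = min(count, remaining)
            category_alloc.append({"value": option["value"], "count": count})
            remaining -= count
        allocations[category] = category_alloc
    return allocations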

After allocation, shuffle the assigned traits across token IDs so that low-numbered tokens aren’t all legendary.
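A minimal sketch of that shuffle, assuming the `allocations` dict shape that `allocate_traits` returns. `assign_traits` and the seed parameter are illustrative names. Each category is shuffled independently, so the result can still contain conflicting or duplicate combinations. In practice you would run every token through `validate_combination` and a uniqueness check, then reshuffle (or swap individual values) on failure.

```python
import random


def assign_traits(allocations: dict, total_count: int, seed: int) -> list[dict]:
    """Expand per-category counts into value lists, shuffle each, then zip by token ID."""
    rng = random.Random(seed)  # record the seed so the assignment is reproducible
    columns = {}
    for category, alloc in allocations.items():
        values = [a["value"] for a in alloc for _ in range(a["count"])]
        if len(values) != total_count:
            raise ValueError(f"{category}: {len(values)} values for {total_count} tokens")
        rng.shuffle(values)
        columns[category] = values
    # Token i takes the i-th shuffled value from every category
    return [{cat: columns[cat][tid] for cat in columns} for tid in range(total_count)]
```

The exact counts survive the shuffle by construction, which is the point of allocation-based generation: only the token IDs the values land on are random.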

Image generation at scale

Parallel compositing

For 10,000+ items, sequential image generation takes hours. Parallelize with multiprocessing:

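from functools import partial
from multiprocessing import Pool
from pathlib import Path

def generate_single_image(job, layers_dir, output_dir):
    token_id, traits = job
    out_path = Path(output_dir) / f"{token_id}.png"
    if out_path.exists():
        return token_id  # rendered by an earlier run, so a crashed job can simply be rerun
    # compose_image is assumed to be defined elsewhere and to return a PIL image
    img = compose_image(layers_dir, traits)
    tmp_path = out_path.with_suffix(".tmp")
    img.save(tmp_path, format="PNG", optimize=True)
    tmp_path.replace(out_path)  # atomic rename: a crash never leaves a half-written PNG
    return token_id

def generate_all(traits_list, layers_dir, output_dir, workers=8):
    func = partial(generate_single_image, layers_dir=layers_dir, output_dir=output_dir)
    # Call this under `if __name__ == "__main__":` so spawned worker processes don't re-run it
    with Pool(workers) as pool:
        # Send each worker only its own (token_id, traits) pair instead of the whole list
        return pool.map(func, enumerate(traits_list), chunksize=64)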

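As a rough guide, on a modern 8-core machine this can generate ~10,000 2048x2048 images in about 20 minutes instead of 2+ hours. Actual throughput depends on the number of layers, the image size, and PNG settings such as optimize=True.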

Image optimization

Raw PNGs can be 2-5 MB each. For a 10,000-item collection, that’s 20-50 GB. Optimize:

Quantize to 256 colors where the art style permits (this often cuts file size by 60-80%).
Use pngquant via subprocess for lossy PNG compression.
Generate multiple sizes: full resolution for IPFS, thumbnails for quick marketplace loading.

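import shutil
import subprocess

def optimize_png(input_path, output_path, quality="65-80"):
    """Lossily compress a PNG with pngquant, keeping the original if the quality floor can't be met."""
    result = subprocess.run(
        ["pngquant", "--quality", quality, "--output", str(output_path),
         "--force", str(input_path)],
    )
    if result.returncode == 99:
        # pngquant exits with 99 when it can't reach the minimum quality; fall back to the original
        shutil.copyfile(input_path, output_path)
    elif result.returncode != 0:
        raise subprocess.CalledProcessError(result.returncode, result.args)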

Provenance hash

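The provenance hash proves that the images, and their order, were fixed before minting, so the team can’t swap or reorder artwork between the mint and the reveal. It’s calculated by hashing every image in its original token order, hashing the concatenation of those hashes, and publishing the final hash before minting begins: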

import hashlib
from pathlib import Path

def calculate_provenance(image_dir: Path, count: int) -> str:
    # Hash each image in its original (pre-offset) token order
    image_hashes = [
        hashlib.sha256((image_dir / f"{token_id}.png").read_bytes()).hexdigest()
        for token_id in range(count)
    ]
    # Concatenate the hex digests and hash the result into one collection-wide commitment
    return hashlib.sha256("".join(image_hashes).encode()).hexdigest()

The provenance hash is stored on-chain or published publicly before minting begins. Anyone can independently verify it by downloading all images and recomputing the hash in the same order.
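A verifier needs only the image files in their original order and the published hash. The sketch below assumes the same concatenate-then-hash scheme as `calculate_provenance`; `verify_provenance` and `file_sha256` are hypothetical helpers. Order matters: swapping any two images changes the result, which is why the hash commits to the token-to-image assignment and not just to the set of images.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """SHA-256 of one file, read in chunks so large images stay cheap on memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_provenance(image_paths: list[Path], published: str) -> bool:
    """Recompute the provenance hash from images listed in original (pre-offset) order."""
    combined = "".join(file_sha256(p) for p in image_paths)
    return hashlib.sha256(combined.encode()).hexdigest() == published.lower()
```

Chunked reading produces the same digest as hashing `read_bytes()` in one call, so this verifier agrees with `calculate_provenance` on identical files.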

Delayed reveal mechanism

Many projects hide metadata until after minting to prevent sniping, where minters target token IDs they already know are rare. The flow:

Pre-reveal: All tokens point to a placeholder metadata file with a generic image.
Reveal: The contract owner calls a function that switches the base URI from the placeholder to the real metadata.
Offset: A random offset (generated from a future block hash) shuffles which token ID maps to which metadata, preventing the team from knowing assignments in advance.

import json
from pathlib import Path

def generate_offset_mapping(count, offset):
    """Map each token ID to a metadata ID by rotating the original order by `offset`."""
    return {token_id: (token_id + offset) % count for token_id in range(count)}

def write_revealed_metadata(original_metadata, offset, output_dir: Path):
    output_dir.mkdir(parents=True, exist_ok=True)
    mapping = generate_offset_mapping(len(original_metadata), offset)
    for token_id, metadata_id in mapping.items():
        # Copy so the original list isn't mutated; image and attributes travel with metadata_id
        meta = original_metadata[metadata_id].copy()
        meta["name"] = f"Collection #{token_id}"
        output_path = output_dir / f"{token_id}.json"
        output_path.write_text(json.dumps(meta, indent=2))
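The offset itself can be derived from a block hash once minting has closed. The sketch below assumes the hash is available as a 0x-prefixed hex string (for example, fetched from your node or a block explorer); `offset_from_block_hash` is a hypothetical helper. Reducing a 256-bit value modulo a collection size such as 10,000 leaves negligible bias. Block hashes are only partly unpredictable, though, because block producers have some influence over them.

```python
def offset_from_block_hash(block_hash: str, count: int) -> int:
    """Derive a reveal offset in [0, count) from a 0x-prefixed hex block hash."""
    if count <= 0:
        raise ValueError("count must be positive")
    # int() accepts the 0x prefix when base=16
    return int(block_hash, 16) % count
```

The result can be passed straight to `write_revealed_metadata(original_metadata, offset, output_dir)`.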

IPFS upload strategies

Pinning services

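Pinning services such as Pinata and NFT.Storage expose HTTP APIs that are straightforward to call from Python with requests. The example below targets Pinata’s pinFileToIPFS endpoint: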

import requests

def upload_to_pinata(file_path, api_key, secret_key):
    url = "https://api.pinata.cloud/pinning/pinFileToIPFS"
    headers = {
        "pinata_api_key": api_key,
        "pinata_secret_api_key": secret_key,
    }
    with open(file_path, "rb") as f:
        response = requests.post(url, files={"file": f}, headers=headers, timeout=120)
    # Fail loudly on 4xx/5xx (bad keys, rate limits) instead of a confusing KeyError below
    response.raise_for_status()
    return response.json()["IpfsHash"]

Directory uploads

For collections, upload all files in one request so they are wrapped in a single IPFS directory. This gives you a directory CID where CID/0.json, CID/1.json, etc., resolve correctly:

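from contextlib import ExitStack

def upload_directory_to_pinata(dir_path, api_key, secret_key):
    url = "https://api.pinata.cloud/pinning/pinFileToIPFS"
    headers = {
        "pinata_api_key": api_key,
        "pinata_secret_api_key": secret_key,
    }
    # ExitStack closes every opened file once the request finishes (the original leaked them)
    with ExitStack() as stack:
        # The shared "collection/" path prefix groups all files under one directory CID
        files = [
            ("file", (f"collection/{file_path.name}", stack.enter_context(open(file_path, "rb"))))
            for file_path in sorted(dir_path.iterdir())
            if file_path.is_file()
        ]
        response = requests.post(url, files=files, headers=headers, timeout=600)
    response.raise_for_status()
    return response.json()["IpfsHash"]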

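For 10,000+ files, retry logic is essential, because IPFS pinning APIs can be slow and rate-limited. Batching keeps individual requests manageable, but each upload request returns its own CID, so files sent in separate batches won’t share one directory CID or one base URI.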
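A minimal retry helper with exponential backoff and jitter, written generically so it can wrap either upload function. `with_retries` and its parameters are illustrative names, not part of requests or any pinning SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... base_delay, plus up to base_delay of random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")
```

Usage might look like `with_retries(lambda: upload_to_pinata(path, key, secret), retry_on=(requests.RequestException,))`. This assumes the upload raises on HTTP errors (for example via `response.raise_for_status()`), so that rate-limit responses trigger a retry instead of a `KeyError`. The jitter keeps parallel uploaders from retrying in lockstep.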

Metadata validation suite

Before uploading, run comprehensive validation:

import json
from collections import Counter

def validate_collection(metadata_dir, image_dir, count):
    errors = []
    trait_counts = Counter()
    combos = Counter()

    for token_id in range(count):
        meta_path = metadata_dir / f"{token_id}.json"
        img_path = image_dir / f"{token_id}.png"

        if not meta_path.exists():
            errors.append(f"Missing metadata: {token_id}")
            continue
        if not img_path.exists():
            errors.append(f"Missing image: {token_id}")

        try:
            meta = json.loads(meta_path.read_text())
        except json.JSONDecodeError as exc:
            errors.append(f"Token {token_id}: invalid JSON ({exc})")
            continue

        # Schema validation
        required = ["name", "description", "image", "attributes"]
        for field in required:
            if field not in meta:
                errors.append(f"Token {token_id}: missing field '{field}'")

        # Track trait distributions and full combinations in the same pass,
        # so missing files are reported instead of crashing a second read
        attributes = meta.get("attributes", [])
        for attr in attributes:
            trait_counts[(attr["trait_type"], attr["value"])] += 1
        combos[tuple(sorted((a["trait_type"], a["value"]) for a in attributes))] += 1

    # Check uniqueness
    duplicates = [c for c, n in combos.items() if n > 1]
    if duplicates:
        errors.append(f"Found {len(duplicates)} duplicate trait combinations")

    return errors, dict(trait_counts)
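The trait counts gathered during validation can also feed the Rarity Calculator stage from the pipeline diagram. A minimal sketch using one common convention: each token’s score is the sum, over its traits, of the collection size divided by that trait’s count. Rarity sites and marketplaces use varying formulas, so treat this as illustrative. `token_traits` is assumed to map token IDs to the `(trait_type, value)` pairs read from each metadata file, and `rarity_scores` and `rank_tokens` are hypothetical helpers.

```python
def rarity_scores(
    token_traits: dict[int, list[tuple[str, str]]],
    trait_counts: dict[tuple[str, str], int],
) -> dict[int, float]:
    """Score each token as the sum of (collection size / tokens sharing that trait)."""
    total = len(token_traits)
    return {
        token_id: sum(total / trait_counts[trait] for trait in traits)
        for token_id, traits in token_traits.items()
    }


def rank_tokens(scores: dict[int, float]) -> list[int]:
    """Token IDs from rarest (highest score) to most common; ties go to the lower ID."""
    return sorted(scores, key=lambda token_id: (-scores[token_id], token_id))
```

Computing scores from the same validated files that get uploaded keeps the published rarity ranks consistent with the on-chain metadata.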

One thing to remember

Production NFT metadata generation is a multi-stage pipeline where each step — trait allocation, conflict resolution, image compositing, provenance hashing, and IPFS upload — must be independently verifiable and reproducible, because once the collection is live, there are no do-overs.
