Python NFT Metadata Generation — Deep Dive
Pipeline architecture
A production NFT generation pipeline has distinct stages, each producing artifacts that feed the next:
Trait Config → Combination Generator → Image Compositor → Metadata Builder
→ Rarity Calculator → Provenance Hasher → IPFS Uploader → Contract Deployment
Each stage should be independently runnable and produce checksummed outputs. If the image compositor crashes at item 7,432, you don’t want to regenerate the first 7,431 images.
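One way to get that resumability is an append-only log of finished items and their checksums. The sketch below is a minimal illustration rather than a prescribed design: `run_stage`, `load_completed`, the `produce` callback, and the `completed.jsonl` file name are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def load_completed(log_path):
    """Read the append-only log into {item_id: sha256_hex}."""
    completed = {}
    if log_path.exists():
        for line in log_path.read_text().splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore a line truncated by a crash
            completed[record["id"]] = record["sha256"]
    return completed


def run_stage(item_ids, produce, output_dir, suffix=".bin"):
    """Call produce(item_id) -> bytes for each item, skipping verified outputs.

    Returns the list of item IDs that were actually (re)generated.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    log_path = output_dir / "completed.jsonl"  # hypothetical log name
    completed = load_completed(log_path)

    produced = []
    with log_path.open("a") as log:
        for item_id in item_ids:
            out_path = output_dir / f"{item_id}{suffix}"
            if out_path.exists():
                on_disk = hashlib.sha256(out_path.read_bytes()).hexdigest()
                if completed.get(item_id) == on_disk:
                    continue  # output exists and matches its recorded checksum
            data = produce(item_id)
            out_path.write_bytes(data)
            record = {"id": item_id, "sha256": hashlib.sha256(data).hexdigest()}
            log.write(json.dumps(record) + "\n")
            log.flush()  # a crash now loses at most the item in progress
            produced.append(item_id)
    return produced
```

Re-running the stage after a crash regenerates only the items whose output is missing or no longer matches the logged checksum.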
Advanced trait system design
Conditional traits and conflicts
Real collections have trait dependencies. A “Scuba Mask” can’t appear with a “Top Hat.” A “Wings” accessory requires a “Sky” or “Space” background to make visual sense.
TRAIT_RULES = {
    "conflicts": [
        (("Hat", "Top Hat"), ("Accessory", "Scuba Mask")),
        (("Hat", "Astronaut Helmet"), ("Eyes", "Sunglasses")),
    ],
    "requires": [
        (("Accessory", "Wings"), ("Background", ["Sky", "Space", "Clouds"])),
    ],
    "exclusive": [
        # Only one of these can appear per token
        ["Golden Crown", "Diamond Tiara", "Platinum Helm"],
    ],
}

def validate_combination(traits: dict) -> bool:
    # Reject any pair of values listed as conflicting
    for (cat_a, val_a), (cat_b, val_b) in TRAIT_RULES["conflicts"]:
        if traits.get(cat_a) == val_a and traits.get(cat_b) == val_b:
            return False
    # A trait with a requirement needs one of its allowed companions
    for (cat_req, val_req), (cat_dep, valid_deps) in TRAIT_RULES["requires"]:
        if traits.get(cat_req) == val_req:
            if isinstance(valid_deps, list):
                if traits.get(cat_dep) not in valid_deps:
                    return False
            elif traits.get(cat_dep) != valid_deps:
                return False
    # At most one value from each exclusive group, across all categories
    for group in TRAIT_RULES["exclusive"]:
        if sum(value in group for value in traits.values()) > 1:
            return False
    return True
With conflict rules, naive rejection sampling (generate-and-discard) becomes inefficient for tightly constrained collections. A better approach: generate valid combinations using constraint satisfaction and then sample from the valid set.
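For small trait spaces, the simplest version of this idea is to enumerate the full Cartesian product, keep only the combinations that pass validation, and sample from what remains. The sketch below assumes that approach. `enumerate_valid`, `sample_collection`, and the `is_valid` predicate are hypothetical names, and the sampling is uniform, so rarity weights would have to be layered on separately. Full enumeration only works while the product of option counts stays manageable; larger spaces call for backtracking or a dedicated constraint solver.

```python
import itertools
import random


def enumerate_valid(trait_options, is_valid):
    """Yield every trait combination that passes the is_valid predicate.

    trait_options maps category name -> list of possible values.
    """
    categories = list(trait_options)
    for values in itertools.product(*(trait_options[c] for c in categories)):
        combo = dict(zip(categories, values))
        if is_valid(combo):
            yield combo


def sample_collection(trait_options, is_valid, count, seed=0):
    """Draw count distinct valid combinations, reproducibly for a given seed."""
    valid = list(enumerate_valid(trait_options, is_valid))
    if count > len(valid):
        raise ValueError(f"only {len(valid)} valid combinations, need {count}")
    return random.Random(seed).sample(valid, count)
```

Because every sampled combination comes from the pre-filtered set, no draws are wasted on rejected candidates, and uniqueness comes for free.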
Rarity tiers with guarantees
Some collections guarantee exact counts: “exactly 10 Legendary items” or “at most 100 items with Gold Hat.” This requires switching from weighted random to allocation-based generation:
def allocate_traits(trait_config, total_count):
    allocations = {}
    for category, options in trait_config.items():
        remaining = total_count
        category_alloc = []
        for i, option in enumerate(options):
            if i == len(options) - 1:
                # The last option absorbs the remainder so counts sum to
                # total_count; its min_count/max_count are not applied
                count = remaining
            else:
                # target_pct is a fraction, e.g. 0.05 for 5%
                count = round(total_count * option["target_pct"])
                count = max(option.get("min_count", 0), count)
                count = min(option.get("max_count", total_count), count)
            if count < 0:
                raise ValueError(f"{category}: allocations exceed {total_count}")
            category_alloc.append({"value": option["value"], "count": count})
            remaining -= count
        allocations[category] = category_alloc
    return allocations
After allocation, shuffle the assigned traits across token IDs so that low-numbered tokens aren’t all legendary.
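A minimal sketch of that shuffle for a single category, assuming the `[{"value": ..., "count": ...}, ...]` shape produced by the allocation step. `assign_to_tokens` and the `seed` parameter are hypothetical. A fixed seed keeps the assignment reproducible.

```python
import random


def assign_to_tokens(allocation, seed):
    """Expand one category's allocation into a shuffled per-token list.

    allocation is a list of {"value": ..., "count": ...} entries.
    The returned list is indexed by token ID.
    """
    pool = [entry["value"] for entry in allocation for _ in range(entry["count"])]
    random.Random(seed).shuffle(pool)  # seeded, so reruns give the same order
    return pool
```

Shuffling each category independently can still produce duplicate or conflicting combinations, so the result should go through the same validation as any other candidate set.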
Image generation at scale
Parallel compositing
For 10,000+ items, sequential image generation takes hours. Parallelize with multiprocessing:
from multiprocessing import Pool
from functools import partial

def generate_single_image(token_id, traits_list, layers_dir, output_dir):
    # compose_image is assumed to be defined elsewhere in the pipeline;
    # output_dir must be a pathlib.Path for the / operator to work
    traits = traits_list[token_id]
    img = compose_image(layers_dir, traits)
    img.save(output_dir / f"{token_id}.png", optimize=True)
    return token_id

def generate_all(traits_list, layers_dir, output_dir, workers=8):
    # Bind the shared arguments so each worker only receives a token ID
    func = partial(
        generate_single_image,
        traits_list=traits_list,
        layers_dir=layers_dir,
        output_dir=output_dir,
    )
    with Pool(workers) as pool:
        results = pool.map(func, range(len(traits_list)))
    return results
On a modern 8-core machine, this generates ~10,000 2048x2048 images in about 20 minutes instead of 2+ hours.
Image optimization
Raw PNGs can be 2-5 MB each. For a 10,000-item collection, that’s 20-50 GB. Optimize:
- Quantize to 256 colors where art style permits (reduces size by 60-80%).
- Use pngquant via subprocess for lossy PNG compression.
- Generate multiple sizes: Full resolution for IPFS, thumbnails for quick marketplace loading.
import subprocess

def optimize_png(input_path, output_path, quality="65-80"):
    subprocess.run(
        ["pngquant", "--quality", quality, "--output", str(output_path),
         "--force", str(input_path)],
        check=True,
    )
Provenance hash
The provenance hash proves that the collection wasn’t manipulated after reveal. It’s calculated by hashing all images in order and publishing the final hash before minting begins:
import hashlib

def calculate_provenance(image_dir, count):
    combined = ""
    for token_id in range(count):
        img_path = image_dir / f"{token_id}.png"
        img_hash = hashlib.sha256(img_path.read_bytes()).hexdigest()
        combined += img_hash
    provenance = hashlib.sha256(combined.encode()).hexdigest()
    return provenance
The provenance hash is stored on-chain or published publicly before the reveal. Anyone can independently verify it by downloading all images and recomputing.
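A verifier's side of this could look like the sketch below. It assumes the same scheme as above (SHA-256 over the concatenated per-image hex digests, images named `{token_id}.png`). `verify_provenance`, `file_sha256`, and `published_hash` are hypothetical names. Files are hashed in chunks so memory use stays flat for large images.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_provenance(image_dir, count, published_hash):
    """Recompute the provenance hash and compare it with the published one."""
    image_dir = Path(image_dir)
    outer = hashlib.sha256()
    for token_id in range(count):
        # Feeding each hex digest in order equals hashing their concatenation
        outer.update(file_sha256(image_dir / f"{token_id}.png").encode())
    return hmac.compare_digest(outer.hexdigest(), published_hash.lower())
```

The check is order-sensitive by design: swapping two images after publication changes the result even though the set of files is the same.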
Delayed reveal mechanism
Many projects hide metadata until after minting to prevent sniping (people only minting rare ones). The flow:
- Pre-reveal: All tokens point to a placeholder metadata file with a generic image.
- Reveal: The contract owner calls a function that shifts the base URI to the real metadata.
- Offset: A random offset (derived from a future block hash) rotates which token ID maps to which metadata, so the team cannot know the assignments in advance.
import json

def generate_offset_mapping(count, offset):
    """Maps token IDs to metadata IDs with circular offset."""
    return {token_id: (token_id + offset) % count for token_id in range(count)}

def write_revealed_metadata(original_metadata, offset, output_dir):
    mapping = generate_offset_mapping(len(original_metadata), offset)
    for token_id, metadata_id in mapping.items():
        meta = original_metadata[metadata_id].copy()
        # Rename so the token's name matches its on-chain ID after rotation
        meta["name"] = f"Collection #{token_id}"
        output_path = output_dir / f"{token_id}.json"
        output_path.write_text(json.dumps(meta, indent=2))
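The offset itself could be derived from the block hash along these lines. This is a sketch under assumptions: `offset_from_block_hash` is a hypothetical helper, how the hash is fetched (from the contract or an RPC node) is out of scope, and forcing a non-zero result is a design choice made here so that the reveal always changes the mapping.

```python
def offset_from_block_hash(block_hash_hex, count):
    """Reduce a hex block hash to an offset in the range [1, count - 1]."""
    if count < 2:
        raise ValueError("need at least two tokens to rotate")
    # int() with base 16 accepts the hash with or without a "0x" prefix
    value = int(block_hash_hex, 16)
    # Modulo (count - 1) plus one keeps the offset non-zero
    return value % (count - 1) + 1
```

Because the block hash is unknown until the chosen block is mined, nobody can compute the offset before the commitment is made.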
IPFS upload strategies
Pinning services
Pinata and NFT.Storage are popular pinning services. The example here calls Pinata's HTTP API directly with requests:
import requests

def upload_to_pinata(file_path, api_key, secret_key):
    url = "https://api.pinata.cloud/pinning/pinFileToIPFS"
    headers = {
        "pinata_api_key": api_key,
        "pinata_secret_api_key": secret_key,
    }
    with open(file_path, "rb") as f:
        response = requests.post(url, files={"file": f}, headers=headers, timeout=300)
    # Fail loudly on HTTP errors instead of a KeyError on the missing field
    response.raise_for_status()
    return response.json()["IpfsHash"]
Directory uploads
For collections, upload the entire directory as a single IPFS object. This gives you a directory CID where CID/0.json, CID/1.json, etc., resolve correctly:
from contextlib import ExitStack

def upload_directory_to_pinata(dir_path, api_key, secret_key):
    url = "https://api.pinata.cloud/pinning/pinFileToIPFS"
    headers = {
        "pinata_api_key": api_key,
        "pinata_secret_api_key": secret_key,
    }
    # ExitStack closes every file handle once the request has finished
    with ExitStack() as stack:
        files = [
            ("file", (f"collection/{path.name}", stack.enter_context(open(path, "rb"))))
            for path in sorted(dir_path.iterdir())
        ]
        response = requests.post(url, files=files, headers=headers)
    response.raise_for_status()
    return response.json()["IpfsHash"]
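Once the images are pinned as a directory, each metadata file can point at its image through the directory CID. The sketch below assumes the `{token_id}.png` and `{token_id}.json` naming used throughout. `set_image_uris` and `images_cid` are hypothetical names.

```python
import json
from pathlib import Path


def set_image_uris(metadata_dir, images_cid, count):
    """Rewrite each token's "image" field to ipfs://<images_cid>/<token_id>.png."""
    metadata_dir = Path(metadata_dir)
    for token_id in range(count):
        path = metadata_dir / f"{token_id}.json"
        meta = json.loads(path.read_text())
        meta["image"] = f"ipfs://{images_cid}/{token_id}.png"
        path.write_text(json.dumps(meta, indent=2))
```

This implies an upload order: images first, then the metadata rewrite, then the metadata directory, since the metadata content depends on the image CID.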
For 10,000+ files, batch uploads and retry logic are essential. IPFS pinning can be slow and rate-limited.
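A generic retry wrapper with exponential backoff, plus a small batching helper, might look like the sketch below. `with_retries` and `batched` are hypothetical helpers, and the default attempt count and delays are illustrative and would need tuning to the service's actual rate limits.

```python
import random
import time


def with_retries(func, attempts=5, base_delay=1.0, retry_on=(Exception,),
                 sleep=time.sleep):
    """Call func() until it succeeds or attempts run out.

    The sleep parameter is injectable so tests can avoid real waiting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Exponential backoff (1x, 2x, 4x, ...) with up to 50% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay * (1 + random.random() / 2))


def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

An upload call can be wrapped in a lambda and passed as `func`, with `retry_on` narrowed to network errors such as `requests.RequestException` so that programming errors are not retried.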
Metadata validation suite
Before uploading, run comprehensive validation:
import json
from collections import Counter

def validate_collection(metadata_dir, image_dir, count):
    errors = []
    trait_counts = {}
    all_combos = []
    for token_id in range(count):
        meta_path = metadata_dir / f"{token_id}.json"
        img_path = image_dir / f"{token_id}.png"
        if not img_path.exists():
            errors.append(f"Missing image: {token_id}")
        if not meta_path.exists():
            errors.append(f"Missing metadata: {token_id}")
            continue
        meta = json.loads(meta_path.read_text())
        # Schema validation
        required = ["name", "description", "image", "attributes"]
        for field in required:
            if field not in meta:
                errors.append(f"Token {token_id}: missing field '{field}'")
        # Track trait distributions
        attributes = meta.get("attributes", [])
        for attr in attributes:
            key = (attr["trait_type"], attr["value"])
            trait_counts[key] = trait_counts.get(key, 0) + 1
        # Record the combination here so missing files cannot crash a second pass
        all_combos.append(tuple(sorted((a["trait_type"], a["value"]) for a in attributes)))
    # Check uniqueness
    duplicates = [c for c, n in Counter(all_combos).items() if n > 1]
    if duplicates:
        errors.append(f"Found {len(duplicates)} duplicate trait combinations")
    return errors, trait_counts
One thing to remember
Production NFT metadata generation is a multi-stage pipeline where each step — trait allocation, conflict resolution, image compositing, provenance hashing, and IPFS upload — must be independently verifiable and reproducible, because once the collection is live, there are no do-overs.