Python Dependency Vulnerability Scanning — Deep Dive
The Full Scanning Pipeline
A mature dependency security pipeline has four layers: local development scanning, CI/CD gate scanning, automated update monitoring, and a software composition inventory (SBOM).
Layer 1: Developer Workstation
# Install scanning tools
pip install pip-audit safety pipdeptree
# Quick scan of current environment
pip-audit
# Detailed output with descriptions
pip-audit --desc --format json | python -m json.tool
# Scan a requirements file without installing it into the current environment
pip-audit -r requirements.txt
# Visualize dependency tree to understand transitive deps
pipdeptree --warn silence | head -50
Pre-Commit Hook Integration
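# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pypa/pip-audit
    rev: v2.7.0
    hooks:
      - id: pip-audit
        # pre-commit runs hooks in an isolated environment, so point
        # pip-audit at the project's requirements file explicitly.
        args: ["-r", "requirements.txt", "--strict", "--desc"]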
This blocks commits that introduce known-vulnerable dependencies. Developers catch issues before they enter version control.
Layer 2: CI/CD Pipeline
# .github/workflows/security.yml
name: Dependency Security
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 8 * * *" # Daily at 8 AM UTC
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: pip-audit (OSV database)
        run: pip-audit --strict --desc --format json > audit.json
        # Let the job continue so the report is uploaded; the last step gates
        continue-on-error: true
      - name: Upload audit results
        uses: actions/upload-artifact@v4
        with:
          name: vulnerability-report
          path: audit.json
      # pip-audit's JSON output carries no severity field, so this gate
      # fails on any known vulnerability rather than only critical/high.
      - name: Fail on any known vulnerability
        run: |
          python -c "
          import json, sys
          with open('audit.json') as fh:
              data = json.load(fh)
          vulnerable = [d for d in data.get('dependencies', [])
                        if d.get('vulns')]
          if vulnerable:
              print(f'Found {len(vulnerable)} vulnerable packages')
              for dep in vulnerable:
                  print(f'  {dep[\"name\"]}=={dep[\"version\"]}')
                  for vuln in dep['vulns']:
                      print(f'    {vuln[\"id\"]}: {vuln.get(\"description\", \"\")}')
              sys.exit(1)
          print('No vulnerabilities found')
          "
Layer 3: GitHub Dependabot
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
    reviewers:
      - "security-team"
    labels:
      - "dependencies"
      - "security"
    # Group minor/patch updates to reduce PR noise
    groups:
      minor-and-patch:
        update-types:
          - "minor"
          - "patch"
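Dependabot opens PRs automatically: version updates on the schedule above, and security updates when a dependency matches a GitHub advisory (security updates must also be enabled in the repository settings). Combined with CI tests, this turns most security fixes into a review-and-merge step.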
Layer 4: Software Bill of Materials (SBOM)
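# Generate SBOM in CycloneDX format (cyclonedx-bom 4.x CLI)
pip install cyclonedx-bom
cyclonedx-py environment --output-format JSON --outfile sbom.json
# For SPDX, spdx-tools parses, validates, and converts SPDX documents;
# it does not scan an environment to build one
pip install spdx-tools
# To generate an SPDX SBOM, use a generator such as syft
# syft dir:. -o spdx-json > sbom.spdx.json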
An SBOM is a complete inventory of every component in your software. US Executive Order 14028 (2021) requires SBOMs for software sold to the federal government. Even without regulatory requirements, SBOMs enable rapid impact assessment when new vulnerabilities are disclosed.
# Parse a CycloneDX SBOM and check its components against an advisory
import json

def check_sbom_against_advisory(sbom_path: str,
                                advisory_package: str,
                                advisory_versions: set[str]) -> bool:
    """Check if SBOM contains an affected package version."""
    with open(sbom_path, encoding="utf-8") as fh:
        sbom = json.load(fh)
    # Package names are case-insensitive, so compare lowercased
    target = advisory_package.lower()
    for component in sbom.get("components", []):
        name = component.get("name", "")
        version = component.get("version", "")
        if name.lower() == target and version in advisory_versions:
            print(f"AFFECTED: {name}=={version}")
            return True
    return False
Reachability Analysis
Not every vulnerability in a dependency actually affects your application. If the vulnerable function is in a code path you never call, the practical risk is much lower.
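# Conceptual import-level check using AST analysis
# (packages_distributions requires Python 3.10+)
import ast
import importlib.metadata
from pathlib import Path

def find_imports_in_project(src_dir: Path) -> set[str]:
    """Extract all top-level imported module names from project source."""
    imports = set()
    for py_file in src_dir.rglob("*.py"):
        tree = ast.parse(py_file.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    imports.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # Skip relative imports (level > 0); they are project-local
                if node.module and node.level == 0:
                    imports.add(node.module.split(".")[0])
    return imports

def find_unused_dependencies(src_dir: Path) -> set[str]:
    """Find installed distributions the project never imports directly.

    Transitive dependencies also show up here, since the project does
    not import them itself.
    """
    used = find_imports_in_project(src_dir)
    # Import names differ from distribution names (PIL -> Pillow,
    # yaml -> PyYAML), so map imports back to distributions first.
    import_to_dists = importlib.metadata.packages_distributions()
    used_dists = {
        dist.lower()
        for name in used
        for dist in import_to_dists.get(name, [])
    }
    installed = {
        dist.metadata["Name"].lower()
        for dist in importlib.metadata.distributions()
    }
    return installed - used_dists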
Snyk’s commercial offering provides call-graph analysis to determine reachability. For open-source projects, manual analysis of the vulnerability description against your usage patterns is the practical approach.
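A step closer to real reachability than the import scan is to look for call sites of the specific function an advisory names. The following is a minimal stdlib-only sketch, not a call-graph analysis: `find_call_sites` is a hypothetical helper, `yaml.load` is just an illustrative target, and only direct `module.function(...)` calls and `from module import function` usage are matched. Dotted module names, re-exports, and dynamic dispatch are missed.

```python
import ast


def find_call_sites(source: str, module: str, function: str) -> list[int]:
    """Return line numbers where `module.function(...)` appears to be called.

    Handles `import module [as alias]` and
    `from module import function [as alias]` only.
    """
    tree = ast.parse(source)
    module_aliases: set[str] = set()    # local names bound to the module
    function_aliases: set[str] = set()  # local names bound to the function

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                if alias.name == function:
                    function_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == function
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases):
            hits.append(node.lineno)
        elif isinstance(func, ast.Name) and func.id in function_aliases:
            hits.append(node.lineno)
    return sorted(hits)


# The sample is only parsed, never executed, so yaml need not be installed
sample = """\
import yaml
from yaml import load as yload

config = yaml.safe_load(open("a.yml"))
data = yaml.load(open("b.yml"))
other = yload(open("c.yml"))
"""
print(find_call_sites(sample, "yaml", "load"))  # prints [5, 6]
```

An empty result is evidence, not proof, that the vulnerable function is unused; a non-empty one tells you exactly which lines to review against the advisory.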
Hash Pinning for Supply Chain Defense
# Generate requirements with hashes (pip-compile is part of pip-tools)
pip-compile --generate-hashes --output-file requirements.txt requirements.in
# Example output (digests are illustrative):
# requests==2.31.0 \
# --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003eb \
# --hash=sha256:942c5a758f98d790eaed1a29cb6eefc7f0edf3fcb0fce8aea3fbd5951d3ccc2e
# Install with hash verification
pip install --require-hashes -r requirements.txt
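Hash pinning ensures that if a release file is replaced or altered after you pinned it, installation fails because the hash doesn’t match. This defends against:
- Account takeover: An attacker with maintainer credentials uploads a tampered file for a version you already depend on
- Infrastructure compromise: PyPI, a mirror, or a caching proxy is breached and serves modified files
- Unreviewed additions: With --require-hashes, pip refuses any package, including transitive ones, that is not listed with a hash
Hash pinning does not stop typosquatting: if a misspelled package name is pinned, its hash is pinned along with it, so review names when they enter requirements.in.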
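The comparison behind `--require-hashes` can be sketched in a few lines. This is an illustration of the idea under simple assumptions, not pip's actual implementation; `sha256_of`, `matches_pinned_hash`, and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, pinned: set[str]) -> bool:
    """True if the file's digest is one of the pinned digests."""
    return sha256_of(path) in {p.lower() for p in pinned}


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example-1.0.tar.gz"  # hypothetical artifact
    artifact.write_bytes(b"original release contents")
    pinned = {sha256_of(artifact)}               # recorded at pin time
    ok_before = matches_pinned_hash(artifact, pinned)
    artifact.write_bytes(b"tampered release contents")
    ok_after = matches_pinned_hash(artifact, pinned)

print(ok_before, ok_after)  # prints True False
```

Any change to the file's bytes changes the digest, so the tampered file is rejected even though its name and version are unchanged.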
Handling Vulnerability Alerts
Triage Decision Tree
class VulnerabilityTriager:
    """Structured triage for dependency vulnerabilities."""

    def triage(self, vuln: dict) -> str:
        severity = str(vuln.get("severity", "unknown")).lower()
        has_fix = vuln.get("fixed_version") is not None
        # A finding is urgent only if our code reaches the vulnerable
        # path AND the flaw is exploitable in our deployment.
        is_reachable = self._check_reachability(vuln)
        is_exploitable = self._check_exploitability(vuln)
        if severity in ("critical", "high"):
            if not (is_reachable and is_exploitable):
                return "SCHEDULE_UPGRADE_THIS_SPRINT"
            if has_fix:
                return "UPGRADE_IMMEDIATELY"
            return "APPLY_WORKAROUND_OR_REPLACE"
        if severity == "medium":
            if has_fix:
                return "UPGRADE_NEXT_RELEASE"
            return "MONITOR_AND_DOCUMENT"
        return "LOG_AND_MONITOR"

    def _check_reachability(self, vuln: dict) -> bool:
        """Does our code call the vulnerable function?"""
        # Implementation: AST analysis or manual review
        return True  # Conservative default

    def _check_exploitability(self, vuln: dict) -> bool:
        """Is the vulnerability exploitable in our deployment context?"""
        # Example: SSRF vulnerability doesn't apply if we don't fetch URLs
        return True  # Conservative default
Vulnerability Exception File
# .vulnerability-exceptions.toml
# Document accepted risks with justification and expiration
[[exceptions]]
id = "GHSA-xxxx-yyyy-zzzz"
package = "pillow"
reason = "Vulnerability in TIFF parsing; we only process PNG/JPEG"
accepted_by = "security-team"
accepted_date = "2024-01-15"
expires = "2024-04-15" # Re-evaluate quarterly
Private Package Index Security
For organizations using private PyPI servers (Artifactory, Nexus, devpi):
# pip.conf for a private index
# [global]
# index-url = https://private.pypi.example.com/simple/
# Avoid trusted-host: it turns off TLS certificate verification for
# that host. Give the index a valid certificate instead.
# Avoid extra-index-url = https://pypi.org/simple/
# DANGER: extra-index-url creates dependency confusion risk
# An attacker can upload a package with the same name as your
# private package to public PyPI with a higher version number.
# pip gives no index priority and picks the highest matching
# version, so the public package wins.
# MITIGATION: Use --index-url only (no extra-index-url)
# and pin with hashes (--require-hashes) so a substituted file fails
Dependency confusion attacks exploit the interaction between private and public package indices. The defense: use a single index that proxies public packages (Artifactory virtual repos) rather than configuring multiple indices.
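A small lint over pip configuration can catch the risky settings before they reach developer machines or CI images. The sketch below is one possible approach, not a pip feature: `lint_pip_conf` and the warning wording are hypothetical, and it only inspects config text, not environment variables or command-line flags.

```python
import configparser

# Options that weaken index security, with the reason for each
RISKY_KEYS = {
    "extra-index-url": "multiple indexes enable dependency confusion",
    "trusted-host": "disables TLS certificate verification for the host",
}


def lint_pip_conf(text: str) -> list[str]:
    """Return a warning for each risky option found in pip.conf text."""
    # RawConfigParser avoids '%' interpolation errors on encoded URLs
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    warnings = []
    for section in parser.sections():
        for key, reason in RISKY_KEYS.items():
            if parser.has_option(section, key):
                warnings.append(f"[{section}] {key}: {reason}")
        index_url = parser.get(section, "index-url", fallback="")
        if index_url.startswith("http://"):
            warnings.append(f"[{section}] index-url: not using HTTPS")
    return warnings


sample = """\
[global]
index-url = https://private.pypi.example.com/simple/
extra-index-url = https://pypi.org/simple/
"""
for warning in lint_pip_conf(sample):
    print(warning)
```

Running it on the sample reports the `extra-index-url` line; a config with a single HTTPS `index-url` produces no warnings.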
Monitoring and Alerting
import json
import subprocess
from datetime import datetime, timezone

def run_audit_and_report() -> dict:
    """Run pip-audit and generate a structured report."""
    # pip-audit exits non-zero when it finds vulnerabilities, so do not
    # use check=True; the JSON report is still written to stdout.
    result = subprocess.run(
        ["pip-audit", "--format", "json", "--desc"],
        capture_output=True, text=True
    )
    try:
        audit_data = json.loads(result.stdout)
    except json.JSONDecodeError:
        return {"error": "Failed to parse audit output",
                "stderr": result.stderr}
    vulnerabilities = []
    for dep in audit_data.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            vulnerabilities.append({
                "package": dep["name"],
                "installed_version": dep["version"],
                "vulnerability_id": vuln["id"],
                "fix_versions": vuln.get("fix_versions", []),
                "description": vuln.get("description", ""),
            })
    return {
        "scan_time": datetime.now(timezone.utc).isoformat(),
        "total_packages": len(audit_data.get("dependencies", [])),
        # Count distinct packages, not individual advisories
        "vulnerable_packages": len({v["package"] for v in vulnerabilities}),
        "vulnerabilities": vulnerabilities,
    }
Run this on a schedule (cron, GitHub Actions schedule) and send results to your alerting system (Slack, PagerDuty, email) when new vulnerabilities appear.
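Alerting only on new findings means comparing the current report with the previous one. A minimal sketch, assuming both reports have the shape returned by `run_audit_and_report` above; `new_vulnerabilities` is a hypothetical helper, and the package names and advisory IDs are placeholders.

```python
def new_vulnerabilities(previous: dict, current: dict) -> list[dict]:
    """Return findings present in `current` but not in `previous`."""
    def key(finding: dict) -> tuple[str, str]:
        # Keyed on package and advisory, so a version bump that is
        # still affected does not trigger a second alert
        return (finding["package"], finding["vulnerability_id"])

    already_seen = {key(f) for f in previous.get("vulnerabilities", [])}
    return [f for f in current.get("vulnerabilities", [])
            if key(f) not in already_seen]


# Placeholder package names and advisory IDs, for illustration only
yesterday = {"vulnerabilities": [
    {"package": "example-a", "vulnerability_id": "EXAMPLE-0001"},
]}
today = {"vulnerabilities": [
    {"package": "example-a", "vulnerability_id": "EXAMPLE-0001"},
    {"package": "example-b", "vulnerability_id": "EXAMPLE-0002"},
]}

for finding in new_vulnerabilities(yesterday, today):
    print(f"NEW: {finding['package']} {finding['vulnerability_id']}")
```

Persisting the previous report (as a CI artifact or a file in object storage) is left out here; where it lives depends on your scheduler.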
The one thing to remember: dependency vulnerability scanning is the immune system of your software supply chain. It must run continuously, automatically, and with clear response procedures, because the threats evolve every day whether you’re watching or not.