Python FTP & SFTP Transfers — Deep Dive

Build production-grade file transfer pipelines in Python with ftplib, paramiko, connection pooling, atomic operations, and enterprise integration patterns.

System-level framing

File transfer protocols remain critical integration infrastructure in finance, healthcare, logistics, and government. Despite REST APIs and message queues taking over many integration patterns, a surprising number of enterprise data flows still depend on dropping files onto SFTP servers on a schedule. Python’s ecosystem handles both legacy FTP and modern SFTP with libraries mature enough for production use.

This deep dive covers ftplib (stdlib), paramiko (SFTP/SSH), and production patterns that make transfers reliable at scale.

FTP with ftplib — beyond basics

FTPS (FTP over TLS)

from ftplib import FTP_TLS

ftps = FTP_TLS("ftp.example.com")
ftps.login("user", "password")
ftps.prot_p()  # Switch to protected (encrypted) data channel
ftps.cwd("/secure-reports")

# List directory
files = ftps.nlst()
print(files)

ftps.quit()

The prot_p() call is critical — without it, only the control channel is encrypted while file data travels in plaintext. This is a common configuration mistake.
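A related detail worth checking: as far as I know, FTP_TLS does not validate the server certificate unless it is handed an SSL context that does. The sketch below assumes the server presents a certificate signed by a CA in the system trust store. build_verified_context and connect_ftps are hypothetical helper names, and the TLS 1.2 floor and 30-second timeout are illustrative choices.

```python
import ssl
from ftplib import FTP_TLS


def build_verified_context() -> ssl.SSLContext:
    """Return a client context that checks the certificate chain and hostname."""
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2 (an assumption; relax for legacy servers)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def connect_ftps(host: str, user: str, password: str) -> FTP_TLS:
    """Open an FTPS session with both channels encrypted (hypothetical helper)."""
    ftps = FTP_TLS(host, context=build_verified_context(), timeout=30)
    ftps.login(user, password)  # login() secures the control channel first
    ftps.prot_p()               # then protect the data channel as well
    return ftps
```

If the server uses a private CA, the context would need that CA loaded explicitly (for example with load_verify_locations) rather than verification being switched off.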

Passive vs active mode

FTP uses two connections: a control channel and a data channel. In passive mode (the ftplib default), the server opens a data port and the client connects to it. In active mode, the client opens a port and the server connects back.

from ftplib import FTP

ftp = FTP("ftp.example.com")
ftp.login("user", "password")
ftp.set_pasv(False)  # Switch to active mode

Active mode typically fails when the client sits behind NAT or a firewall, because the server cannot connect back to it. Passive mode can fail too, when a firewall blocks the server's data port range. When both fail, SFTP is usually the answer, since it runs everything over a single port.
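When it is unclear which mode a given network path allows, one pragmatic approach is to try passive first and fall back to active. This is a rough sketch: list_with_mode_fallback is a hypothetical helper, and it assumes the control connection is still usable after the first failure, which may not hold after a timeout.

```python
import ftplib


def list_with_mode_fallback(ftp) -> list:
    """List the current directory, trying passive mode first, then active."""
    last_error = None
    for passive in (True, False):
        ftp.set_pasv(passive)
        try:
            return ftp.nlst()
        except ftplib.all_errors as exc:
            # Covers FTP protocol errors plus socket-level OSError and EOFError
            last_error = exc
    raise last_error
```

Logging which mode succeeded is useful in practice, because the answer tends to be stable for a given client and server pair and can then be hard-coded.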

Transfer with progress tracking

import os
from ftplib import FTP

def download_with_progress(ftp: FTP, remote_path: str, local_path: str):
    ftp.voidcmd("TYPE I")  # Binary mode
    file_size = ftp.size(remote_path)
    downloaded = 0

    with open(local_path, "wb") as f:
        def callback(data: bytes):
            nonlocal downloaded
            f.write(data)
            downloaded += len(data)
            pct = (downloaded / file_size * 100) if file_size else 0
            print(f"\r{downloaded}/{file_size} bytes ({pct:.1f}%)", end="")

        ftp.retrbinary(f"RETR {remote_path}", callback, blocksize=8192)
    print()  # Newline after progress

SFTP with paramiko — production patterns

Key-based authentication

import paramiko

private_key = paramiko.RSAKey.from_private_key_file(
    "/home/user/.ssh/id_rsa",
    password="optional-passphrase"
)

transport = paramiko.Transport(("sftp.example.com", 22))
# Without hostkey=..., Transport.connect() skips host key verification
transport.connect(username="deploy", pkey=private_key)
sftp = paramiko.SFTPClient.from_transport(transport)

For Ed25519 keys, use paramiko.Ed25519Key. For ECDSA, use paramiko.ECDSAKey. Each class loads only its own key type, so RSAKey will reject an Ed25519 file. Key-based auth eliminates password management and works with a running SSH agent.
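When the key type is not known ahead of time, one option is to try each key class in turn. This is a minimal sketch: load_private_key is a hypothetical helper, and the key classes are passed in as arguments so the function itself needs no paramiko import.

```python
def load_private_key(path, key_classes, passphrase=None):
    """Return the first key that one of key_classes manages to parse."""
    failures = []
    for key_class in key_classes:
        try:
            return key_class.from_private_key_file(path, password=passphrase)
        except OSError:
            raise  # Missing or unreadable file: no other class will do better
        except Exception as exc:
            # paramiko signals a wrong key type with SSHException (assumed here)
            failures.append(f"{key_class.__name__}: {exc}")
    raise ValueError(f"Could not load {path}: " + "; ".join(failures))


# Intended use with paramiko (not executed here):
# key = load_private_key(
#     "/home/user/.ssh/id_ed25519",
#     [paramiko.Ed25519Key, paramiko.ECDSAKey, paramiko.RSAKey],
# )
```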

Host key verification

Production code must verify host keys to prevent man-in-the-middle attacks:

import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()  # Load from ~/.ssh/known_hosts
client.set_missing_host_key_policy(paramiko.RejectPolicy())  # Reject unknown hosts

client.connect("sftp.example.com", username="deploy", key_filename="/home/user/.ssh/id_rsa")
sftp = client.open_sftp()

Never use AutoAddPolicy in production — it silently accepts any host key, which defeats the purpose of SSH security.
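Where a partner publishes the server's fingerprint rather than a known_hosts entry, the fingerprint can be computed locally and compared before trusting the key. The sketch below reproduces the OpenSSH-style SHA256 fingerprint format from raw key bytes. Both function names are hypothetical, and the known_hosts parser assumes a plain, unhashed line with no marker prefix.

```python
import base64
import hashlib


def sha256_fingerprint(key_blob: bytes) -> str:
    """Format raw public key bytes the way OpenSSH prints SHA256 fingerprints."""
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def fingerprint_from_known_hosts_line(line: str) -> str:
    """Fingerprint the key in an unhashed 'host keytype base64key' line."""
    _host, _key_type, encoded_key = line.split()[:3]
    return sha256_fingerprint(base64.b64decode(encoded_key))
```

With paramiko, the raw bytes would presumably come from the server key object's asbytes() method. Treat that wiring as an assumption to verify against the paramiko version in use.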

Atomic uploads with rename

Prevent downstream systems from reading partially uploaded files:

import paramiko
import uuid

def atomic_upload(sftp: paramiko.SFTPClient, local_path: str, remote_path: str):
    temp_name = f"{remote_path}.{uuid.uuid4().hex}.tmp"
    try:
        sftp.put(local_path, temp_name)
        sftp.rename(temp_name, remote_path)
    except Exception:
        # Best-effort cleanup of the temp file; never mask the original error
        try:
            sftp.remove(temp_name)
        except OSError:
            pass
        raise
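One caveat with the rename step: on many servers a plain SFTP rename fails when the target already exists. A possible workaround is to try the OpenSSH posix-rename extension first and fall back to remove-then-rename. The fallback is not atomic, because there is a short window in which the final path does not exist. replace_remote_file is a hypothetical helper, and sftp is assumed to behave like a paramiko SFTPClient.

```python
def replace_remote_file(sftp, temp_path: str, final_path: str) -> None:
    """Move temp_path over final_path, overwriting an existing file if needed."""
    try:
        # Atomic overwrite where the server supports the OpenSSH extension
        sftp.posix_rename(temp_path, final_path)
        return
    except OSError:
        pass  # Extension missing or refused; fall through to the plain rename

    try:
        sftp.remove(final_path)  # Plain rename usually fails if the target exists
    except OSError:
        pass  # Nothing to remove
    sftp.rename(temp_path, final_path)
```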

Recursive directory transfer

import os
import stat
import paramiko

def upload_directory(sftp: paramiko.SFTPClient, local_dir: str, remote_dir: str):
    try:
        sftp.mkdir(remote_dir)
    except IOError:
        pass  # Directory may already exist

    for item in os.listdir(local_dir):
        local_path = os.path.join(local_dir, item)
        remote_path = f"{remote_dir}/{item}"

        if os.path.isdir(local_path):
            upload_directory(sftp, local_path, remote_path)
        else:
            sftp.put(local_path, remote_path)

def download_directory(sftp: paramiko.SFTPClient, remote_dir: str, local_dir: str):
    os.makedirs(local_dir, exist_ok=True)

    for attr in sftp.listdir_attr(remote_dir):
        remote_path = f"{remote_dir}/{attr.filename}"
        local_path = os.path.join(local_dir, attr.filename)

        if stat.S_ISDIR(attr.st_mode):
            download_directory(sftp, remote_path, local_path)
        else:
            sftp.get(remote_path, local_path)
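upload_directory above creates one directory level at a time, so it fails if the parent of remote_dir is missing. A small helper in the spirit of mkdir -p can cover that case. This is a sketch: ensure_remote_dir is a hypothetical name, and it assumes sftp.stat raises an OSError subclass for a missing path, as paramiko appears to do.

```python
import posixpath
import stat


def ensure_remote_dir(sftp, remote_dir: str) -> None:
    """Create remote_dir and any missing parents, like `mkdir -p`."""
    remote_dir = remote_dir.rstrip("/")
    if remote_dir in ("", "."):
        return  # Reached the root or the current directory
    try:
        attrs = sftp.stat(remote_dir)
    except OSError:
        # Missing: make sure the parent exists first, then create this level
        ensure_remote_dir(sftp, posixpath.dirname(remote_dir))
        sftp.mkdir(remote_dir)
        return
    if not stat.S_ISDIR(attrs.st_mode):
        raise NotADirectoryError(remote_dir)
```

posixpath is used rather than os.path so that remote paths keep forward slashes even when the client runs on Windows.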

Connection pooling and reuse

For applications that transfer many files, creating a new SSH connection per file is expensive. Use a connection pool:

import paramiko
from queue import Queue
from contextlib import contextmanager

class SFTPPool:
    def __init__(self, host: str, username: str, key_path: str, pool_size: int = 5):
        self._pool: Queue[paramiko.SFTPClient] = Queue(maxsize=pool_size)
        self._host = host
        self._username = username
        self._key_path = key_path

        for _ in range(pool_size):
            self._pool.put(self._create_client())

    def _create_client(self) -> paramiko.SFTPClient:
        key = paramiko.RSAKey.from_private_key_file(self._key_path)
        transport = paramiko.Transport((self._host, 22))
        transport.connect(username=self._username, pkey=key)
        return paramiko.SFTPClient.from_transport(transport)

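    @contextmanager
    def get_client(self):
        client = self._pool.get()
        try:
            yield client
        except Exception:
            # The connection may be broken. SFTPClient.close() only closes the
            # channel, so close the underlying transport too before replacing it.
            try:
                transport = client.get_channel().get_transport()
                client.close()
                transport.close()
            except Exception:
                pass
            client = self._create_client()
            raise
        finally:
            # Always hand a client back so the pool keeps its size
            self._pool.put(client)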
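A variation on the same idea is to check connections when they are borrowed rather than only after a failure, which also catches sessions the server dropped while they sat idle. The sketch below is deliberately generic: CheckedPool, factory, and is_alive are hypothetical names, and the caller supplies both the connect function and the liveness test.

```python
import queue
from contextlib import contextmanager


class CheckedPool:
    """Fixed-size pool that replaces dead connections at checkout time."""

    def __init__(self, factory, is_alive, size: int = 5):
        self._factory = factory      # callable returning a new connection
        self._is_alive = is_alive    # callable(connection) -> bool
        self._slots = queue.Queue(maxsize=size)
        for _ in range(size):
            self._slots.put(None)    # connect lazily, on first checkout

    @contextmanager
    def connection(self, timeout: float = 30.0):
        conn = self._slots.get(timeout=timeout)  # raises queue.Empty when exhausted
        try:
            if conn is None or not self._is_alive(conn):
                conn = None  # forget the dead one before reconnecting
                conn = self._factory()
            yield conn
        finally:
            # Always return the slot; a None slot is rebuilt on the next checkout
            self._slots.put(conn)
```

For paramiko, a plausible liveness test is whether the client's transport reports is_active(), though that only reflects what the local side knows. The sketch also leaves closing the dead connection to the caller.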

Retry and integrity patterns

import hashlib
import os
import time
import paramiko

def transfer_with_retry(
    sftp: paramiko.SFTPClient,
    local_path: str,
    remote_path: str,
    max_retries: int = 3,
    verify_checksum: bool = True,
):
    for attempt in range(max_retries):
        try:
            sftp.put(local_path, remote_path)

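            if verify_checksum:
                # Cheap check first: the sizes must match
                remote_size = sftp.stat(remote_path).st_size
                local_size = os.path.getsize(local_path)
                if remote_size != local_size:
                    raise ValueError(
                        f"Size mismatch: local={local_size}, remote={remote_size}"
                    )
                # Full check: stream the remote file back and compare SHA-256 digests.
                # This doubles the traffic; use a server-side checksum where one exists.
                remote_hash = hashlib.sha256()
                with sftp.open(remote_path, "rb") as remote_file:
                    for chunk in iter(lambda: remote_file.read(32768), b""):
                        remote_hash.update(chunk)
                if remote_hash.hexdigest() != _file_hash(local_path):
                    raise ValueError(f"Checksum mismatch for {remote_path}")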
            return True

        except (paramiko.SSHException, OSError, ValueError) as e:
            if attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise

def _file_hash(path: str, algo: str = "sha256") -> str:
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
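The fixed 2 ** attempt wait works for a single client, but many workers retrying in lockstep can hit a recovering server at the same moments. A common refinement is capped exponential backoff with random jitter. This is a sketch only: backoff_delays is a hypothetical helper, and the base and cap values are illustrative defaults.

```python
import random


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0, rng=None):
    """Yield one sleep duration per retry: capped exponential with full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick anywhere between zero and the current ceiling
        yield rng.uniform(0, ceiling)
```

In the retry loop, each failed attempt would sleep for the next value from this generator instead of the fixed wait. Passing a seeded random.Random makes the schedule reproducible in tests.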

Monitoring and observability

Production transfer pipelines need logging:

import logging
import os
import time

logger = logging.getLogger("file_transfer")

def monitored_transfer(sftp, local_path, remote_path):
    start = time.monotonic()
    file_size = os.path.getsize(local_path)

    logger.info(
        "Transfer started",
        extra={"local": local_path, "remote": remote_path, "size_bytes": file_size},
    )

    try:
        sftp.put(local_path, remote_path)
        elapsed = time.monotonic() - start
        # Guard against a zero elapsed time on tiny files
        throughput = file_size / elapsed / 1024 / 1024 if elapsed > 0 else 0.0  # MiB/s

        logger.info(
            "Transfer complete",
            extra={
                "elapsed_seconds": round(elapsed, 2),
                "throughput_mib_per_s": round(throughput, 2),
            },
        )
    except Exception as e:
        logger.error("Transfer failed", extra={"error": str(e)})
        raise
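One thing to watch with extra=: the default logging formatter does not print those fields, so they vanish unless a formatter emits them. A minimal JSON formatter along these lines would surface them. JsonFormatter is a hypothetical class name, and the output field names are illustrative.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from extra=
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including extra= fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)
```

Attaching it is a matter of creating a handler, calling setFormatter(JsonFormatter()) on it, and adding the handler to the "file_transfer" logger.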

Tradeoffs

Approach	Pros	Cons
`ftplib` (stdlib)	Zero dependencies, works with legacy servers	No encryption by default, complex passive/active modes
`ftplib.FTP_TLS`	Encrypted, still stdlib	Certificate management, port issues with firewalls
`paramiko` SFTP	Strong encryption, key auth, single port (22)	External dependency, SSH handshake overhead
`fabric` / `invoke`	High-level API for remote operations	Overkill for pure file transfer
`rsync` via subprocess	Delta transfer, compression, resume	Not a Python library, subprocess management

Security checklist

Never store credentials in code — use environment variables, vault, or SSH keys.
Always verify host keys in production SFTP connections.
Use FTPS prot_p() if you must use FTP — never plaintext for sensitive data.
Rotate SSH keys on a schedule and revoke compromised keys immediately.
Log all transfers with timestamps, file sizes, and source/destination for audit trails.
Set strict file permissions on downloaded files (chmod 600 for sensitive data).
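Two of the checklist items translate directly into small helpers. The sketch below reads connection settings from the environment and creates download targets with owner-only permissions. The variable names SFTP_HOST, SFTP_USER, and SFTP_KEY_PATH and both function names are assumptions for illustration, and the permission handling assumes a POSIX system.

```python
import os


def credentials_from_env(environ=os.environ) -> dict:
    """Read connection settings from the environment (variable names are assumptions)."""
    names = ("SFTP_HOST", "SFTP_USER", "SFTP_KEY_PATH")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": environ["SFTP_HOST"],
        "username": environ["SFTP_USER"],
        "key_path": environ["SFTP_KEY_PATH"],
    }


def open_private_file(path: str):
    """Create (or truncate) path with mode 0600 before any data is written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return os.fdopen(fd, "wb")
```

Setting the mode at creation time avoids the window in which a freshly downloaded file is briefly readable by others. With paramiko, the returned handle could be passed to a file-object download call such as getfo.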

One thing to remember: The reliability of a file transfer pipeline depends more on what you do around the transfer — retries, atomic writes, integrity checks, logging — than on the protocol itself. Python gives you the tools for both the transfer and the surrounding infrastructure.
