Python FTP & SFTP Transfers — Deep Dive
System-level framing
File transfer protocols remain critical integration infrastructure in finance, healthcare, logistics, and government. Despite REST APIs and message queues taking over many integration patterns, a surprising number of enterprise data flows still depend on dropping files onto SFTP servers on a schedule. Python’s ecosystem handles both legacy FTP and modern SFTP with libraries mature enough for production use.
This deep dive covers ftplib (stdlib), paramiko (SFTP/SSH), and production patterns that make transfers reliable at scale.
FTP with ftplib — beyond basics
FTPS (FTP over TLS)
```python
from ftplib import FTP_TLS

ftps = FTP_TLS("ftp.example.com")
ftps.login("user", "password")  # Sends AUTH TLS first, so credentials are encrypted
ftps.prot_p()  # Switch to protected (encrypted) data channel
ftps.cwd("/secure-reports")

# List directory
files = ftps.nlst()
print(files)

ftps.quit()
```
The prot_p() call is critical — without it, only the control channel is encrypted while file data travels in plaintext. This is a common configuration mistake.
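Encryption also needs a certificate check. In recent CPython releases, when FTP_TLS gets no context argument, ftplib builds one that does not verify the server certificate, so prot_p() alone does not stop an impostor server. A minimal sketch, assuming the server presents a certificate signed by a CA in the system store (or in a bundle passed through the hypothetical cafile argument). make_verified_context and connect_ftps are illustrative names, and the TLS 1.2 floor is an assumption rather than part of the setup above:

```python
import ssl
from ftplib import FTP_TLS
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that requires a valid certificate matching the hostname."""
    # create_default_context() sets verify_mode=CERT_REQUIRED and check_hostname=True
    context = ssl.create_default_context(cafile=cafile)
    # Assumption: the server supports TLS 1.2 or newer
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def connect_ftps(host: str, user: str, password: str, cafile: Optional[str] = None) -> FTP_TLS:
    """Open an FTPS session with both the control and data channels encrypted."""
    ftps = FTP_TLS(host, context=make_verified_context(cafile), timeout=30)
    ftps.login(user, password)  # sends AUTH TLS (checking the certificate) before USER/PASS
    ftps.prot_p()  # without this, file data would still travel in plaintext
    return ftps
```

connect_ftps("ftp.example.com", "user", "password") would return a session ready for cwd() and nlst(); call quit() when finished.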
Passive vs active mode
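FTP uses two connections: a control channel (port 21 by default) and a separate data connection for each transfer or listing. In passive mode (the ftplib default), the server opens a data port, typically from a configured range, and the client connects to it. In active mode, the client listens on a port and the server connects back to it.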
```python
from ftplib import FTP

ftp = FTP("ftp.example.com")
ftp.login("user", "password")
ftp.set_pasv(False)  # Switch to active mode
```
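Firewalls often break passive mode because they block connections to the server's dynamic data ports (egress rules on a corporate network, or a server-side firewall that does not open the passive range). Active mode breaks when the client is behind NAT or a firewall that blocks inbound connections, because the server cannot reach the client's port. When both fail, SFTP is usually the answer: it carries everything over a single SSH connection on port 22.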
Transfer with progress tracking
```python
from ftplib import FTP


def download_with_progress(ftp: FTP, remote_path: str, local_path: str) -> None:
    ftp.voidcmd("TYPE I")  # Binary mode
    file_size = ftp.size(remote_path)
    downloaded = 0

    with open(local_path, "wb") as f:
        def callback(data: bytes) -> None:
            nonlocal downloaded
            f.write(data)
            downloaded += len(data)
            pct = (downloaded / file_size * 100) if file_size else 0
            print(f"\r{downloaded}/{file_size} bytes ({pct:.1f}%)", end="")

        ftp.retrbinary(f"RETR {remote_path}", callback, blocksize=8192)
    print()  # Newline after progress
```
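retrbinary() calls the callback once per block received from the data connection (at most blocksize bytes each), which is what drives the progress display. Moving that logic into a small callable class keeps it testable without a server. A sketch; ProgressWriter is a hypothetical name, not part of ftplib:

```python
from typing import BinaryIO, Optional


class ProgressWriter:
    """Callable for FTP.retrbinary(): writes each block and tracks progress."""

    def __init__(self, out: BinaryIO, total: Optional[int]) -> None:
        self.out = out
        self.total = total  # None or 0 when the size is unknown
        self.done = 0

    def __call__(self, block: bytes) -> None:
        # retrbinary() invokes this once per block received
        self.out.write(block)
        self.done += len(block)

    @property
    def percent(self) -> Optional[float]:
        if not self.total:
            return None
        return self.done / self.total * 100


# Usage against a live server (not run here):
#   ftp.voidcmd("TYPE I")
#   with open(local_path, "wb") as f:
#       writer = ProgressWriter(f, ftp.size(remote_path))
#       ftp.retrbinary(f"RETR {remote_path}", writer, blocksize=8192)
#   print(writer.done, "bytes")
```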
SFTP with paramiko — production patterns
Key-based authentication
```python
import paramiko

private_key = paramiko.RSAKey.from_private_key_file(
    "/home/user/.ssh/id_rsa",
    password="optional-passphrase",
)

transport = paramiko.Transport(("sftp.example.com", 22))
# Without hostkey=..., Transport.connect() does not verify the server's identity;
# see "Host key verification" below
transport.connect(username="deploy", pkey=private_key)
sftp = paramiko.SFTPClient.from_transport(transport)
```
For Ed25519 keys, use paramiko.Ed25519Key; for ECDSA keys, use paramiko.ECDSAKey. Key-based auth removes shared passwords from the pipeline (you manage key pairs instead), and paramiko can also use keys held by a running SSH agent.
Host key verification
Production code must verify host keys to prevent man-in-the-middle attacks:
```python
import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()  # Load from ~/.ssh/known_hosts
client.set_missing_host_key_policy(paramiko.RejectPolicy())  # Reject unknown hosts (also the default)
client.connect("sftp.example.com", username="deploy", key_filename="/home/user/.ssh/id_rsa")
sftp = client.open_sftp()
```
Never use AutoAddPolicy in production: it silently trusts the key of any host not already in known_hosts, so a first connection to an impostor server succeeds without warning.
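When a host is not yet in known_hosts, the safe alternative is to compare its key against a fingerprint obtained out of band (for example, from the server's administrator running ssh-keygen -lf). OpenSSH's SHA256 fingerprint is the SHA-256 digest of the raw public key blob, base64-encoded without padding. A stdlib sketch; sha256_fingerprint and fingerprint_matches are hypothetical helpers, and the input is assumed to be the base64 key field from a known_hosts or .pub line (which paramiko's PKey.get_base64() also returns):

```python
import base64
import hashlib


def sha256_fingerprint(key_b64: str) -> str:
    """OpenSSH-style 'SHA256:...' fingerprint for a base64 public key blob."""
    blob = base64.b64decode(key_b64)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the '=' padding removed
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def fingerprint_matches(key_b64: str, expected: str) -> bool:
    """Compare against a fingerprint obtained through a trusted channel."""
    return sha256_fingerprint(key_b64) == expected
```

In paramiko, one option is to call this from a MissingHostKeyPolicy subclass's missing_host_key(client, hostname, key) method with key.get_base64(), raising paramiko.SSHException on a mismatch.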
Atomic uploads with rename
Upload under a temporary name, then rename it into place, so downstream systems never read a partially uploaded file (consumers should ignore the .tmp names):
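```python
import uuid

import paramiko


def atomic_upload(sftp: paramiko.SFTPClient, local_path: str, remote_path: str) -> None:
    temp_name = f"{remote_path}.{uuid.uuid4().hex}.tmp"
    try:
        sftp.put(local_path, temp_name)
        # Plain SFTP rename fails on many servers when remote_path already exists;
        # posix_rename() uses the OpenSSH extension that replaces the target atomically
        sftp.posix_rename(temp_name, remote_path)
    except Exception:
        # Best-effort cleanup of the temp file; the original error is re-raised
        try:
            sftp.remove(temp_name)
        except Exception:
            pass
        raise
```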
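The same temp-then-rename idea protects local consumers when downloading. A minimal sketch, assuming a client with a paramiko-style get(remotepath, localpath) method; atomic_download and SupportsGet are illustrative names. The temp file is created in the destination directory because os.replace() is only a rename within a single filesystem (and is atomic on POSIX; Windows gives weaker guarantees):

```python
import os
import tempfile
from typing import Protocol


class SupportsGet(Protocol):
    # Matches the shape of paramiko.SFTPClient.get(remotepath, localpath)
    def get(self, remotepath: str, localpath: str) -> None: ...


def atomic_download(sftp: SupportsGet, remote_path: str, local_path: str) -> None:
    """Download to a temp file beside the target, then rename it into place."""
    target_dir = os.path.dirname(os.path.abspath(local_path))
    # Same directory means same filesystem, which os.replace() needs
    fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    os.close(fd)  # the client reopens the path itself
    try:
        sftp.get(remote_path, temp_path)
        os.replace(temp_path, local_path)  # swaps in the complete file in one step
    except BaseException:
        # Remove the partial file so nothing ever reads it
        try:
            os.remove(temp_path)
        except FileNotFoundError:
            pass
        raise
```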
Recursive directory transfer
```python
import os
import stat

import paramiko


def upload_directory(sftp: paramiko.SFTPClient, local_dir: str, remote_dir: str) -> None:
    try:
        sftp.mkdir(remote_dir)
    except IOError:
        pass  # Directory may already exist

    for item in os.listdir(local_dir):
        local_path = os.path.join(local_dir, item)
        remote_path = f"{remote_dir}/{item}"
        if os.path.isdir(local_path):
            upload_directory(sftp, local_path, remote_path)
        else:
            sftp.put(local_path, remote_path)


def download_directory(sftp: paramiko.SFTPClient, remote_dir: str, local_dir: str) -> None:
    os.makedirs(local_dir, exist_ok=True)
    for attr in sftp.listdir_attr(remote_dir):
        remote_path = f"{remote_dir}/{attr.filename}"
        local_path = os.path.join(local_dir, attr.filename)
        if stat.S_ISDIR(attr.st_mode):
            download_directory(sftp, remote_path, local_path)
        else:
            sftp.get(remote_path, local_path)
```
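Recursion mirrors the tree directly, but a single os.walk() pass can also produce an explicit upload plan first, which is easy to log, dry-run, or resume. A sketch under that assumption; plan_upload is a hypothetical helper, and remote paths are built with posixpath because SFTP paths use "/" regardless of the local OS:

```python
import os
import posixpath
from typing import List, Tuple


def plan_upload(local_dir: str, remote_dir: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (remote dirs to create, parents first; (local, remote) file pairs)."""
    dirs: List[str] = []
    files: List[Tuple[str, str]] = []
    for root, subdirs, names in os.walk(local_dir):
        subdirs.sort()  # deterministic order; os.walk honours in-place edits
        rel = os.path.relpath(root, local_dir)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        # Remote paths always use '/', even when this runs on Windows
        remote_root = posixpath.join(remote_dir, *parts)
        dirs.append(remote_root)
        for name in sorted(names):
            files.append((os.path.join(root, name), posixpath.join(remote_root, name)))
    return dirs, files
```

Executing the plan is then two loops: create each directory in order (ignoring "already exists" errors, as upload_directory does), then call sftp.put() for each pair.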
Connection pooling and reuse
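For applications that transfer many files, opening a new SSH connection per file is expensive: each one repeats the TCP handshake, SSH key exchange, and authentication. A small pool of long-lived connections avoids that cost: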
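```python
from contextlib import contextmanager
from queue import Queue
from typing import Iterator, Optional

import paramiko


class SFTPPool:
    def __init__(self, host: str, username: str, key_path: str, pool_size: int = 5):
        # A None slot means "reconnect on next checkout"
        self._pool: "Queue[Optional[paramiko.SFTPClient]]" = Queue(maxsize=pool_size)
        self._host = host
        self._username = username
        self._key_path = key_path
        for _ in range(pool_size):
            self._pool.put(self._create_client())

    def _create_client(self) -> paramiko.SFTPClient:
        key = paramiko.RSAKey.from_private_key_file(self._key_path)
        transport = paramiko.Transport((self._host, 22))
        transport.connect(username=self._username, pkey=key)  # pass hostkey= in production
        return paramiko.SFTPClient.from_transport(transport)

    @staticmethod
    def _discard(client: paramiko.SFTPClient) -> None:
        # Close the whole SSH transport, not just the SFTP channel, so it doesn't leak
        try:
            client.get_channel().get_transport().close()
        except Exception:
            pass

    @contextmanager
    def get_client(self) -> Iterator[paramiko.SFTPClient]:
        client = self._pool.get()
        if client is None:
            try:
                client = self._create_client()
            except Exception:
                self._pool.put(None)  # return the slot so other workers don't block forever
                raise
        try:
            yield client
        except Exception:
            # Replace broken connection: drop it now, reconnect on next checkout
            self._discard(client)
            client = None
            raise
        finally:
            self._pool.put(client)
```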
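A typical way to drive the pool is to give each worker thread its own checkout, so no SFTPClient is ever shared between threads. A sketch assuming any pool object that exposes a get_client() context manager, such as SFTPPool; upload_all is a hypothetical helper:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def upload_all(pool, pairs: Iterable[Tuple[str, str]], max_workers: int = 5) -> List[str]:
    """Upload (local, remote) pairs concurrently, one pooled client per task.

    Keep max_workers <= pool size; extra threads would only wait in get_client().
    """
    def upload_one(pair: Tuple[str, str]) -> str:
        local_path, remote_path = pair
        with pool.get_client() as sftp:  # blocks until a connection is free
            sftp.put(local_path, remote_path)
        return remote_path

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps input order and re-raises the first worker exception
        return list(executor.map(upload_one, pairs))
```

With the pool above this would look like upload_all(SFTPPool("sftp.example.com", "deploy", "/home/user/.ssh/id_rsa"), pairs).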
Retry and integrity patterns
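```python
import hashlib
import os
import time

import paramiko


def transfer_with_retry(
    sftp: paramiko.SFTPClient,
    local_path: str,
    remote_path: str,
    max_retries: int = 3,
    verify_checksum: bool = True,
) -> bool:
    for attempt in range(max_retries):
        try:
            sftp.put(local_path, remote_path)  # put() also checks size by default (confirm=True)
            remote_size = sftp.stat(remote_path).st_size
            local_size = os.path.getsize(local_path)
            if remote_size != local_size:
                raise ValueError(
                    f"Size mismatch: local={local_size}, remote={remote_size}"
                )
            if verify_checksum:
                # SFTP has no standard checksum command (a few servers support the
                # "check-file" extension), so read the file back and hash it.
                # This doubles the traffic; reserve it for data that needs it.
                if _remote_hash(sftp, remote_path) != _file_hash(local_path):
                    raise ValueError("Checksum mismatch after upload")
            return True
        except (paramiko.SSHException, OSError, ValueError) as e:
            if attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise
    return False  # only reached if max_retries < 1


def _file_hash(path: str, algo: str = "sha256") -> str:
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def _remote_hash(sftp: paramiko.SFTPClient, path: str, algo: str = "sha256") -> str:
    h = hashlib.new(algo)
    with sftp.open(path, "rb") as f:
        f.prefetch()  # pipeline read requests for better throughput
        for chunk in iter(lambda: f.read(32768), b""):
            h.update(chunk)
    return h.hexdigest()
```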
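The fixed 2 ** attempt backoff is fine for one process, but many workers that fail together will also retry together. A generic helper with exponential backoff and "full jitter" (a random delay between zero and the exponential cap) spreads those retries out. A sketch; retry and its parameters are hypothetical, and sleep is injectable so the logic can be tested without waiting:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a listed exception, back off exponentially with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Random delay in [0, min(cap, base * 2**attempt)] avoids lockstep retries
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Applied to the upload, this could be retry(lambda: sftp.put(local_path, remote_path), (paramiko.SSHException, OSError)).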
Monitoring and observability
Production transfer pipelines need logging:
```python
import logging
import os
import time

logger = logging.getLogger("file_transfer")


def monitored_transfer(sftp, local_path: str, remote_path: str) -> None:
    start = time.monotonic()
    file_size = os.path.getsize(local_path)
    context = {"local": local_path, "remote": remote_path, "size_bytes": file_size}
    logger.info("Transfer started", extra=context)
    try:
        sftp.put(local_path, remote_path)
    except Exception as e:
        # logger.exception logs at ERROR level and attaches the traceback
        logger.exception("Transfer failed", extra={**context, "error": str(e)})
        raise
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero on tiny files
    throughput = file_size / elapsed / 1024 / 1024  # MiB/s
    logger.info(
        "Transfer complete",
        extra={
            **context,
            "elapsed_seconds": round(elapsed, 2),
            "throughput_mib_per_s": round(throughput, 2),
        },
    )
```
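Fields passed through extra= become attributes on the LogRecord, but the default formatter ignores them, so sizes and throughput never reach the output unless a handler renders them. A minimal stdlib sketch of a JSON formatter, assuming one JSON object per line is what your log pipeline ingests; JsonFormatter is an illustrative name:

```python
import json
import logging

# Attributes every LogRecord has; anything else arrived via extra={...}
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including fields passed via extra=."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy custom fields such as local, remote, size_bytes
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)
```

Attach it with handler.setFormatter(JsonFormatter()) on a handler added to the "file_transfer" logger.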
Tradeoffs
| Approach | Pros | Cons |
|---|---|---|
| ftplib (stdlib) | Zero dependencies, works with legacy servers | No encryption by default, complex passive/active modes |
| ftplib.FTP_TLS | Encrypted, still stdlib | Certificate management, port issues with firewalls |
| paramiko SFTP | Strong encryption, key auth, single port (22) | External dependency, SSH handshake overhead |
| fabric / invoke | High-level API for remote operations | Overkill for pure file transfer |
| rsync via subprocess | Delta transfer, compression, resume | Not a Python library, subprocess management |
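For the rsync row, "subprocess management" mostly means building an argument list instead of a shell string and checking the exit status. A sketch, assuming rsync is installed on both ends and the remote is reachable over SSH; build_rsync_cmd and run_rsync are hypothetical names, while -a, -z, --partial, and -e are standard rsync options:

```python
import subprocess
from typing import List, Optional


def build_rsync_cmd(local_dir: str, remote: str, ssh_key: Optional[str] = None) -> List[str]:
    """Build an rsync-over-SSH argv list (no shell, so no quoting pitfalls)."""
    # -a archive mode, -z compression, --partial keeps partial files so reruns resume
    cmd = ["rsync", "-az", "--partial"]
    if ssh_key:
        # rsync splits this string itself; assumes the key path has no spaces
        cmd += ["-e", f"ssh -i {ssh_key}"]
    # Trailing slash: copy the directory's contents, not the directory itself
    cmd += [local_dir.rstrip("/") + "/", remote]
    return cmd


def run_rsync(cmd: List[str], timeout: float = 3600) -> None:
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(cmd, check=True, timeout=timeout, capture_output=True, text=True)
```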
Security checklist
- Never store credentials in code; use environment variables, a secrets vault, or SSH keys.
- Always verify host keys in production SFTP connections.
- Use FTPS with prot_p() if you must use FTP; never send sensitive data in plaintext.
- Rotate SSH keys on a schedule and revoke compromised keys immediately.
- Log all transfers with timestamps, file sizes, and source/destination for audit trails.
- Set strict file permissions on downloaded files (chmod 600 for sensitive data).
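For the chmod 600 item, the safest approach is to create the file with restrictive permissions before any data is written, rather than tightening them after the download finishes. A POSIX-oriented sketch, assuming a client with a paramiko-style getfo(remotepath, file_object) method; download_private and SupportsGetfo are illustrative names:

```python
import os
from typing import BinaryIO, Protocol


class SupportsGetfo(Protocol):
    # Matches the shape of paramiko.SFTPClient.getfo(remotepath, fl)
    def getfo(self, remotepath: str, fl: BinaryIO) -> int: ...


def download_private(sftp: SupportsGetfo, remote_path: str, local_path: str) -> None:
    """Download into a file readable and writable only by its owner (0o600)."""
    # Creating with mode 0o600 means the data is never briefly world-readable
    fd = os.open(local_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        # The mode argument is ignored for a pre-existing file, so tighten explicitly
        os.chmod(local_path, 0o600)
        sftp.getfo(remote_path, f)
```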
One thing to remember: The reliability of a file transfer pipeline depends more on what you do around the transfer — retries, atomic writes, integrity checks, logging — than on the protocol itself. Python gives you the tools for both the transfer and the surrounding infrastructure.