Homomorphic Encryption in Python — Deep Dive
Setting up TenSEAL for CKKS encrypted computation
TenSEAL wraps Microsoft SEAL and provides a Pythonic interface for both BFV and CKKS schemes. Install it with pip install tenseal.
```python
import tenseal as ts

# Create a CKKS context
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60]
)
context.generate_galois_keys()
context.global_scale = 2**40

# Encrypt vectors
v1 = ts.ckks_vector(context, [1.5, 2.3, 3.7])
v2 = ts.ckks_vector(context, [4.1, 5.2, 6.0])

# Compute on encrypted data
result_enc = v1 + v2
result_enc = result_enc * ts.ckks_vector(context, [2.0, 2.0, 2.0])

# Decrypt
result = result_enc.decrypt()
print(result)  # [11.2, 15.0, 19.4] (approximately)
```
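The coeff_mod_bit_sizes list defines the modulus chain, which determines the noise budget. Each intermediate prime (the 40s, matched to the 2**40 scale) is consumed by one rescale, so each corresponds to one multiplication level; this chain allows two. The first and last primes serve special purposes. The first holds the final result after all rescaling, so its size minus the scale bits (60 - 40 = 20) bounds the integer part of the decrypted values. The last is the special prime used for key switching during relinearization and rotations.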
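The level accounting is simple enough to write down. Below is a minimal sketch that assumes the SEAL convention where the middle primes are consumed one per rescale and the last prime is the special prime; describe_chain is a hypothetical helper for illustration, not part of TenSEAL.

```python
# Hypothetical helper: summarize what a CKKS coeff_mod_bit_sizes list buys.
# Assumes the SEAL/TenSEAL layout: first prime = last data level,
# middle primes = one rescale (one multiplication) each,
# last prime = special prime used only for key switching.
def describe_chain(coeff_mod_bit_sizes, scale_bits):
    if len(coeff_mod_bit_sizes) < 2:
        raise ValueError("need at least a first prime and a special prime")
    first = coeff_mod_bit_sizes[0]
    middle = coeff_mod_bit_sizes[1:-1]
    return {
        "levels": len(middle),                   # multiplications before the chain runs out
        "integer_bits": first - scale_bits,      # room left for the integer part
        "total_bits": sum(coeff_mod_bit_sizes),  # must respect the security limit for n
        "middle_matches_scale": all(b == scale_bits for b in middle),
    }

print(describe_chain([60, 40, 40, 60], scale_bits=40))
# {'levels': 2, 'integer_bits': 20, 'total_bits': 200, 'middle_matches_scale': True}
```

Keeping the middle primes equal to the scale is what lets each rescale bring the scale back to roughly 2**40.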
BFV for exact integer computation with Pyfhel
Pyfhel provides a simpler interface when you need exact integer arithmetic rather than approximate real-number computation.
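```python
import numpy as np
from Pyfhel import Pyfhel

he = Pyfhel()
he.contextGen(scheme='bfv', n=4096, t_bits=20)
he.keyGen()

# Encrypt integers (Pyfhel 3 encrypts int64 NumPy arrays, one value per slot)
a = he.encryptInt(np.array([42], dtype=np.int64))
b = he.encryptInt(np.array([18], dtype=np.int64))

# Exact arithmetic
c = a + b  # encrypted 60
d = a * b  # encrypted 756

# decryptInt returns the whole slot vector; the result sits in slot 0
print(he.decryptInt(c)[0])  # 60
print(he.decryptInt(d)[0])  # 756
```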
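The t_bits parameter sets the size of the plaintext modulus t. With t_bits=20, t is a 20-bit prime, so every result is computed modulo roughly 1 million (2^20 ≈ 1.05 million), and anything larger wraps around. Larger values of t widen that range but reduce the noise budget available for multiplications.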
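Modeling that arithmetic in the clear shows what "exact modulo t" means in practice. This is a minimal sketch with an illustrative t of 2**20; the prime Pyfhel actually selects for t_bits=20 is slightly smaller, and Pyfhel may decode results as signed values rather than the non-negative representatives used here. bfv_add and bfv_mul are hypothetical helpers.

```python
# Plaintext model of BFV arithmetic: every result is reduced modulo t.
# T is illustrative (2**20); Pyfhel's real t for t_bits=20 is a nearby prime.
T = 2**20

def bfv_add(x, y, t=T):
    return (x + y) % t

def bfv_mul(x, y, t=T):
    return (x * y) % t

print(bfv_add(42, 18))      # 60
print(bfv_mul(42, 18))      # 756
print(bfv_mul(1200, 1000))  # 151424, not 1200000: the product wrapped around t
```

Results stay exact only while every intermediate value fits below t, so t has to be sized for the largest value the computation can produce.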
Parameter selection — the critical tradeoff
Parameter choice determines security level, computation depth, and performance:
| Parameter | Effect of increasing |
|---|---|
| poly_modulus_degree (n) | Higher coefficient modulus limit (more noise budget), more SIMD slots, larger ciphertexts, slower operations |
| coeff_mod_bit_sizes total | More multiplication depth, but must stay at or below the n-dependent limit for the target security level |
| Plaintext modulus (BFV) | Larger value range, less noise budget |
| Scale (CKKS) | Higher precision, consumes more budget per level |
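For 128-bit security (the standard target), n caps the total coefficient modulus: 109 bits at n=4096, 218 at 8192, 438 at 16384, and 881 at 32768. Smaller degrees exist but leave almost no room for multiplications, so 4096 is the practical minimum. Most practical applications use 8192 or 16384.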
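Those caps make part of parameter selection mechanical. This sketch hard-codes the maximum total coefficient-modulus sizes that SEAL allows at 128-bit security (109, 218, 438, and 881 bits for n = 4096, 8192, 16384, and 32768); smallest_degree is a hypothetical helper that returns the first degree whose limit fits a given chain.

```python
# Hypothetical helper. MAX_BITS_128 holds SEAL's 128-bit security maxima
# for the total coefficient modulus at each poly_modulus_degree.
MAX_BITS_128 = {4096: 109, 8192: 218, 16384: 438, 32768: 881}

def smallest_degree(coeff_mod_bit_sizes):
    total = sum(coeff_mod_bit_sizes)
    for n in sorted(MAX_BITS_128):
        if total <= MAX_BITS_128[n]:
            return n
    raise ValueError(f"{total} bits exceeds every supported degree at 128-bit security")

print(smallest_degree([60, 40, 40, 60]))              # 8192 (200 bits)
print(smallest_degree([60, 40, 40, 40, 40, 40, 60]))  # 16384 (320 bits)
```

Adding three more 40-bit levels to the tutorial's chain is enough to force the jump from 8192 to 16384, with the size and speed costs listed in the table.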
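Ciphertexts are large. An uncompressed SEAL ciphertext is two polynomials of n coefficients, stored as one 64-bit word per coefficient for each remaining prime, so its size grows with both n and the modulus chain. With n=8192 and the chain above, a fresh ciphertext is about 384 KB; each rescale drops a prime and shrinks it, and serialization may compress it further. At n=16384 with a longer chain, it's roughly 1 MB or more. This has real implications for network transfer and memory when operating on large datasets.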
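The size estimate follows directly from that layout. This sketch assumes SEAL's in-memory representation (two polynomials, one 64-bit word per coefficient per prime at the current level); ciphertext_bytes is a hypothetical helper and ignores serialization headers and compression.

```python
# Back-of-the-envelope ciphertext size under SEAL's in-memory layout:
# polys x n coefficients x (primes at the current level) x 8 bytes.
def ciphertext_bytes(n, data_primes, polys=2):
    return polys * n * data_primes * 8

# [60, 40, 40, 60]: the special prime is not stored in the ciphertext,
# so a fresh ciphertext carries 3 primes.
print(ciphertext_bytes(8192, 3) / 1024)  # 384.0 (KiB)
# After one rescale only 2 primes remain.
print(ciphertext_bytes(8192, 2) / 1024)  # 256.0 (KiB)
```

The second figure is why numbers around 256 KB also circulate for n=8192: a ciphertext that has been rescaled once carries one fewer prime.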
Encrypted machine learning inference
One of the most compelling applications: a client encrypts input data, sends it to a server running a pre-trained model, and gets back encrypted predictions. The server never sees the input or the output.
```python
import numpy as np
import tenseal as ts

# Simulated trained model weights (server has these in plaintext)
weights = np.array([[0.3, -0.5, 0.8],
                    [0.1, 0.7, -0.2],
                    [-0.4, 0.2, 0.6]])
bias = np.array([0.1, -0.1, 0.05])

# Client side: encrypt input
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60]
)
context.global_scale = 2**40
context.generate_galois_keys()  # matmul needs rotation keys
input_data = [1.0, 0.5, -0.3]
enc_input = ts.ckks_vector(context, input_data)

# Server side: compute linear layer on encrypted data.
# In a real deployment the server holds only a public copy of the context
# (no secret key); here one process plays both roles.
# vector.matmul(M) computes input @ M, so weights.T gives weights @ input.
# The bias stays in plaintext: adding a plain list needs no encryption.
enc_output = enc_input.matmul(weights.T.tolist()) + bias.tolist()

# Client side: decrypt result
output = enc_output.decrypt()
print(output)  # Approximately weights @ input + bias
```
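Checking that the decryption is right only requires the same layer computed in the clear. Below is a stdlib-only sketch with the same weights, bias, and input; linear is a hypothetical helper name. CKKS results should agree with these values only approximately.

```python
# Plaintext reference for the encrypted linear layer (no encryption involved).
weights = [[0.3, -0.5, 0.8],
           [0.1, 0.7, -0.2],
           [-0.4, 0.2, 0.6]]
bias = [0.1, -0.1, 0.05]
input_data = [1.0, 0.5, -0.3]

def linear(w, x, b):
    # Row i of w dotted with x, plus b[i]: the same as weights @ input + bias
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

expected = linear(weights, input_data, bias)
print([round(v, 6) for v in expected])  # [-0.09, 0.41, -0.43]
```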
For non-linear activations (ReLU, sigmoid), you must use polynomial approximations, since these schemes only support additions and multiplications. The simplest option is to replace the activation with a square; sigmoid is commonly approximated with a low-degree polynomial:
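```python
# Square activation (simplest non-linearity for HE)
enc_activated = enc_output * enc_output

# Polynomial approximation of sigmoid: 0.5 + 0.197x - 0.004x^3
# (only accurate on a bounded input range around zero)
def approx_sigmoid(enc_x):
    x2 = enc_x * enc_x               # 1 level
    # Fold the coefficient into enc_x before the second product, so the
    # cubic term costs 2 levels instead of 3 (x * x * x, then * -0.004).
    x3_term = (enc_x * -0.004) * x2  # 2 levels
    return (enc_x * 0.197) + x3_term + 0.5
```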
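A plaintext comparison shows how good that cubic is and where it breaks down. This sketch evaluates it on [-5, 5], an interval chosen here for illustration, and at x = 8 outside it; it uses only the math module, and poly_sigmoid and max_abs_error are hypothetical names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def poly_sigmoid(x):
    # Same cubic as approx_sigmoid, evaluated on plain floats
    return 0.5 + 0.197 * x - 0.004 * x**3

def max_abs_error(lo, hi, steps=1000):
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(poly_sigmoid(x) - sigmoid(x)) for x in xs)

print(max_abs_error(-5, 5))                # roughly 0.05
print(abs(poly_sigmoid(8) - sigmoid(8)))   # close to 1: far outside the fitted range
```

The error stays around 0.05 on [-5, 5] but is close to 1 at x = 8, so inputs to the encrypted activation need to be kept in range, for example by normalizing features.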
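Each multiplication consumes a level, including multiplication by a plaintext scalar, which also triggers a rescale in CKKS. With coeff_mod_bit_sizes=[60, 40, 40, 60] there are two levels: enough for the linear layer plus the square activation, but not for the linear layer plus the cubic sigmoid approximation, which needs at least two more levels on top of the linear layer's one. Deep networks therefore require either larger parameters (slower) or architectural changes to minimize multiplicative depth.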
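Depth can be counted without encrypting anything. The toy counter below compares a straightforward evaluation of 0.5 + 0.197x - 0.004x^3 with one that folds the coefficient in before the last product. It's a sketch under a simplifying assumption: every multiplication, including by a plain scalar, costs one level (as with TenSEAL's default auto-rescaling), and additions are free. Depth is a hypothetical class.

```python
# Hypothetical level counter: tracks rescales instead of encrypted values.
class Depth:
    def __init__(self, level=0):
        self.level = level

    def __mul__(self, other):
        other_level = other.level if isinstance(other, Depth) else 0
        return Depth(max(self.level, other_level) + 1)  # one rescale per product

    __rmul__ = __mul__

    def __add__(self, other):
        other_level = other.level if isinstance(other, Depth) else 0
        return Depth(max(self.level, other_level))      # additions cost no level

    __radd__ = __add__

x = Depth()
naive = (x * 0.197) + ((x * x) * x) * -0.004 + 0.5
folded = (x * 0.197) + (x * -0.004) * (x * x) + 0.5
print(naive.level, folded.level)  # 3 2
```

Starting from Depth(1) instead models an input that already went through the linear layer.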
Serialization and the client-server split
In production, the client and server are separate machines. The client must serialize the context (without the secret key) and ciphertexts.
```python
# Client: serialize public context and encrypted data
public_context = context.copy()
public_context.make_context_public()  # drops the secret key from the copy
serialized_ctx = public_context.serialize()
serialized_input = enc_input.serialize()

# Send serialized_ctx and serialized_input to server...

# Server: deserialize and compute
server_ctx = ts.context_from(serialized_ctx)
server_input = ts.ckks_vector_from(server_ctx, serialized_input)
# ... perform computation, e.g. the linear layer from above ...
server_output = server_input.matmul(weights.T.tolist()) + bias.tolist()
serialized_result = server_output.serialize()

# Client: deserialize with secret key context and decrypt
client_result = ts.ckks_vector_from(context, serialized_result)
plaintext_result = client_result.decrypt()
```
The secret key never leaves the client. The server operates entirely on ciphertexts using only the public context and evaluation keys.
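Because serialize() returns plain bytes, any transport can carry the context and ciphertexts. As one hypothetical option, the blobs can be length-prefixed and concatenated into a single message; pack_blobs and unpack_blobs are illustrative names, not TenSEAL functions, and the byte strings below are stand-ins.

```python
import struct

# Hypothetical framing: each blob is prefixed with its length as an
# 8-byte big-endian unsigned integer.
def pack_blobs(*blobs):
    return b"".join(struct.pack(">Q", len(b)) + b for b in blobs)

def unpack_blobs(message):
    blobs, offset = [], 0
    while offset < len(message):
        (length,) = struct.unpack_from(">Q", message, offset)
        offset += 8
        if offset + length > len(message):
            raise ValueError("truncated message")
        blobs.append(message[offset:offset + length])
        offset += length
    return blobs

# Stand-ins; in practice these would be serialized_ctx and serialized_input.
message = pack_blobs(b"context-bytes", b"ciphertext-bytes")
ctx_bytes, ct_bytes = unpack_blobs(message)
print(ctx_bytes, ct_bytes)  # b'context-bytes' b'ciphertext-bytes'
```

HTTP bodies or gRPC fields work just as well; the point is only that the server receives opaque bytes it passes to ts.context_from and ts.ckks_vector_from.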
Batching with SIMD slots
CKKS and BFV support SIMD (Single Instruction, Multiple Data) batching. A CKKS ciphertext holds poly_modulus_degree / 2 values; with n=8192, that's 4096 values processed in parallel. (BFV batching packs n integer slots.)
```python
# Encrypt 4096 values in a single ciphertext
large_vector = list(range(4096))
enc_batch = ts.ckks_vector(context, [float(x) for x in large_vector])

# One operation processes all 4096 values simultaneously
enc_doubled = enc_batch * 2.0
```
This amortizes the cost of encryption and operations across thousands of values. Without batching, encrypting 4096 values separately would use 4096x the memory and time.
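When a vector is longer than one ciphertext's slot count, it has to be split across several ciphertexts. A minimal sketch, assuming CKKS's n/2 slots per ciphertext; chunk_for_ckks is a hypothetical helper, and each chunk would be passed to ts.ckks_vector(context, chunk) separately.

```python
# Hypothetical helper: split a long vector into CKKS-sized chunks.
def chunk_for_ckks(values, poly_modulus_degree):
    slots = poly_modulus_degree // 2
    return [values[i:i + slots] for i in range(0, len(values), slots)]

values = [float(x) for x in range(10_000)]
chunks = chunk_for_ckks(values, 8192)
print(len(chunks), [len(c) for c in chunks])  # 3 [4096, 4096, 1808]
```

The last ciphertext is only partly filled, but it still costs as much to encrypt and process as a full one.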
Noise monitoring and debugging
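When results come back wrong without an error, the usual cause is an exhausted noise budget or lost precision (for example, values too large for the first prime to hold). TenSEAL doesn't expose the noise budget directly, but you can detect problems by comparing encrypted and plaintext results: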
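```python
def verify_accuracy(encrypted_result, expected_plaintext, tolerance=0.01):
    """Compare decrypted result against expected plaintext."""
    decrypted = encrypted_result.decrypt()
    # zip() would silently ignore extra elements, so check lengths first
    if len(decrypted) != len(expected_plaintext):
        print(f"Length mismatch: got {len(decrypted)}, expected {len(expected_plaintext)}")
        return False
    for i, (got, expected) in enumerate(zip(decrypted, expected_plaintext)):
        diff = abs(got - expected)
        if diff > tolerance:
            print(f"Position {i}: got {got:.6f}, expected {expected:.6f}, diff {diff:.6f}")
            return False
    return True
```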
When accuracy degrades, the typical fixes are: increase poly_modulus_degree (which raises the coefficient modulus limit), add more intermediate coefficient modulus primes, use a larger scale with matching primes for more precision, or restructure the computation to reduce its multiplicative depth.
Performance benchmarks and realistic expectations
On a modern CPU (single-threaded), approximate timings for CKKS with n=8192:
| Operation | Time |
|---|---|
| Encryption (4096 slots) | ~3 ms |
| Addition | ~0.05 ms |
| Multiplication | ~8 ms |
| Rotation | ~6 ms |
| Bootstrapping | Not available in SEAL or TenSEAL |
A simple linear regression inference on 100 features takes about 15 ms encrypted versus 0.001 ms in plaintext, roughly a 15,000x slowdown. A two-layer neural network inference might take 200 ms encrypted.
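For batch processing (thousands of inputs packed with SIMD), the amortized cost per input can be reasonable. If the data is packed so that each slot holds a different input (for example, one ciphertext per feature, each carrying that feature for 4096 inputs), evaluating a linear model on all 4096 inputs can take about the same ~15 ms as on one, while encryption and transfer costs grow with the number of ciphertexts.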
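The feature-major packing described above can be simulated with plain lists. In this sketch each inner list stands in for one ciphertext whose slots hold a single feature across all inputs, so the linear model needs only slot-wise scalar multiplications and additions, no rotations. The data, weights, and the batched_linear name are made up for illustration.

```python
# Plaintext simulation of feature-major packing (no encryption).
inputs = [
    [1.0, 2.0, 3.0],
    [0.5, 0.0, -1.0],
    [2.0, 1.0, 0.0],
    [-1.0, 4.0, 2.0],
]
weights = [0.2, -0.1, 0.4]
bias = 0.05

# Transpose: packed[j] holds feature j of every input (one "ciphertext" per feature)
packed = [list(col) for col in zip(*inputs)]

def batched_linear(packed, weights, bias):
    result = [bias] * len(packed[0])
    for w, feature_slots in zip(weights, packed):
        # one ciphertext-times-scalar product and one addition per feature
        result = [r + w * s for r, s in zip(result, feature_slots)]
    return result

scores = batched_linear(packed, weights, bias)
print([round(s, 6) for s in scores])  # [1.25, -0.25, 0.35, 0.25]
```

The number of operations depends on the feature count, not the number of inputs, which is where the amortization comes from.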
When to use HE versus alternatives
Use homomorphic encryption when a single party must compute on another party’s encrypted data. If multiple parties want to jointly compute on their combined data, secure multiparty computation is often more efficient. If you need statistical analysis with formal privacy guarantees on the output, differential privacy is simpler. If the data needs to be hidden but the computation doesn’t, standard encryption with a trusted execution environment may suffice.
The one thing to remember: Practical homomorphic encryption in Python requires careful parameter selection to balance computation depth, precision, and performance — with CKKS for approximate real-number computation and BFV for exact integers being the two main paths.
See Also
- Python Certificate Management: How websites prove they are who they say they are, like a digital passport checked every time you visit
- Python Data Masking Techniques: How companies hide real names, emails, and credit card numbers while keeping data useful for testing and analytics
- Python Key Management Practices: Why the key to your encryption is more important than the encryption itself, and how to keep it safe
- Python Secure Multiparty Computation: How a group of friends can figure out who earns the most without anyone revealing their actual salary
- Python Tokenization Sensitive Data: How companies replace your real credit card number with a random stand-in that's useless to hackers but works perfectly for the business