Face Recognition in Python — Deep Dive

Build a production face recognition system in Python with ArcFace embeddings, FAISS indexing, and bias-aware evaluation.

Production face recognition systems must serve millions of identities under sub-second latency requirements and strict fairness constraints. This guide covers architecture design, model selection, indexing strategies, and the testing discipline needed to ship responsibly.

Embedding model selection

ArcFace (Additive Angular Margin Loss)

ArcFace is a widely used standard for face embedding quality. It adds an angular margin penalty during training that pulls embeddings of the same person closer together while pushing different identities further apart on a hypersphere.

The loss function modifies softmax:

L = -log( e^(s·cos(θ_yi + m)) / ( e^(s·cos(θ_yi + m)) + Σ_{j≠yi} e^(s·cos(θ_j)) ) )

Here s is a scaling factor (typically 64), m is the angular margin (typically 0.5 radians), and θ_yi is the angle between the embedding and the weight vector of the true class y_i. The sum runs over all other classes j ≠ y_i, which receive no margin.
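To make the effect of the margin concrete, here is a minimal pure-Python sketch of the loss for a single sample. The function name arcface_loss and the toy cosine values are illustrative assumptions, and the sketch ignores the special handling real training code applies when θ_yi + m exceeds π.

```python
import math

def arcface_loss(cosines, true_class, s=64.0, m=0.5):
    """ArcFace loss for one sample.

    cosines: cos(theta_j) between the normalized embedding and each class weight.
    true_class: index y_i of the correct class.
    """
    # Recover the angle to the true class, clamping against rounding error.
    theta = math.acos(max(-1.0, min(1.0, cosines[true_class])))
    logits = [s * c for c in cosines]
    # Only the true-class logit gets the additive angular margin.
    logits[true_class] = s * math.cos(theta + m)
    # Numerically stable log-sum-exp: subtract the largest logit first.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[true_class]

cosines = [0.5, 0.45, 0.1]  # toy values; class 0 is the true class
plain = arcface_loss(cosines, 0, m=0.0)   # scaled softmax without margin
margin = arcface_loss(cosines, 0, m=0.5)  # same sample with the ArcFace margin
print(plain, margin)
```

With the margin, the same sample produces a much larger loss, so training keeps pulling the embedding toward its class center even after plain softmax would consider it correctly classified.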

InsightFace implementation

import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(providers=["CUDAExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("photo.jpg")  # placeholder path; BGR array as returned by OpenCV
faces = app.get(image)
for face in faces:
    embedding = face.normed_embedding  # 512-dim, L2-normalized
    bbox = face.bbox  # [x1, y1, x2, y2]
    landmarks = face.kps  # 5 keypoints

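InsightFace bundles detection, alignment, and embedding into a single pipeline. The exact detector and recognition backbone depend on the model pack: older packs use RetinaFace for detection, while the default buffalo_l pack ships an SCRFD detector with an ArcFace-trained ResNet-50. The ONNX backend lets you switch between CUDA, TensorRT, and CPU by changing only the providers list.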

Model comparison

Model	Embedding dim	LFW accuracy	CFP-FP accuracy	Speed (GPU)
ArcFace-R100	512	99.77%	98.27%	12ms
ArcFace-R50	512	99.72%	97.85%	7ms
MobileFaceNet	128	99.50%	95.10%	2ms
AdaFace	512	99.80%	98.49%	14ms

CFP-FP (Celebrities in Frontal-Profile) is a harder benchmark than LFW because it includes cross-pose matching. Always report both.

Indexing at scale with FAISS

Comparing a query embedding against millions of stored embeddings with brute-force distance calculation costs O(n) per query. FAISS (Facebook AI Similarity Search) provides approximate nearest neighbor (ANN) structures that reduce this to sub-linear time at the cost of some recall.
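For reference, the brute-force baseline is just a linear scan. The sketch below is a pure-Python illustration (the name brute_force_search and the 2-dimensional toy vectors are assumptions for readability); it assumes every vector is L2-normalized, so the inner product equals cosine similarity.

```python
import heapq

def brute_force_search(query, gallery, k=5):
    """Return the k best (similarity, index) pairs, highest similarity first.

    Assumes all vectors are L2-normalized, so inner product = cosine similarity.
    """
    scores = (
        (sum(q * g for q, g in zip(query, vec)), idx)
        for idx, vec in enumerate(gallery)
    )
    # One pass over every stored vector: O(n * d) work per query.
    return heapq.nlargest(k, scores)

gallery = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
top = brute_force_search([0.6, 0.8], gallery, k=2)
print(top)
```

Every query touches every stored vector, which is why the cost grows linearly with the gallery size.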

Building an index

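import faiss
import numpy as np

dimension = 512
# Expected shape (n, 512), L2-normalized so inner product equals cosine similarity
embeddings = np.load("all_embeddings.npy").astype("float32")

# IVF with product quantization: good balance of speed and memory.
# The FAISS constructors take positional arguments only:
# (quantizer, d, nlist, m, nbits, metric). Without the metric argument the
# index defaults to L2, which would not match the inner-product quantizer.
quantizer = faiss.IndexFlatIP(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, 1024, 64, 8, faiss.METRIC_INNER_PRODUCT)

# Must train on representative data before adding
index.train(embeddings)
index.add(embeddings)

# Search
index.nprobe = 32  # search 32 of 1024 clusters
query_embedding = embeddings[0]  # placeholder; use the embedding of the query face
scores, indices = index.search(query_embedding.reshape(1, -1), 5)  # higher score = closer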

Index selection guide

Index type	Memory per vector	Search speed (1M vectors)	Recall@1
Flat (brute force)	2048 bytes	50ms	100%
IVF-Flat	2048 bytes	3ms	98%
IVF-PQ (m=64)	64 bytes	1ms	95%
HNSW	2560 bytes	0.5ms	99%

For systems under 100K identities, brute-force Flat index is fast enough and simplest to maintain. Beyond that, IVF-PQ or HNSW become necessary.
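The memory figures for the Flat and IVF-PQ rows follow from simple arithmetic, sketched below. The helper names and the 10-million-vector gallery are illustrative assumptions, and the totals cover only the stored codes, not vector IDs, IVF centroids, or graph links.

```python
def flat_bytes_per_vector(dim, bytes_per_float=4):
    """Uncompressed float32 storage: one 4-byte float per dimension."""
    return dim * bytes_per_float

def pq_bytes_per_vector(m, nbits=8):
    """Product quantization: m sub-quantizer codes of nbits each."""
    return m * nbits // 8

def total_gib(bytes_per_vector, n_vectors):
    """Total code storage in GiB for n_vectors."""
    return bytes_per_vector * n_vectors / 2**30

n = 10_000_000  # hypothetical gallery size
flat = flat_bytes_per_vector(512)
pq = pq_bytes_per_vector(64)
print(flat, pq)
print(f"Flat: {total_gib(flat, n):.1f} GiB")
print(f"PQ:   {total_gib(pq, n):.2f} GiB")
```

The 32× gap per vector is what lets a PQ-compressed index for tens of millions of faces fit in the RAM of a single machine.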

System architecture

A production deployment typically has these components:

Camera/Client → API Gateway → Face Service
                                  ├── Detection + Alignment
                                  ├── Embedding Generation
                                  ├── FAISS Search
                                  └── Identity Database (PostgreSQL)

Enrollment flow

Client uploads one or more photos of the person.
Service detects faces, rejects images with zero or multiple faces.
Each face is aligned and embedded.
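Multiple embeddings are averaged (centroid) for robustness, and the result is re-normalized to unit length so that cosine and inner-product scores stay comparable.
The normalized centroid embedding is stored in FAISS and PostgreSQL with the person’s ID.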
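The averaging step in the enrollment flow above can be sketched in a few lines of pure Python. The function names and the 2-dimensional toy embeddings are illustrative assumptions; real embeddings would be 512-dimensional arrays.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def enrollment_centroid(embeddings):
    """Average several embeddings of one person, then re-normalize."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # The mean of unit vectors is shorter than unit length, so normalize again.
    return l2_normalize(mean)

photos = [l2_normalize([1.0, 0.2]), l2_normalize([0.9, 0.4])]  # toy embeddings
centroid = enrollment_centroid(photos)
print(centroid)
```

Skipping the final normalization would shrink every similarity score for that person and silently shift the effective threshold.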

Recognition flow

Client sends a query image.
Service detects, aligns, and embeds the face.
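FAISS returns the top-k nearest neighbors with their scores: distances for an L2 index (lower is better) or similarities for an inner-product index (higher is better).
If the best match passes the threshold (distance below it, or similarity above it), return the matched identity.
Otherwise, return “unknown.”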
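The final decision in the recognition flow above can be sketched as follows, assuming an inner-product index so that higher scores mean closer matches. The function name identify, the person IDs, and the 0.5 threshold are hypothetical values for illustration only.

```python
def identify(neighbors, threshold):
    """Return the best-matching person ID, or "unknown".

    neighbors: list of (similarity, person_id) pairs from the index search.
    threshold: minimum similarity required to accept a match.
    """
    if not neighbors:
        return "unknown"
    best_score, best_id = max(neighbors, key=lambda pair: pair[0])
    return best_id if best_score >= threshold else "unknown"

results = [(0.71, "alice"), (0.42, "bob")]  # toy search results
print(identify(results, threshold=0.5))
print(identify(results, threshold=0.8))
```

For an L2 index the comparison flips: take the minimum distance and accept only if it is at or below the threshold.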

Threshold calibration

The accept/reject threshold is the most important parameter. Setting it requires balancing:

False Accept Rate (FAR): Wrong person gets through.
False Reject Rate (FRR): Right person gets blocked.

Plot a DET (Detection Error Tradeoff) curve on your validation set. Choose the threshold at the operating point your application tolerates. Access control systems typically need FAR < 0.1%; photo-organizing apps can tolerate FAR of 1–2%.
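Each point on a DET curve is a (FAR, FRR) pair at one threshold. The sketch below computes that pair and picks the lowest threshold that meets a target FAR; it assumes similarity scores (higher means more alike), Python 3.9+ for math.nextafter, and uses made-up toy scores. The function names are illustrative.

```python
import math

def far_frr(genuine, impostor, threshold):
    """Error rates when a score >= threshold means accept."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def threshold_for_far(impostor, target_far):
    """Lowest threshold whose FAR on the impostor scores is <= target_far."""
    ranked = sorted(impostor, reverse=True)
    # Number of impostor pairs we can afford to accept.
    allowed = int(target_far * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return ranked[-1]
    # Sit just above the first impostor score that must be rejected.
    return math.nextafter(ranked[allowed], math.inf)

genuine = [0.62, 0.71, 0.38, 0.80, 0.66]  # toy same-person scores
impostor = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41, 0.18, 0.25, 0.05, 0.55]
threshold = threshold_for_far(impostor, target_far=0.10)
far, frr = far_frr(genuine, impostor, threshold)
print(threshold, far, frr)
```

Estimating a FAR as low as 0.1% reliably needs many thousands of impostor pairs, since the estimate rests on only a handful of false accepts.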

Anti-spoofing

A photo of a face held up to a camera will match. Production systems need liveness detection:

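Passive liveness: A single-frame CNN trained to distinguish live faces from printed photos, screen replays, and 3D masks. MiniFASNet is reported to achieve >99% accuracy, though results depend heavily on the attack types and cameras in the test set, so validate on your own spoof samples.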
Active liveness: Ask the user to blink, turn their head, or follow a random prompt. Harder to spoof but worse user experience.

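from insightface.app import FaceAnalysis

app = FaceAnalysis(providers=["CUDAExecutionProvider"])
app.prepare(ctx_id=0)

faces = app.get(frame)  # frame: a BGR image array, e.g. from cv2.VideoCapture
for face in faces:
    # det_score is the detector's confidence that the region is a face.
    # It is not a liveness score: a printed photo can still score high.
    # For anti-spoofing, run a dedicated model such as Silent-Face (MiniFASNet) on the crop.
    det_score = face.det_score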
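For active liveness, the server-side logic can be as simple as issuing an unpredictable, short-lived action sequence and checking the response. The sketch below is an assumption-laden illustration: the action labels, function names, and 15-second expiry are hypothetical, and the per-frame classifier that would produce observed_actions is not shown.

```python
import secrets
import time

PROMPTS = ["blink", "turn_left", "turn_right", "smile"]  # hypothetical action labels

def new_challenge(n_actions=3, ttl_seconds=15.0, now=None):
    """Create an unpredictable action sequence that expires quickly."""
    issued = time.monotonic() if now is None else now
    return {
        "actions": [secrets.choice(PROMPTS) for _ in range(n_actions)],
        "expires_at": issued + ttl_seconds,
    }

def verify_challenge(challenge, observed_actions, now=None):
    """Accept only the exact requested sequence, and only before expiry."""
    current = time.monotonic() if now is None else now
    if current > challenge["expires_at"]:
        return False
    return observed_actions == challenge["actions"]

challenge = new_challenge(now=100.0)
print(challenge["actions"])
print(verify_challenge(challenge, list(challenge["actions"]), now=105.0))
```

Drawing the sequence from a cryptographic random source and expiring it quickly is what makes a pre-recorded video hard to replay.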

Bias testing

Evaluation across demographics

Split your test set by demographic groups and compute accuracy metrics for each:

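import numpy as np
from sklearn.metrics import roc_curve

# test_data: a pandas DataFrame of face pairs with columns group, is_match, distance
target_far = 0.0001  # 0.01%
for group in ["male", "female", "age_18_30", "age_60_plus"]:
    subset = test_data[test_data.group == group]
    # roc_curve expects higher scores for positives, so negate the distance
    fpr, tpr, thresholds = roc_curve(subset.is_match, -subset.distance)
    # Last operating point whose FAR does not exceed the target
    idx = np.searchsorted(fpr, target_far, side="right") - 1
    frr_at_far = 1 - tpr[idx]
    print(f"{group}: FRR@FAR=0.01% = {frr_at_far:.4f}")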

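As a rule of thumb, if FRR at the same FAR varies by more than 2× across groups, treat it as a bias problem. Make sure each group has enough impostor pairs to estimate such a low FAR before drawing conclusions. Mitigations include: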

Training on more balanced datasets (e.g., WebFace42M covers broader demographics than older datasets).
Using adaptive thresholds per demographic cluster.
Reporting disaggregated metrics in your model card.
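The 2× check described above is easy to automate once per-group FRR values exist. The sketch below uses made-up FRR numbers purely for illustration, and the function name frr_disparity is an assumption.

```python
def frr_disparity(frr_by_group, max_ratio=2.0):
    """Compare the worst and best group FRR and flag large gaps."""
    worst = max(frr_by_group, key=frr_by_group.get)
    best = min(frr_by_group, key=frr_by_group.get)
    highest, lowest = frr_by_group[worst], frr_by_group[best]
    if highest == 0:
        ratio = 1.0  # no rejections anywhere, so no measurable gap
    elif lowest == 0:
        ratio = float("inf")
    else:
        ratio = highest / lowest
    return {"worst": worst, "best": best, "ratio": ratio, "flagged": ratio > max_ratio}

# Made-up values for illustration, not measurements.
frr_by_group = {"male": 0.012, "female": 0.031, "age_18_30": 0.010, "age_60_plus": 0.027}
report = frr_disparity(frr_by_group)
print(report)
```

Running this in CI against a fixed evaluation set turns the fairness check into a regression test rather than a one-off audit.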

Performance optimization

Batch processing

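Process multiple faces per GPU call. app.get() embeds each detected face in its own forward pass, so for higher throughput align the crops yourself and run the recognition model once per batch. The lines below assume a recent InsightFace release in which the recognition model's get_feat accepts a list of aligned 112×112 crops; verify this against your installed version:

from insightface.utils import face_align
# Stack aligned face crops into a batch, then embed them in one forward pass
crops = [face_align.norm_crop(image, landmark=face.kps) for face in faces]
embeddings = app.models["recognition"].get_feat(crops)  # shape (n, 512), not yet L2-normalized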

Model quantization

ONNX Runtime with TensorRT execution provider supports FP16 and INT8:

import onnxruntime as ort

sess = ort.InferenceSession(
    "arcface_r100.onnx",
    providers=[("TensorRTExecutionProvider", {"trt_fp16_enable": True})]
)

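FP16 typically halves inference time with <0.1% accuracy loss on face embeddings, but measure both on your own hardware and validation set. INT8 additionally requires calibration data and usually costs more accuracy.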

Caching

Cache embeddings for known faces. Re-computing embeddings on every request wastes GPU cycles. Store embeddings in Redis for hot lookups and PostgreSQL for persistence.
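A minimal sketch of the caching idea, using an in-process LRU dictionary as a stand-in for Redis. The class name EmbeddingCache and the fake_embed function are hypothetical; keying on a hash of the image bytes only helps when the exact same file is submitted again.

```python
import hashlib
from collections import OrderedDict

class EmbeddingCache:
    """Tiny LRU cache keyed by image content hash (stand-in for Redis)."""

    def __init__(self, max_items=10_000):
        self._items = OrderedDict()
        self._max_items = max_items

    def get_or_compute(self, image_bytes, compute_embedding):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._items:
            self._items.move_to_end(key)  # mark as recently used
            return self._items[key]
        embedding = compute_embedding(image_bytes)
        self._items[key] = embedding
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used entry
        return embedding

calls = []

def fake_embed(image_bytes):
    """Placeholder for the real (expensive) embedding model call."""
    calls.append(1)
    return [0.0, 0.0, 0.0, 1.0]

cache = EmbeddingCache(max_items=2)
first = cache.get_or_compute(b"same image", fake_embed)
second = cache.get_or_compute(b"same image", fake_embed)
print(len(calls))
```

Cached embeddings are biometric data too, so the cache needs the same access controls and deletion handling as the primary store.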

Privacy and compliance

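GDPR Article 9: Biometric data used to identify a person is a special category. Processing is prohibited unless an exception such as explicit consent applies, in addition to a lawful basis under Article 6.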
Illinois BIPA: Requires written consent before collecting biometric identifiers.
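Data minimization: Store embeddings, not face images, when possible. Embeddings are still biometric data, and reconstruction attacks can recover approximate faces from them, so encrypt and access-control them like the images themselves.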
Retention: Define and enforce maximum storage duration for biometric templates.
Right to deletion: Your system must support removing a person’s embeddings and all derived data on request.
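Retention and deletion are easiest to enforce when they are ordinary, testable functions. The sketch below uses plain dictionaries as stand-ins for the database, cache, and index mapping; the function names, store layout, and 365-day limit are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def purge_expired(templates, max_age, now=None):
    """Delete templates older than max_age and return the removed person IDs.

    templates: {person_id: {"embedding": [...], "created_at": aware datetime}}
    """
    now = now or datetime.now(timezone.utc)
    expired = [pid for pid, t in templates.items() if now - t["created_at"] > max_age]
    for pid in expired:
        del templates[pid]
    return expired

def delete_person(person_id, *stores):
    """Remove one person from every store; return how many stores held them."""
    return sum(1 for store in stores if store.pop(person_id, None) is not None)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
templates = {
    "old": {"embedding": [0.1], "created_at": now - timedelta(days=400)},
    "new": {"embedding": [0.2], "created_at": now - timedelta(days=10)},
}
cache = {"new": [0.2]}
expired = purge_expired(templates, timedelta(days=365), now=now)
removed = delete_person("new", templates, cache)
print(expired, removed)
```

In a real deployment the same deletion has to reach the FAISS index, the cache, backups, and any derived data, which is why it helps to route every removal through one function.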

The one thing to remember: Production face recognition is an engineering system, not just a model — indexing, threshold calibration, anti-spoofing, and bias testing matter as much as embedding quality.
