TensorFlow Federated Learning — Core Concepts
The Problem Federated Learning Solves
Traditional ML requires centralizing data: copy everything to one server, then train. This creates three problems:
- Privacy — Users may not want their personal data on someone else’s server
- Regulation — Laws like GDPR and HIPAA restrict data movement
- Scale — Moving terabytes from millions of devices is impractical
Federated learning flips this approach: instead of bringing data to the computation, bring the computation to the data.
How It Works
A federated learning round follows this cycle:
1. The server broadcasts the current global model to a subset of participating devices.
2. Each device trains the model on its local data for a few epochs.
3. Each device sends its model update (the weight differences) back to the server.
4. The server aggregates all updates into a single improved global model.
5. The cycle repeats from step 1.
The most common aggregation method is Federated Averaging (FedAvg): average the weight updates from all participating devices, weighted by the number of local training examples each device used.
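The aggregation step can be illustrated with a minimal pure-Python sketch. It is not TFF code: the names `fed_avg` and `apply_update` and the toy numbers are assumptions made for this example, and weights are plain lists of floats rather than tensors.

```python
from typing import List, Tuple


def fed_avg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average client weight deltas, weighted by local example counts.

    Each item in `updates` is (weight_delta, num_examples).
    """
    total_examples = sum(n for _, n in updates)
    if total_examples == 0:
        raise ValueError("no training examples reported")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for delta, n in updates:
        for i in range(dim):
            averaged[i] += delta[i] * n / total_examples
    return averaged


def apply_update(global_weights: List[float], averaged_delta: List[float]) -> List[float]:
    """Server step: add the averaged delta to the current global weights."""
    return [w + d for w, d in zip(global_weights, averaged_delta)]


# Toy round with two clients (illustrative values only).
global_weights = [1.0, -2.0]
client_updates = [
    ([0.5, 0.0], 10),  # client A trained on 10 examples
    ([0.0, 1.0], 30),  # client B trained on 30 examples
]
averaged_delta = fed_avg(client_updates)
new_global = apply_update(global_weights, averaged_delta)
print(averaged_delta)  # [0.125, 0.75]
print(new_global)      # [1.125, -1.25]
```

Client B contributes three times as many examples as client A, so its delta carries three times the weight in the average.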
TensorFlow Federated (TFF)
TensorFlow Federated is Google’s open-source framework for federated computation. It provides two API layers:
Federated Learning (FL) API
High-level tools for common federated learning tasks:
- Build federated versions of Keras models
- Define training and evaluation processes
- Simulate federated scenarios on a single machine
Federated Core (FC) API
Low-level primitives for custom federated algorithms:
- Define computations that run on clients vs. server
- Implement custom aggregation strategies
- Build non-learning federated computations (analytics, statistics)
Key Concepts
Client Selection
Not every device participates in every round. The server samples a subset of eligible devices (typically hundreds to thousands from millions), weighing criteria such as:
- Device availability — Phone is charging, on WiFi, and idle
- Data freshness — Prioritize devices with new data
- Fairness — Ensure diverse representation across device types and regions
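A rough sketch of the availability filter followed by random sampling might look like the following. The `Device` fields and the `select_clients` helper are hypothetical names invented for this illustration; data freshness and fairness are not modeled here.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    device_id: str
    charging: bool
    on_wifi: bool
    idle: bool


def is_available(device: Device) -> bool:
    """A device is eligible only if it is charging, on WiFi, and idle."""
    return device.charging and device.on_wifi and device.idle


def select_clients(devices: List[Device], num_clients: int, rng: random.Random) -> List[Device]:
    """Sample up to `num_clients` devices uniformly from the eligible pool."""
    eligible = [d for d in devices if is_available(d)]
    k = min(num_clients, len(eligible))
    return rng.sample(eligible, k)


# Simulated population of 30 devices with made-up status flags.
rng = random.Random(0)
devices = [
    Device(f"dev-{i}", charging=(i % 2 == 0), on_wifi=(i % 3 != 0), idle=True)
    for i in range(30)
]
selected = select_clients(devices, num_clients=5, rng=rng)
print([d.device_id for d in selected])
```

Passing an explicit `random.Random` instance keeps the simulation reproducible, which helps when comparing training runs.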
Communication Efficiency
Sending full model updates from millions of devices is expensive. Techniques to reduce bandwidth:
| Technique | How It Works | Approx. Bandwidth Savings |
|---|---|---|
| Compression | Quantize updates to fewer bits | 4-8x |
| Sparsification | Send only the largest weight changes | 10-100x |
| Partial training | Train only a subset of layers | 2-5x |
| Gradient sketching | Send compressed summaries | 10-50x |
Google’s production federated systems use a combination of these to keep per-round communication under a few megabytes per device.
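Two of the techniques in the table, sparsification and quantization, can be sketched in a few lines. This is a simplified illustration under stated assumptions: top-k selection by magnitude and uniform 8-bit quantization are one possible choice each, and the function names are made up for this example.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k entries with the largest magnitude, as {index: value}."""
    largest = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    return {i: update[i] for i in sorted(largest)}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Server side: rebuild a dense vector, treating missing entries as zero."""
    dense = [0.0] * size
    for i, v in sparse.items():
        dense[i] = v
    return dense


def quantize_8bit(update: List[float]) -> Tuple[List[int], float, float]:
    """Map each float to an integer code in 0..255 (uniform quantization)."""
    lo, hi = min(update), max(update)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all values are equal
    codes = [round((v - lo) / scale) for v in update]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Server side: recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


update = [0.02, -1.5, 0.3, 0.0, 2.0, -0.01]

sparse = top_k_sparsify(update, k=2)
print(sparse)  # {1: -1.5, 4: 2.0}

codes, lo, scale = quantize_8bit(update)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(update, restored))
print(max_error <= scale / 2 + 1e-9)  # True
```

Sending one byte per weight instead of a 32-bit float is where the roughly 4x figure for quantization comes from; the price is a rounding error of at most half a quantization step per weight.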
Privacy Mechanisms
Federated learning provides baseline privacy (raw data stays on device), but model updates can still leak information. Additional privacy layers:
- Secure Aggregation: A cryptographic protocol in which the server sees only the sum of all updates, never an individual one. A curious or compromised server therefore cannot inspect any single device's update.
- Differential Privacy: Add calibrated noise to updates so that an observer cannot confidently infer whether any specific example (or, with device-level guarantees, any specific device) contributed to training. This provides a mathematical bound on privacy loss.
- Clipping: Limit the magnitude of each device's update to bound the influence of any single device (and its data).
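Clipping and noise addition are often combined on the aggregate. The sketch below assumes L2-norm clipping and Gaussian noise whose standard deviation is `noise_multiplier * clip_norm`; those parameter names are illustrative, and the code does not compute a formal privacy budget.

```python
import math
import random
from typing import List


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= clip_norm:
        return list(update)
    factor = clip_norm / norm
    return [v * factor for v in update]


def noisy_clipped_sum(
    updates: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip every update, sum them, then add Gaussian noise to the sum."""
    dim = len(updates[0])
    total = [0.0] * dim
    for update in updates:
        clipped = clip_update(update, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    stddev = noise_multiplier * clip_norm
    return [t + rng.gauss(0.0, stddev) for t in total]


rng = random.Random(7)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

# With noise disabled, only clipping affects the result.
exact_sum = noisy_clipped_sum(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
# With noise enabled, each run perturbs the sum differently.
private_sum = noisy_clipped_sum(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(exact_sum)
print(private_sum)
```

Because every clipped update has norm at most `clip_norm`, adding or removing one device changes the sum by at most `clip_norm`, which is what lets the noise scale be calibrated.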
Common Misconception
“Federated learning guarantees privacy.” Without additional mechanisms, federated learning is more private than centralized training but not fully private: model updates can be inverted to reconstruct training examples (gradient inversion attacks). Strong privacy guarantees require combining secure aggregation with differential privacy. Google’s production systems use both.
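To make the secure aggregation idea concrete, here is a toy sketch of pairwise masking: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. This is a teaching simplification, not a real protocol. It generates masks from a single seeded generator, uses integer (pre-quantized) updates, and ignores key agreement and client dropouts, all of which a production protocol must handle.

```python
import random
from typing import List

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_masks(num_clients: int, dim: int, rng: random.Random) -> List[List[int]]:
    """Build one mask per client such that all masks sum to zero mod MODULUS."""
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] = (masks[i][d] + shared[d]) % MODULUS  # client i adds
                masks[j][d] = (masks[j][d] - shared[d]) % MODULUS  # client j subtracts
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """Client side: hide the update behind the client's mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: sum the masked updates; the pairwise masks cancel out."""
    dim = len(masked_updates[0])
    return [sum(u[d] for u in masked_updates) % MODULUS for d in range(dim)]


rng = random.Random(42)
updates = [[3, 10], [5, 20], [7, 30]]  # integer updates from three clients
masks = pairwise_masks(num_clients=3, dim=2, rng=rng)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [15, 60]
```

The server receives only the values in `masked`, which look random on their own, yet their sum equals the sum of the original updates.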
Challenges
Statistical Heterogeneity (Non-IID Data)
In centralized training, you shuffle all data into uniform batches. In federated learning, each device has its own unique data distribution. A phone in Tokyo has different text patterns than one in São Paulo. This “non-IID” nature makes convergence harder and can bias the global model toward majority groups.
Solutions include:
- FedProx: Adds a regularization term that keeps local models close to the global one
- Personalization layers: Train shared base layers with federated learning but keep per-device adaptation layers local
- Clustered federated learning: Group similar clients and train separate models per cluster
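The FedProx idea can be sketched as a modified local gradient step. Assuming the local objective is the client's loss plus (mu / 2) times the squared distance to the global weights, the extra gradient term is `mu * (w - w_global)`. The quadratic toy loss and all names below are assumptions chosen for illustration.

```python
from typing import List


def fedprox_local_step(
    w: List[float],
    w_global: List[float],
    grad: List[float],
    lr: float,
    mu: float,
) -> List[float]:
    """One gradient step on: local_loss(w) + (mu / 2) * ||w - w_global||^2."""
    return [
        wi - lr * (gi + mu * (wi - gwi))
        for wi, gwi, gi in zip(w, w_global, grad)
    ]


def run_local_training(
    w_global: List[float],
    local_optimum: List[float],
    mu: float,
    lr: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Train on a toy loss 0.5 * ||w - local_optimum||^2, starting at w_global."""
    w = list(w_global)
    for _ in range(steps):
        grad = [wi - oi for wi, oi in zip(w, local_optimum)]
        w = fedprox_local_step(w, w_global, grad, lr, mu)
    return w


# A client whose data pulls the weight toward 4.0 while the global model sits at 0.0.
plain = run_local_training(w_global=[0.0], local_optimum=[4.0], mu=0.0)
prox = run_local_training(w_global=[0.0], local_optimum=[4.0], mu=1.0)
print(plain)  # close to [4.0]: plain local training drifts all the way to the local optimum
print(prox)   # close to [2.0]: the proximal term holds the model halfway back
```

Larger values of `mu` keep clients closer to the global model, trading local fit for stability when client data distributions differ.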
Stragglers and Dropped Clients
Devices disconnect, run out of battery, or simply take too long. The server must handle incomplete rounds gracefully — aggregating whatever updates arrive within a time window rather than waiting for all clients.
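One way to picture the time-window approach is a server that aggregates only the reports that arrived before a deadline. In this sketch the `ClientReport` fields, the `deadline_seconds` parameter, and the `min_reports` threshold for abandoning a round are all hypothetical, introduced only for the example.

```python
from typing import List, NamedTuple, Optional


class ClientReport(NamedTuple):
    client_id: str
    arrival_seconds: float  # time after round start when the update arrived
    delta: List[float]      # the client's weight update
    num_examples: int       # local examples used for training


def aggregate_on_time(
    reports: List[ClientReport],
    deadline_seconds: float,
    min_reports: int,
) -> Optional[List[float]]:
    """Weighted-average the updates that met the deadline, or return None."""
    on_time = [r for r in reports if r.arrival_seconds <= deadline_seconds]
    if len(on_time) < min_reports:
        return None  # too few updates: abandon this round
    total = sum(r.num_examples for r in on_time)
    dim = len(on_time[0].delta)
    return [
        sum(r.delta[i] * r.num_examples for r in on_time) / total
        for i in range(dim)
    ]


reports = [
    ClientReport("a", arrival_seconds=10.0, delta=[1.0], num_examples=10),
    ClientReport("b", arrival_seconds=50.0, delta=[3.0], num_examples=30),
    ClientReport("c", arrival_seconds=500.0, delta=[100.0], num_examples=10),  # straggler
]

result = aggregate_on_time(reports, deadline_seconds=60.0, min_reports=2)
print(result)  # [2.5]

too_strict = aggregate_on_time(reports, deadline_seconds=60.0, min_reports=3)
print(too_strict)  # None
```

The straggler's update is simply ignored for this round; the device can still be selected again in a later one.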
Real-World Deployments
| Company | Use Case | Scale |
|---|---|---|
| Google (Gboard) | Next-word prediction | Billions of devices |
| Apple | Siri improvements, QuickType | Hundreds of millions |
| Hospitals (NVIDIA FLARE) | Medical imaging models | Cross-institution |
| Financial institutions | Fraud detection | Cross-bank collaboration |
When to Use Federated Learning
Good fit: Data is inherently distributed (mobile devices, hospitals, banks), privacy is a hard requirement, or data cannot be centralized due to regulations.
Not ideal: All data is already centralized, model is too large for edge devices to train, or the number of participants is very small (federated learning needs diversity to work well).
The one thing to remember: Federated learning trains models across devices without centralizing data — but achieving true privacy requires combining it with secure aggregation and differential privacy, not just keeping data on-device.