Kubernetes — Core Concepts

How does a system that Google open-sourced in 2014 now run 71% of Fortune 500 infrastructure? Kubernetes isn't just container management: it's a declarative control plane that changed how the industry thinks about reliability.

The Problem It Solved

In 2011, Netflix had a near-catastrophic infrastructure failure. Engineers were manually SSHing into servers, tracking which ones were healthy in spreadsheets, restarting crashed processes by hand. The company was growing 100% year-over-year. That approach was going to kill them.

Google had been solving this problem internally for about a decade with a system called Borg. In 2014, it open-sourced a new system built on those lessons: Kubernetes (Greek for “helmsman” or “pilot”). Within a few years it had become the de facto standard for running containerized applications at scale.

This is the story of how it works.

The Core Idea: Declare What You Want, Not How to Get It

Most software instructions are imperative: do this, then do that. “Start a server. Copy this file. Restart the process.”

Kubernetes is declarative. You tell it the desired state of the world:

“I want 3 copies of my web app running. Each needs 512MB of memory and 1 CPU core. If any copy fails, replace it. Forward internet traffic to whichever ones are healthy.”

Kubernetes figures out how to make that happen — and more importantly, it keeps checking to make sure that state remains true. If the world drifts from what you declared, Kubernetes corrects it. This is called the reconciliation loop, and it runs constantly.
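To make the reconciliation loop concrete, here is a minimal toy sketch in Python. It is not Kubernetes code: ToyCluster, reconcile, and the web-N names are made up for illustration, and a real controller watches the API server rather than an in-memory set. The shape of the logic is the point: compare desired state with actual state, then act on the difference.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Stand-in for real cluster state; hypothetical, not a Kubernetes API."""
    running: set = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1))

    def start_replica(self) -> str:
        name = f"web-{next(self._ids)}"
        self.running.add(name)
        return name

    def stop_replica(self) -> str:
        name = sorted(self.running)[-1]
        self.running.remove(name)
        return name


def reconcile(cluster: ToyCluster, desired_replicas: int) -> list:
    """One pass of the loop: compare desired vs. actual, then close the gap."""
    actions = []
    while len(cluster.running) < desired_replicas:
        actions.append(("start", cluster.start_replica()))
    while len(cluster.running) > desired_replicas:
        actions.append(("stop", cluster.stop_replica()))
    return actions


cluster = ToyCluster()
first_pass = reconcile(cluster, desired_replicas=3)   # starts three replicas
cluster.running.discard("web-2")                      # simulate a crash
second_pass = reconcile(cluster, desired_replicas=3)  # starts one replacement
third_pass = reconcile(cluster, desired_replicas=3)   # nothing to do
```

Note that the caller never says “start a replacement for web-2.” It only repeats the desired count, and the loop works out what is missing.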

The Building Blocks

Pods — The Smallest Unit

A pod is one or more containers packaged together. Usually it’s one container per pod, but tightly-coupled processes (like an app and its log collector) can share a pod.

Pods are ephemeral. They’re meant to die. Kubernetes will kill and replace them constantly — during updates, after crashes, when rescheduling to a different machine. Designing for this is a mental shift most engineers struggle with initially.

Nodes — The Machines

A node is a physical or virtual machine in your cluster. Pods run on nodes. A cluster can be anything from a single node to several thousand.

There are two kinds:

Control plane nodes run the Kubernetes brain (the API server, the scheduler, the controller manager, and etcd, the state database)
Worker nodes run your actual application pods
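One control plane job, placing pods onto worker nodes that have room for them, can be sketched in a few lines of Python. This is a deliberately simplified model: the schedule function, the node names, and the “most free memory wins” rule are assumptions made for illustration. The real scheduler filters and scores nodes on many more criteria.

```python
def schedule(pod_request: dict, nodes: dict):
    """Pick the fitting node with the most free memory, or None if nothing fits."""
    fits = {
        name: free
        for name, free in nodes.items()
        if free["cpu"] >= pod_request["cpu"]
        and free["memory_mb"] >= pod_request["memory_mb"]
    }
    if not fits:
        return None  # in a real cluster the pod would stay pending
    chosen = max(fits, key=lambda name: fits[name]["memory_mb"])
    # Reserve the resources so the next decision sees the reduced capacity.
    nodes[chosen]["cpu"] -= pod_request["cpu"]
    nodes[chosen]["memory_mb"] -= pod_request["memory_mb"]
    return chosen


# Free capacity per worker node (hypothetical numbers).
free = {
    "node-a": {"cpu": 2, "memory_mb": 1024},
    "node-b": {"cpu": 1, "memory_mb": 4096},
}
request = {"cpu": 1, "memory_mb": 512}
placements = [schedule(request, free) for _ in range(4)]
```

With these numbers the first pod lands on node-b, the next two on node-a, and the fourth finds no node with a free CPU core, so it is not placed.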

Deployments — How You Describe Apps

You rarely create a pod directly. You create a Deployment: a description of what you want running, how many copies, what image to use, and what to do during updates.

replicas: 5
image: my-web-app:v2.4

Kubernetes creates 5 pods from that spec. Kill one manually — it creates another within seconds. The scheduler decides which node each pod lands on, based on available resources.
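The two-line snippet above is abbreviated. As a rough sketch of what a fuller Deployment spec looks like, the Python below builds one as a dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. The deployment_manifest helper, the app label, and the resource requests (borrowed from the 512MB and 1 CPU example earlier) are illustrative assumptions, not values from a real project.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment as a plain dictionary."""
    labels = {"app": name}  # assumed labeling scheme
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"memory": "512Mi", "cpu": "1"}
                            },
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("my-web-app", "my-web-app:v2.4", replicas=5)
manifest_json = json.dumps(manifest, indent=2)
```

Written to a file such as deployment.json, output like this could be submitted with kubectl apply -f deployment.json.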

Services — Stable Network Addresses

Pods are temporary, but your users need a stable place to send requests. A Service is a stable address (a virtual IP and DNS name) that routes traffic to whichever matching pods are currently healthy. If pod A dies and pod B replaces it on a different machine with a different IP, the Service doesn’t care: it picks up the new pod automatically.
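The selection logic behind a Service can be modeled as a filter over pods. The sketch below is a toy, assuming a hypothetical Pod record and endpoints function. In a real cluster the label matching and readiness tracking are done by Kubernetes itself, but the rule is the same: traffic goes to pods that match the selector and are ready.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical pod record for this example only."""
    name: str
    ip: str
    labels: dict
    ready: bool


def endpoints(selector: dict, pods: list) -> list:
    """IPs of ready pods whose labels include every selector key/value pair."""
    return [
        pod.ip
        for pod in pods
        if pod.ready
        and all(pod.labels.get(key) == value for key, value in selector.items())
    ]


pods = [
    Pod("web-a", "10.0.0.4", {"app": "web"}, ready=True),
    Pod("web-b", "10.0.1.7", {"app": "web"}, ready=False),  # failing health checks
    Pod("db-a", "10.0.2.9", {"app": "db"}, ready=True),     # different app
]
before = endpoints({"app": "web"}, pods)

# web-b is replaced by a new pod on another machine with a new IP.
pods[1] = Pod("web-c", "10.0.3.2", {"app": "web"}, ready=True)
after = endpoints({"app": "web"}, pods)
```

Callers keep using the same Service address throughout. Only the list of backing IPs changes.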

ConfigMaps and Secrets

App configuration (database URLs, feature flags, API endpoints) goes in a ConfigMap. Sensitive values (passwords, tokens) go in a Secret. Both get injected into pods at runtime, as environment variables or mounted files, so you don’t bake credentials into your container images.
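From the application’s side, injected configuration usually just looks like environment variables. Here is a minimal sketch of an app reading them. The variable names (DATABASE_URL, FEATURE_NEW_CHECKOUT, DB_PASSWORD) and the load_settings helper are hypothetical, and the example passes in a simulated environment rather than a real pod’s.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read settings that the platform injected as environment variables."""
    return {
        # Required; would typically come from a ConfigMap.
        "database_url": env["DATABASE_URL"],
        # Optional feature flag; defaults to off when unset.
        "new_checkout": env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true",
        # Required; would typically come from a Secret.
        "db_password": env["DB_PASSWORD"],
    }


# Simulated environment standing in for what a pod would see at runtime.
example_env = {
    "DATABASE_URL": "postgres://db.internal:5432/app",
    "FEATURE_NEW_CHECKOUT": "true",
    "DB_PASSWORD": "example-only",
}
settings = load_settings(example_env)
```

Because the same image reads different values in each environment, one build can run unchanged in staging and production.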

Common Misconception: Kubernetes Manages Your Data

It doesn’t, at least not by default. Kubernetes is designed around stateless workloads: a pod’s own filesystem is ephemeral, so anything written there is lost when the pod dies. That is why databases are notoriously painful to run in Kubernetes.

Most teams run stateful things (Postgres, MySQL, Cassandra) outside Kubernetes on managed cloud services (AWS RDS, Google Cloud SQL), and only use Kubernetes for stateless application services. There are ways to handle state in Kubernetes (PersistentVolumes, StatefulSets), but it’s significantly more complex and often not worth it.

How Updates Actually Work

This is where Kubernetes earns its reputation. Say you want to update your web app from version 1 to version 2.

A rolling update replaces pods gradually: bring up one v2 pod, wait for it to be healthy, retire one v1 pod, and repeat until all 5 pods run v2. If a new pod never becomes healthy, the rollout stalls and the remaining v1 pods keep serving traffic. Nothing goes down.
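The rollout logic can be simulated in a few lines. This is a toy model under stated assumptions: it replaces exactly one pod at a time, and becomes_ready is a hypothetical stand-in for the health check. A real Deployment controls the pace through its maxSurge and maxUnavailable settings.

```python
def rolling_update(pods: list, new_version: str, becomes_ready) -> tuple:
    """Replace pods one at a time, stopping at the first unhealthy new pod.

    `becomes_ready(index)` is a hypothetical health check for the new pod
    that would replace slot `index`. Returns (pod versions, completed?).
    """
    pods = list(pods)  # work on a copy
    for i in range(len(pods)):
        if not becomes_ready(i):
            return pods, False  # rollout stalls; untouched pods keep serving
        pods[i] = new_version   # old pod is retired only after the new one is ready
    return pods, True


v1_pods = ["v1"] * 5

# Every new pod passes its health check: the rollout completes.
healthy, completed = rolling_update(v1_pods, "v2", becomes_ready=lambda i: True)

# The third new pod never becomes ready: the rollout stops partway.
stalled, stalled_completed = rolling_update(v1_pods, "v2", becomes_ready=lambda i: i < 2)
```

In the stalled case two pods run v2 and three still run v1, so all five slots keep serving traffic while someone investigates.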

If you notice a bug in v2 after the rollout, one command (kubectl rollout undo) rolls back to v1. Kubernetes performs another rolling update, this time back to the previous version.

The Ecosystem

Kubernetes alone is powerful. But most teams use it with a cluster of surrounding tools:

Tool	What It Does
Helm	Package manager for Kubernetes apps (like apt, but for clusters)
Istio	Service mesh — manages traffic between services, adds encryption
Prometheus	Metrics collection for monitoring
ArgoCD	GitOps — syncs your cluster state from a Git repo
cert-manager	Automatic TLS certificates

This ecosystem is also why Kubernetes has a steep learning curve. You don’t just learn one thing — you learn an entire platform.

Who Actually Uses This?

Almost every company with a serious engineering team. Airbnb migrated 1,000+ services to Kubernetes. Spotify runs over 10 million pods per day. The New York Times moved their entire digital infrastructure in 2019.

But it’s also genuinely overkill for small teams. A startup with 3 engineers probably doesn’t need Kubernetes — a simpler platform like Railway, Fly.io, or even Heroku will do fine and won’t require a dedicated DevOps engineer to maintain.

The honest answer: Kubernetes is the right tool when you have enough services and traffic that the complexity pays for itself.

One Thing to Remember

Kubernetes doesn’t run your app — it keeps your desired state of the app alive. You declare “3 copies, always healthy,” and it continuously works to make that true, healing failures and rerouting traffic automatically. That shift from imperative to declarative is the whole idea.
