Cloud Computing — Deep Dive
How Virtualization Makes the Cloud Possible
The entire cloud model rests on virtualization — the ability to run multiple isolated “virtual machines” on a single piece of physical hardware.
A physical server at AWS might have 192 CPU cores and 768 GB of RAM. Through a hypervisor (software like KVM or Xen), that machine can appear to customers as dozens of independent virtual machines, each with its own OS, network interface, and storage. These VMs have no awareness of each other.
AWS's Nitro system offloads networking, storage, and management tasks to dedicated hardware cards, leaving only a lightweight KVM-based hypervisor to handle CPU and memory isolation. This removes most of the performance overhead traditional hypervisors impose and is why modern EC2 instances can deliver near-bare-metal performance.
Containers: Lighter Than VMs
Containers (Docker, containerd) take a different approach. Instead of virtualizing hardware, they share the host OS kernel while isolating processes at the file system and network level using Linux namespaces and cgroups.
A VM typically takes 30–60 seconds to boot and reserves gigabytes of RAM. A container usually starts in well under a second and can run in tens of megabytes. This density advantage is a large part of why Kubernetes became the dominant way to run cloud workloads.
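To put the density claim in numbers, here is a back-of-the-envelope sketch. It assumes memory is the only constraint, and the per-VM and per-container footprints are illustrative guesses rather than measurements; only the 768 GB host size comes from the example server above.

```python
def instances_per_host(host_ram_gb: float, per_instance_mb: float) -> int:
    """How many instances fit in host RAM if memory is the only limit."""
    usable_mb = host_ram_gb * 1024
    return int(usable_mb // per_instance_mb)


HOST_RAM_GB = 768               # the example server described earlier
VM_FOOTPRINT_MB = 4 * 1024      # assumption: 4 GiB reserved per VM
CONTAINER_FOOTPRINT_MB = 64     # assumption: 64 MiB per container

vms = instances_per_host(HOST_RAM_GB, VM_FOOTPRINT_MB)
containers = instances_per_host(HOST_RAM_GB, CONTAINER_FOOTPRINT_MB)
print(f"VMs: {vms}, containers: {containers}")
```

Under these assumptions the same host fits 192 VMs or 12,288 containers. Real limits (CPU, I/O, scheduler overhead) bite well before that container count, but the order-of-magnitude gap is the point.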
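```dockerfile
# A simple containerized Node.js app
FROM node:20-alpine
WORKDIR /app
# Copy the manifests first so the dependency layer is cached between builds
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```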
Docker builds this into a portable, versioned image. On any machine with a compatible container runtime and the same CPU architecture, the image runs the same way.
Cloud Regions, Availability Zones, and Fault Tolerance
AWS is divided into Regions (geographic locations like us-east-1, eu-west-1, ap-southeast-2). Each Region contains multiple Availability Zones (AZs): groups of one or more data centers that are physically separated by kilometers but sit within roughly 100 km of each other, each with independent power, cooling, and networking.
This architecture enables high availability patterns:
- A load balancer distributes traffic across multiple instances in multiple AZs
- If one AZ goes down (hardware failure, power outage), the others absorb traffic
- Data is replicated synchronously across AZs for databases like RDS Multi-AZ
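A rough way to see why spreading instances across AZs helps is to treat each zone as failing independently. The sketch below makes that assumption explicit. The 99% per-zone availability is a hypothetical input, not an AWS figure, and correlated failures (a Region-wide incident, for example) break the independence assumption.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    return 1.0 - (1.0 - per_zone) ** zones


HOURS_PER_YEAR = 24 * 365

for zones in (1, 2, 3):
    availability = combined_availability(0.99, zones)  # 0.99 is hypothetical
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{zones} AZ(s): availability={availability:.6f}, "
          f"expected downtime={downtime_hours:.2f} h/year")
```

Each additional zone multiplies the expected downtime by the per-zone failure probability, which is why going from one AZ to two is the biggest single improvement.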
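The December 2021 AWS us-east-1 outage, which disrupted Slack, Netflix, and many other services, shows the limit of this pattern. The failure affected the Region as a whole, so spreading workloads across AZs inside us-east-1 was not enough; services that depended on that single Region had no healthy capacity to fail over to. Multi-AZ protects against a data center failing, not against a Region-wide event.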
Multi-Region vs. Multi-AZ
| Setup | Latency overhead | Cost | Resilience against |
|---|---|---|---|
| Single AZ | None | Low | Nothing — single point of failure |
| Multi-AZ | ~1ms | Medium | AZ failure, hardware failure |
| Multi-Region | 20–100ms | High | Region failure, natural disasters (also helps meet data-residency requirements) |
Multi-region architectures require solving data consistency problems. If a user writes data in us-east-1, how quickly does eu-west-1 see it? Synchronous replication introduces latency; asynchronous replication introduces the possibility of serving stale data.
This is the CAP theorem in practice: when a network partition separates replicas, a system has to give up either consistency or availability. AWS's DynamoDB global tables, for example, default to eventual consistency across Regions, with replication lag that is typically under a second.
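The stale-read window is easier to see in a toy model. The sketch below is not how any AWS service is implemented; `Primary`, `Replica`, and the 80 ms lag are hypothetical names and numbers, and time is a plain integer passed in by the caller so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A key-value store that applies replicated writes after a fixed lag."""
    lag_ms: int
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (apply_at_ms, key, value)

    def enqueue(self, now_ms: int, key: str, value: str) -> None:
        self.pending.append((now_ms + self.lag_ms, key, value))

    def read(self, now_ms: int, key: str):
        # Apply every write whose replication delay has elapsed, oldest first.
        due = sorted(w for w in self.pending if w[0] <= now_ms)
        self.pending = [w for w in self.pending if w[0] > now_ms]
        for _, k, v in due:
            self.data[k] = v
        return self.data.get(key)


class Primary:
    """Acknowledges writes locally, then replicates asynchronously."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, now_ms: int, key: str, value: str) -> None:
        self.data[key] = value  # the client sees success at this point
        for replica in self.replicas:
            replica.enqueue(now_ms, key, value)


eu = Replica(lag_ms=80)          # hypothetical cross-region lag
us = Primary([eu])
us.write(0, "cart", "v1")
us.write(10, "cart", "v2")

print(eu.read(50, "cart"))   # None: nothing has arrived yet
print(eu.read(85, "cart"))   # v1: the first write arrived, the second has not
print(eu.read(100, "cart"))  # v2: the replica has caught up
```

A synchronous design would instead block `write` until every replica confirmed, trading the stale window for roughly one cross-region round trip of latency on each write.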
Auto-Scaling: The Core Economics
The promise of cloud — “pay only for what you use” — requires the ability to scale capacity with demand. AWS Auto Scaling groups, Kubernetes Horizontal Pod Autoscaler, and serverless platforms all implement this in different ways.
EC2 Auto Scaling monitors a metric (CPU utilization, request count, queue depth) and launches or terminates instances based on policies:
```json
{
  "ScalingPolicies": [
    {
      "PolicyType": "TargetTrackingScaling",
      "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0
      }
    }
  ]
}
```
This keeps average CPU near 60%. Spike above 60%? New instances launch. Traffic drops? Instances terminate.
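The arithmetic behind "keep the metric near a target" can be sketched with a proportional rule. This is the formula the Kubernetes Horizontal Pod Autoscaler documents; AWS does not publish the exact algorithm behind target tracking, so treat this as an approximation of the idea. The function name and the min/max bounds are illustrative.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling: size the fleet so the average metric lands near target."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    raw = math.ceil(current * metric / target)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, raw))


print(desired_capacity(4, 90.0, 60.0, min_size=2, max_size=10))   # scale out
print(desired_capacity(6, 30.0, 60.0, min_size=2, max_size=10))   # scale in
print(desired_capacity(4, 300.0, 60.0, min_size=2, max_size=10))  # capped at max
```

With 4 instances at 90% CPU and a 60% target, the rule asks for 6 instances; at 30% CPU on 6 instances it asks for 3. Real autoscalers add cooldowns and more conservative scale-in behavior on top of this to avoid flapping.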
The catch: EC2 instances take 2–5 minutes to launch. If your traffic spike is sudden — say, a product launch or a viral tweet — your auto-scaling might not react fast enough. Solutions: warm pools (pre-initialized instances), predictive scaling (ML-based capacity forecasting), or serverless.
Serverless: The Abstraction Ceiling
Serverless (AWS Lambda, Google Cloud Functions, Cloudflare Workers) pushes abstraction further. You provide only the function code; the provider handles everything else — instance provisioning, scaling, patching, OS, runtime.
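```python
# Lambda function that processes S3 uploads
import json
from urllib.parse import unquote_plus

import boto3

# Create the client once per execution environment so warm invocations reuse it
s3 = boto3.client('s3')


def handler(event, context):
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # Object keys arrive URL-encoded in S3 event notifications
    key = unquote_plus(record['object']['key'])
    obj = s3.get_object(Bucket=bucket, Key=key)
    data = json.loads(obj['Body'].read())
    # Process...
    return {'statusCode': 200, 'body': 'Processed'}
```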
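The event-parsing step can be exercised locally without AWS credentials. The helper below is a sketch: `s3_objects` and the sample event are hypothetical, and the sample contains only the fields read here, not the full notification payload. It also handles two details worth knowing: an event can carry more than one record, and object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, decoded key) for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Spaces arrive as '+' and special characters as %XX escapes.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


# Trimmed-down, hypothetical event for local testing.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "reports/2024+Q1/summary%281%29.json"}}}
    ]
}

print(s3_objects(sample_event))
```

Here the decoded key is `reports/2024 Q1/summary(1).json`. Passing the raw, still-encoded key to `get_object` would fail with a missing-key error for any object whose name contains spaces or special characters.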
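AWS charges per request and per millisecond of execution time, scaled by the memory you configure (billing used 100 ms increments until late 2020). At low traffic, Lambda is dramatically cheaper than an always-on instance. At high sustained traffic the math can reverse: 10 million requests/month costs only a few dollars when each invocation is short and small, but with long-running or memory-heavy invocations the same volume can cost more than a right-sized EC2 instance.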
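Where the crossover sits depends heavily on duration and memory, which a short calculation makes visible. The prices below are assumptions based on commonly published x86 list prices ($0.20 per million requests, $0.0000166667 per GB-second); the free tier is ignored, and the EC2 hourly rate is a hypothetical small instance. Check current pricing before relying on any of these numbers.

```python
REQUEST_PRICE_USD = 0.20 / 1_000_000   # assumed list price per request
GB_SECOND_PRICE_USD = 0.0000166667     # assumed list price per GB-second
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests: int, avg_ms: float, memory_mb: int) -> float:
    """Request charge plus compute charge, ignoring the free tier."""
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    return requests * REQUEST_PRICE_USD + gb_seconds * GB_SECOND_PRICE_USD


def ec2_monthly_cost(hourly_usd: float) -> float:
    """Always-on instance cost for one month."""
    return hourly_usd * HOURS_PER_MONTH


light = lambda_monthly_cost(10_000_000, avg_ms=100, memory_mb=128)
heavy = lambda_monthly_cost(10_000_000, avg_ms=1000, memory_mb=1024)
instance = ec2_monthly_cost(0.0416)    # hypothetical on-demand hourly rate

print(f"Lambda, 100 ms @ 128 MB: ${light:.2f}/month")
print(f"Lambda, 1 s @ 1 GB:      ${heavy:.2f}/month")
print(f"EC2, always on:          ${instance:.2f}/month")
```

Under these assumptions the light workload costs about $4 a month and the heavy one about $169, against roughly $30 for the instance. The request count is identical in both Lambda cases; duration and memory decide which side of the line you land on.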
Cold starts remain a real limitation. A Lambda that hasn’t been invoked recently takes 100–1000ms to initialize. For latency-sensitive APIs, this is a problem. Provisioned Concurrency (paying to keep Lambdas warm) partially solves this at additional cost.
Storage Tiers and the Cost of Data
Cloud storage isn’t one thing — it’s a spectrum of price/performance tradeoffs:
| Service | Type | Latency | Cost | Use case |
|---|---|---|---|---|
| EBS (provisioned) | Block storage | <1ms | High | Database volumes |
| S3 Standard | Object storage | 10–100ms | Medium | Active files, web assets |
| S3 Infrequent Access | Object storage | 10–100ms | Lower | Backups accessed monthly |
| S3 Glacier Instant | Archival | Milliseconds | Very low | Compliance archives |
| S3 Glacier Deep Archive | Archival | Up to 12 hours (standard retrieval) | Cheapest | 7-year retention, audits |
A common mistake is storing everything in S3 Standard. Consider a video service like Netflix that stores petabytes of encoded video: the original 4K master files might be accessed only about once a year (for re-encoding). Glacier Deep Archive costs $0.00099/GB/month vs. $0.023/GB/month for Standard, roughly a 23x difference. That is about $264,000 saved per petabyte per year before retrieval fees, so at the scale of many petabytes the savings run into millions of dollars annually.
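The storage-tier comparison is simple enough to check directly. This sketch uses only the two per-GB prices quoted above and assumes 1 PB = 1,000,000 GB; retrieval fees, request charges, and minimum storage durations are left out.

```python
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Glacier Deep Archive": 0.00099,
}
GB_PER_PB = 1_000_000


def annual_storage_cost(gigabytes: float, tier: str) -> float:
    """Twelve months of storage at the tier's flat per-GB price."""
    return gigabytes * PRICE_PER_GB_MONTH[tier] * 12


standard = annual_storage_cost(GB_PER_PB, "S3 Standard")
deep_archive = annual_storage_cost(GB_PER_PB, "S3 Glacier Deep Archive")

print(f"Standard:     ${standard:,.0f} per PB-year")
print(f"Deep Archive: ${deep_archive:,.0f} per PB-year")
print(f"Savings:      ${standard - deep_archive:,.0f} per PB-year")
print(f"Ratio:        {standard / deep_archive:.1f}x")
```

One petabyte costs about $276,000 a year in Standard and about $11,880 in Deep Archive. In practice a lifecycle rule would move objects between tiers automatically once they pass an age threshold.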
Egress costs are the hidden billing trap. AWS charges up to $0.09/GB to move data out of AWS to the internet or another cloud, with volume tiers lowering the rate for large transfers. Data in is free. This creates lock-in: at the $0.09/GB headline rate, moving a 1 PB dataset off AWS costs ~$90,000 in transfer fees alone, before labor.
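The egress arithmetic can be sketched as a tiered-price function. With the default single tier it reproduces the flat $0.09/GB figure above. The multi-tier schedule in `ILLUSTRATIVE_TIERS` is an assumption for demonstration only; real tier boundaries and rates vary by Region and change over time.

```python
def egress_cost(gigabytes: float, tiers=((None, 0.09),)) -> float:
    """Cost of transferring data out, given (tier_size_gb, usd_per_gb) tiers.

    A tier size of None means "everything that remains".
    """
    remaining = gigabytes
    total = 0.0
    for size, rate in tiers:
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("tiers do not cover the full volume")
    return total


# Assumed schedule, for illustration only.
ILLUSTRATIVE_TIERS = (
    (10_000, 0.09),
    (40_000, 0.085),
    (100_000, 0.07),
    (None, 0.05),
)

one_pb_gb = 1_000_000
print(f"Flat rate: ${egress_cost(one_pb_gb):,.0f}")
print(f"Tiered:    ${egress_cost(one_pb_gb, ILLUSTRATIVE_TIERS):,.0f}")
```

The flat rate gives $90,000 for 1 PB; the illustrative tiers give $53,800. Either way the fee scales with dataset size, which is what makes large migrations expensive.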
The Modern Cloud Architecture Pattern
A mature cloud production system in 2026 typically looks like:
- Edge layer: Cloudflare or CloudFront CDN for static assets and DDoS mitigation
- API Gateway: Rate limiting, auth, request routing
- Compute: ECS/Fargate or Kubernetes (EKS) running containerized services
- Async processing: SQS queues + Lambda or worker services for background jobs
- Data layer: Aurora PostgreSQL (relational), DynamoDB (key-value), ElastiCache Redis (cache), S3 (objects)
- Observability: CloudWatch metrics + X-Ray traces + structured logs to a SIEM
The principle behind this decomposition: fail small. If your image-processing service goes down, your checkout flow shouldn’t. Boundaries between services, message queues between steps, and circuit breakers between dependencies all contribute to this.
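Of the three mechanisms, the circuit breaker is the least self-explanatory, so here is a minimal sketch. It is a simplified illustration of the pattern, not a production library: the class name, thresholds, and the fake clock in the demo are all hypothetical, and there is no thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without contacting the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Demo with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after_s=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("image service unavailable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("fast-fail")
    except ConnectionError:
        outcomes.append("error")

now[0] = 11.0  # the cool-down has passed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)
```

The demo records two real errors, then a fast failure once the circuit opens, then a successful trial call after the cool-down. The fast failure is the point: the checkout flow stops waiting on a dependency that is known to be down.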
Real Cost Optimization: What Actually Works
Cloud bills at scale are shocking without discipline. Common high-impact levers:
- Reserved Instances / Savings Plans: Commit to a 1-year or 3-year term of compute usage, save 30–60%. AWS reported in 2023 that customers who fully utilize Savings Plans spend 2.5x less than on-demand.
- Spot Instances: Spare AWS capacity at 70–90% discount, but can be interrupted with 2-minute notice. Best for batch jobs, ML training, stateless workers.
- Right-sizing: Most teams over-provision. AWS Compute Optimizer uses ML to recommend instance size reductions — typical customers see 20–30% savings.
- Data transfer optimization: Traffic between instances in the same AZ over private IPs is free, so keep chatty services together where resilience requirements allow. Cross-AZ traffic costs $0.01/GB in each direction, which adds up at scale.
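To see how the cross-AZ charge adds up, here is a small estimate. It uses the $0.01/GB rate from the list above and assumes the charge applies on both the sending and receiving side, which is why `directions_billed` defaults to 2; the traffic volume is a hypothetical example.

```python
CROSS_AZ_USD_PER_GB = 0.01  # rate quoted in the list above


def cross_az_monthly_cost(gb_per_month: float, directions_billed: int = 2) -> float:
    """Monthly charge for traffic crossing an AZ boundary."""
    return gb_per_month * CROSS_AZ_USD_PER_GB * directions_billed


# Hypothetical: services exchanging 100 TB per month across AZs.
monthly = cross_az_monthly_cost(100_000)
print(f"${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```

At 100 TB per month that is $2,000 a month, or $24,000 a year, for traffic that never leaves the Region.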
Dropbox moved most of its storage off AWS onto its own infrastructure starting in 2016, and later disclosed in its 2018 IPO filing that the move cut operating costs by nearly $75 million over two years. But Dropbox had predictable, steady traffic and massive scale, which are the conditions where owned hardware beats the cloud. Most companies never reach that point.
One Thing to Remember
Cloud computing is a spectrum of tradeoffs between control, cost, and operational burden. The more abstraction you accept (serverless > containers > VMs), the less you manage but the less you control. The right answer depends entirely on your traffic patterns, team size, cost sensitivity, and tolerance for vendor lock-in — and that answer changes as you scale.