Scaling & Updates: Horizontal Pod Autoscaler, Rolling Updates, and Rollbacks
📅 Published: May 2026
⏱️ Estimated Reading Time: 18 minutes
🏷️ Tags: Kubernetes, HPA, Auto Scaling, Rolling Updates, Rollbacks, Deployments
Introduction: The Scaling Challenge
Applications experience varying levels of traffic. Your website might be quiet at 3 AM and overwhelmed at noon. Running the same number of Pods 24/7 wastes resources during quiet times and frustrates users during peak times.
Kubernetes solves this with two powerful features:
Horizontal Pod Autoscaler (HPA): Automatically adjusts the number of Pods based on CPU, memory, or custom metrics
Rolling Updates: Updates applications without downtime, replacing old Pods with new ones gradually
This guide covers how to scale applications automatically and update them safely.
Part 1: Horizontal Pod Autoscaler (HPA)
What is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.
Think of HPA as a thermostat for your application. When it gets "hot" (high CPU usage), it turns on more Pods. When it "cools down" (low CPU usage), it turns off extra Pods.
HPA Based on CPU
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
This HPA configuration means:
Maintain at least 2 Pods, at most 10 Pods
Target average CPU utilization of 50% across all Pods
If CPU exceeds 50%, add Pods
If CPU drops below 50%, remove Pods
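One prerequisite worth double-checking: Utilization targets are calculated against the container resource requests of the target Deployment, so the HPA cannot compute a percentage if requests are missing. A quick way to verify, using the Deployment name from the example above:

```bash
# Utilization-based HPAs compare usage against each container's resource *requests*;
# if this prints nothing, add requests to the Deployment before relying on the HPA.
kubectl get deployment web-deployment \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'
```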
How the Algorithm Works
```
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]

Example:
- Current replicas: 4
- Average CPU: 75%
- Desired CPU: 50%
- Calculation: ceil[4 * (75/50)] = ceil[4 * 1.5] = 6

HPA will scale from 4 to 6 replicas.
```
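If you want to sanity-check the arithmetic yourself, a throwaway shell snippet like this (plain awk, using the same numbers as the worked example) reproduces the calculation:

```bash
# Reproduce the worked example: 4 replicas at 75% average CPU, targeting 50%.
awk 'BEGIN {
  current_replicas = 4; current_cpu = 75; desired_cpu = 50
  raw = current_replicas * current_cpu / desired_cpu   # 6.0
  desired = int(raw); if (desired < raw) desired++     # ceil
  print "desiredReplicas =", desired                   # prints 6
}'
```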
HPA Based on Memory
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
HPA with Multiple Metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
```
When multiple metrics are specified, HPA scales based on the metric that requires the largest number of replicas.
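To see which metric is currently binding, check the HPA's reported values; with multiple metrics the controller follows whichever one demands the most replicas:

```bash
# `kubectl get hpa` shows all metric targets in the TARGETS column;
# `describe` breaks them out per metric with current vs. target values.
kubectl get hpa web-hpa
kubectl describe hpa web-hpa
```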
HPA with Custom Metrics (Prometheus)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```
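A Pods-type metric like http_requests_per_second is not built in: it must be served by a metrics adapter (for example prometheus-adapter) that registers the custom.metrics.k8s.io API. A quick way to check whether such an adapter is installed in your cluster:

```bash
# The HPA controller reads Pods/Object metrics from the custom metrics API.
# If nothing is registered for custom.metrics.k8s.io, a custom-metric HPA cannot work.
kubectl get apiservices | grep custom.metrics.k8s.io
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | head -c 500
```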
HPA Commands
```bash
# Create HPA
kubectl autoscale deployment web-deployment --cpu-percent=50 --min=2 --max=10

# Create HPA from YAML
kubectl apply -f hpa.yaml

# List HPAs
kubectl get hpa
kubectl get hpa -n my-namespace

# Describe HPA
kubectl describe hpa web-hpa

# Delete HPA
kubectl delete hpa web-hpa
```
Watching HPA in Action
```bash
# Watch HPA status
kubectl get hpa web-hpa --watch

# Generate load to test scaling
kubectl run load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://web-service; done"

# Check HPA events
kubectl describe hpa web-hpa
```
HPA Behavior Configuration
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60   # At most 50% of Pods per minute
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```
This configuration:
Scales up quickly (no stabilization window)
Scales down slowly (5-minute stabilization window)
Removes at most 50% of the running Pods per minute when scaling down
Adds at most 100% more Pods or 4 Pods per 15 seconds, whichever is greater (selectPolicy: Max)
Part 2: Cluster Autoscaler
What is Cluster Autoscaler?
While HPA scales Pods, Cluster Autoscaler scales nodes. When Pods cannot be scheduled due to insufficient resources, Cluster Autoscaler adds new nodes. When nodes are underutilized, it removes them.
Cluster Autoscaler + HPA working together:
1. CPU increases → HPA adds Pods
2. No room for new Pods → Cluster Autoscaler adds a node
3. CPU decreases → HPA removes Pods
4. Node underutilized → Cluster Autoscaler removes the node
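The trigger for a scale-up is a Pod stuck in Pending because the scheduler has nowhere to place it. A simple way to spot that state before (or while) the autoscaler reacts:

```bash
# Pods that cannot be scheduled stay Pending; Cluster Autoscaler watches for these.
kubectl get pods --field-selector=status.phase=Pending

# The Pod's events typically show why, e.g. "Insufficient cpu" or "Insufficient memory".
kubectl describe pod <pending-pod-name>
```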
Cluster Autoscaler on AWS EKS
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```
Cluster Autoscaler on GCP GKE
```bash
# Enable Cluster Autoscaler on a node pool
gcloud container clusters update my-cluster \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --zone=us-central1-a
```
Part 3: Rolling Updates
What is a Rolling Update?
A rolling update gradually replaces old Pods with new ones. It ensures zero downtime by keeping some Pods running while others are being updated.
Instead of:
Delete all old Pods → Create all new Pods (downtime)
Rolling update does:
Create 1 new Pod → Delete 1 old Pod → Repeat
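The pace of that loop is controlled by the strategy shown next, and a rollout can also be paused halfway so you can inspect the first new Pods before letting the rest of the fleet be replaced:

```bash
# Pause an in-progress rollout, check the new Pods, then resume (or undo).
kubectl rollout pause deployment/web-deployment
kubectl rollout resume deployment/web-deployment
```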
Rolling Update Strategy
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Can create 1 extra Pod
      maxUnavailable: 0   # Cannot delete any Pods before new ones are ready
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80
```
Update Process Visualization
```
Initial state: 4 Pods running version 1.24

Step 1: maxSurge=1 allows 1 extra Pod
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ v1.24 │ (creating)   │
└──────────────────────────────────────────────┘

Step 2: New Pod becomes ready (v1.25)
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ v1.24 │ v1.25 ✓      │
└──────────────────────────────────────────────┘

Step 3: maxUnavailable=0 prevents deleting until the new Pod is ready
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ (terminating) │ v1.25│
└──────────────────────────────────────────────┘

Step 4: Continue until all Pods run v1.25
┌──────────────────────────────────────────────┐
│ v1.25 │ v1.25 │ v1.25 │ v1.25 │              │
└──────────────────────────────────────────────┘
```
Update Strategies Comparison
| Strategy | Description | Downtime | Use Case |
|---|---|---|---|
| RollingUpdate | Gradual replacement | Zero | Production applications |
| Recreate | All old, then all new | Yes | Development, batch jobs |
| Blue-Green | Switch traffic | Zero | Critical applications |
| Canary | Gradual traffic shift | Zero | Progressive delivery |
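Of these, only RollingUpdate and Recreate are built into the Deployment object; Blue-Green and Canary are patterns you compose with Services and multiple Deployments (Scenario 3 below shows a basic canary). As a sketch, switching the example Deployment to Recreate can be done with a patch like this; expect a brief outage while all Pods are replaced:

```bash
# Sketch: switch the example Deployment to the Recreate strategy.
# Setting rollingUpdate to null removes the now-irrelevant rolling-update parameters.
kubectl patch deployment web-deployment \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'
```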
Triggering a Rolling Update
```bash
# Update image
kubectl set image deployment/web-deployment nginx=nginx:1.25

# Update with edit
kubectl edit deployment/web-deployment

# Apply changes from YAML
kubectl apply -f deployment.yaml

# Update with patch
kubectl patch deployment web-deployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25"}]}}}}'
```
Monitoring Rolling Updates
```bash
# Watch rollout status
kubectl rollout status deployment/web-deployment

# Watch Pods during the update
kubectl get pods --watch

# Describe Deployment events
kubectl describe deployment/web-deployment
```
Part 4: Rollbacks
What is a Rollback?
A rollback reverts a Deployment to a previous version. When a new version causes problems, you can quickly restore the last working version.
Rollback Commands
```bash
# View rollout history
kubectl rollout history deployment/web-deployment
# deployment.apps/web-deployment
# REVISION  CHANGE-CAUSE
# 1         nginx:1.24
# 2         nginx:1.25

# View specific revision details
kubectl rollout history deployment/web-deployment --revision=2

# Rollback to the previous revision
kubectl rollout undo deployment/web-deployment

# Rollback to a specific revision
kubectl rollout undo deployment/web-deployment --to-revision=1

# Check status after the rollback
kubectl rollout status deployment/web-deployment
```
Tracking Change Causes
```bash
# Use --record to record the command that triggered the change (deprecated in newer versions)
kubectl set image deployment/web-deployment nginx=nginx:1.25 --record

# Better: use the change-cause annotation
kubectl annotate deployment/web-deployment kubernetes.io/change-cause="Update to nginx:1.25"
```
Rollback Process
Current version: v1.25 (broken). Previous version: v1.24 (working).
Rollback:
1. The Deployment rolls back to the ReplicaSet running v1.24
2. The rolling update process reverses direction
3. Broken Pods are replaced with v1.24 Pods
4. The original working version is restored
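Under the hood, each revision in the history is a ReplicaSet the Deployment keeps around (how many is controlled by spec.revisionHistoryLimit, 10 by default); a rollback simply scales the old ReplicaSet back up and the current one down. You can watch this happen:

```bash
# Each revision maps to a ReplicaSet owned by the Deployment; after a rollback the
# previous ReplicaSet's DESIRED count goes back up while the broken one drops to 0.
kubectl get replicasets -l app=web
kubectl get deployment web-deployment -o jsonpath='{.spec.revisionHistoryLimit}'
```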
Part 5: Readiness and Liveness Probes
Why Probes Matter for Rolling Updates
Probes tell Kubernetes when a Pod is ready to receive traffic and whether it is still healthy. Without proper probes, rolling updates can send traffic to Pods that aren't ready yet.
Readiness Probe
Determines if a Pod is ready to serve traffic.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
```
Liveness Probe
Determines if a Pod is healthy. If it fails, Kubernetes restarts the Pod.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3
```
Types of Probes
| Type | Method | Example |
|---|---|---|
| httpGet | HTTP request | httpGet: {path: /health, port: 8080} |
| tcpSocket | TCP connection | tcpSocket: {port: 3306} |
| exec | Command execution | exec: {command: ["cat", "/tmp/healthy"]} |
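Whichever probe type you use, failures show up as Events on the Pod (for example "Readiness probe failed" or "Liveness probe failed"), which is usually the fastest place to look when a rollout stalls:

```bash
# Probe failures are recorded as Pod events; <pod-name> is a placeholder.
kubectl describe pod <pod-name> | grep -iA2 probe
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp
```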
Complete Example with Probes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
```
During a rolling update:
New Pod starts
Readiness probe checks if it's ready
Only after successThreshold (default 1) successful checks does the Pod receive traffic
maxUnavailable: 0 ensures old Pods aren't removed until the new ones are ready
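A practical way to watch that gating in action during a rollout is to keep an eye on Pod readiness and the Service's endpoints, since only ready Pods are added as endpoints (names below match the earlier examples):

```bash
# READY shows ready/total containers per Pod; a Pod joins the Service's endpoints
# only once its readiness probe passes.
kubectl get pods -l app=web --watch
kubectl get endpoints web-service
```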
Real-World Scenarios
Scenario 1: Web Application Auto Scaling
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        image: webapp:latest
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```
Scenario 2: Zero-Downtime Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: api
        image: myapi:2.0.0
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
```
Scenario 3: Canary Deployment with Two Deployments
```yaml
# Stable version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web
      version: stable
  template:
    metadata:
      labels:
        app: web
        version: stable
    spec:
      containers:
      - name: app
        image: webapp:1.0
---
# Canary version (~10% of traffic: 1 of 10 Pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      version: canary
  template:
    metadata:
      labels:
        app: web
        version: canary
    spec:
      containers:
      - name: app
        image: webapp:2.0
---
# Service routing to both versions
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```
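Because the Service selects only app: web, traffic is spread roughly evenly across all matching Pods, so the split follows the replica ratio (here about 1 in 10 requests hits the canary). A rough way to confirm what is behind the Service:

```bash
# The Service should list 10 endpoints: 9 Pods from web-stable and 1 from web-canary.
kubectl get pods -l app=web -L version
kubectl get endpoints web-service
```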
Best Practices
For Autoscaling
Set resource requests and limits: HPA needs these to calculate utilization
Use appropriate metrics: CPU for CPU-bound workloads, memory for memory-bound
Set minReplicas for high availability: At least 2 for critical applications
Set maxReplicas for cost control: Prevent runaway scaling
Use stabilization windows: Prevent flapping during scaling
Test scaling with load: Verify HPA works before production
For Rolling Updates
Use readiness probes: Ensure Pods are ready before receiving traffic
Set maxUnavailable to 0 for critical apps: Prevents downtime
Set maxSurge appropriately: Balance speed and resource usage
Use health checks: Detect failed deployments automatically
Implement rollback procedures: Know how to revert problematic updates
Monitor during updates: Watch rollout status
For Rollbacks
Document change causes: Track what changed and why
Keep revision history: Don't prune deployment history
Test rollback procedures: Practice recovering from bad deployments
Automate rollbacks: Use automation for common failure patterns
Summary
| Feature | Purpose | Key Settings |
|---|---|---|
| HPA (CPU) | Scale on CPU | averageUtilization |
| HPA (Memory) | Scale on memory | averageUtilization or averageValue |
| HPA (Custom) | Scale on app metrics | pods or object metric |
| Rolling Update | Zero-downtime deploy | maxSurge, maxUnavailable |
| Rollback | Revert bad deploy | kubectl rollout undo |
| Readiness Probe | Traffic readiness | httpGet, initialDelaySeconds |
| Liveness Probe | Pod health | httpGet, periodSeconds |
Practice Questions
How does the Horizontal Pod Autoscaler decide when to add more Pods?
What is the difference between maxSurge and maxUnavailable in a rolling update?
Why are readiness probes important for rolling updates?
How do you roll back a deployment to a previous version?
What happens when you set maxUnavailable to 0 and maxSurge to 1?
Learn More
Practice scaling and updates with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/