Scaling & Updates: Horizontal Pod Autoscaler, Rolling Updates, and Rollbacks
📅 Published: May 2026
⏱️ Estimated Reading Time: 18 minutes
🏷️ Tags: Kubernetes, HPA, Auto Scaling, Rolling Updates, Rollbacks, Deployments
Introduction: The Scaling Challenge
Applications experience varying levels of traffic. Your website might be quiet at 3 AM and overwhelmed at noon. Running the same number of Pods 24/7 wastes resources during quiet times and frustrates users during peak times.
Kubernetes solves this with two powerful features:
Horizontal Pod Autoscaler (HPA): Automatically adjusts the number of Pods based on CPU, memory, or custom metrics
Rolling Updates: Updates applications without downtime, replacing old Pods with new ones gradually
This guide covers how to scale applications automatically and update them safely.
Part 1: Horizontal Pod Autoscaler (HPA)
What is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.
Think of HPA as a thermostat for your application. When it gets "hot" (high CPU usage), it turns on more Pods. When it "cools down" (low CPU usage), it turns off extra Pods.
HPA Based on CPU
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
This HPA configuration means:
Maintain at least 2 Pods, at most 10 Pods
Target average CPU utilization of 50% across all Pods
If CPU exceeds 50%, add Pods
If CPU drops below 50%, remove Pods
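One prerequisite worth double-checking: Utilization targets are calculated against the container resource requests of the target Deployment, so the HPA cannot compute a percentage if requests are missing. A quick way to verify, using the Deployment name from the example above:

```bash
# Utilization-based HPAs compare usage against each container's resource *requests*;
# if this prints nothing, add requests to the Deployment before relying on the HPA.
kubectl get deployment web-deployment \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'
```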
How the Algorithm Works
```
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]

Example:
- Current replicas: 4
- Average CPU: 75%
- Desired CPU: 50%
- Calculation: ceil[4 * (75/50)] = ceil[4 * 1.5] = 6

HPA will scale from 4 to 6 replicas.
```
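If you want to sanity-check the arithmetic yourself, a throwaway shell snippet like this (plain awk, using the same numbers as the worked example) reproduces the calculation:

```bash
# Reproduce the worked example: 4 replicas at 75% average CPU, targeting 50%.
awk 'BEGIN {
  current_replicas = 4; current_cpu = 75; desired_cpu = 50
  raw = current_replicas * current_cpu / desired_cpu   # 6.0
  desired = int(raw); if (desired < raw) desired++     # ceil
  print "desiredReplicas =", desired                   # prints 6
}'
```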
HPA Based on Memory
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
HPA with Multiple Metrics
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
```
When multiple metrics are specified, HPA scales based on the metric that requires the largest number of replicas.
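To see which metric is currently binding, check the HPA's reported values; with multiple metrics the controller follows whichever one demands the most replicas:

```bash
# `kubectl get hpa` shows all metric targets in the TARGETS column;
# `describe` breaks them out per metric with current vs. target values.
kubectl get hpa web-hpa
kubectl describe hpa web-hpa
```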
HPA with Custom Metrics (Prometheus)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```
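A Pods-type metric like http_requests_per_second is not built in: it must be served by a metrics adapter (for example prometheus-adapter) that registers the custom.metrics.k8s.io API. A quick way to check whether such an adapter is installed in your cluster:

```bash
# The HPA controller reads Pods/Object metrics from the custom metrics API.
# If nothing is registered for custom.metrics.k8s.io, a custom-metric HPA cannot work.
kubectl get apiservices | grep custom.metrics.k8s.io
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | head -c 500
```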
HPA Commands
```bash
# Create HPA
kubectl autoscale deployment web-deployment --cpu-percent=50 --min=2 --max=10

# Create HPA from YAML
kubectl apply -f hpa.yaml

# List HPAs
kubectl get hpa
kubectl get hpa -n my-namespace

# Describe HPA
kubectl describe hpa web-hpa

# Delete HPA
kubectl delete hpa web-hpa
```
Watching HPA in Action
```bash
# Watch HPA status
kubectl get hpa web-hpa --watch

# Generate load to test scaling
kubectl run load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://web-service; done"

# Check HPA events
kubectl describe hpa web-hpa
```
HPA Behavior Configuration
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60   # At most 50% of Pods per minute
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```
This configuration:
Scales up quickly (no stabilization window)
Scales down slowly (5-minute stabilization window)
Removes at most 50% of the running Pods per minute when scaling down
Adds at most 100% more Pods or 4 Pods per 15 seconds, whichever is greater (selectPolicy: Max)
Part 2: Cluster Autoscaler
What is Cluster Autoscaler?
While HPA scales Pods, Cluster Autoscaler scales nodes. When Pods cannot be scheduled due to insufficient resources, Cluster Autoscaler adds new nodes. When nodes are underutilized, it removes them.
Cluster Autoscaler + HPA working together:
1. CPU increases → HPA adds Pods
2. No room for new Pods → Cluster Autoscaler adds a node
3. CPU decreases → HPA removes Pods
4. Node underutilized → Cluster Autoscaler removes the node
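The trigger for a scale-up is a Pod stuck in Pending because the scheduler has nowhere to place it. A simple way to spot that state before (or while) the autoscaler reacts:

```bash
# Pods that cannot be scheduled stay Pending; Cluster Autoscaler watches for these.
kubectl get pods --field-selector=status.phase=Pending

# The Pod's events typically show why, e.g. "Insufficient cpu" or "Insufficient memory".
kubectl describe pod <pending-pod-name>
```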
Cluster Autoscaler on AWS EKS
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```
Cluster Autoscaler on GCP GKE
```bash
# Enable Cluster Autoscaler on a node pool
gcloud container clusters update my-cluster \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --zone=us-central1-a
```
Part 3: Rolling Updates
What is a Rolling Update?
A rolling update gradually replaces old Pods with new ones. It ensures zero downtime by keeping some Pods running while others are being updated.
Instead of:
Delete all old Pods → Create all new Pods (downtime)
Rolling update does:
Create 1 new Pod → Delete 1 old Pod → Repeat
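The pace of that loop is controlled by the strategy shown next, and a rollout can also be paused halfway so you can inspect the first new Pods before letting the rest of the fleet be replaced:

```bash
# Pause an in-progress rollout, check the new Pods, then resume (or undo).
kubectl rollout pause deployment/web-deployment
kubectl rollout resume deployment/web-deployment
```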
Rolling Update Strategy
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Can create 1 extra Pod
      maxUnavailable: 0   # Cannot delete any Pods before new ones are ready
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80
```
Update Process Visualization
```
Initial state: 4 Pods running version 1.24

Step 1: maxSurge=1 allows 1 extra Pod
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ v1.24 │ (creating)   │
└──────────────────────────────────────────────┘

Step 2: New Pod becomes ready (v1.25)
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ v1.24 │ v1.25 ✓      │
└──────────────────────────────────────────────┘

Step 3: maxUnavailable=0 prevents deleting until the new Pod is ready
┌──────────────────────────────────────────────┐
│ v1.24 │ v1.24 │ v1.24 │ (terminating) │ v1.25│
└──────────────────────────────────────────────┘

Step 4: Continue until all Pods run v1.25
┌──────────────────────────────────────────────┐
│ v1.25 │ v1.25 │ v1.25 │ v1.25 │              │
└──────────────────────────────────────────────┘
```
Update Strategies Comparison
| Strategy | Description | Downtime | Use Case |
|---|---|---|---|
| RollingUpdate | Gradual replacement | Zero | Production applications |
| Recreate | All old, then all new | Yes | Development, batch jobs |
| Blue-Green | Switch traffic | Zero | Critical applications |
| Canary | Gradual traffic shift | Zero | Progressive delivery |
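Of these, only RollingUpdate and Recreate are built into the Deployment object; Blue-Green and Canary are patterns you compose with Services and multiple Deployments (Scenario 3 below shows a basic canary). As a sketch, switching the example Deployment to Recreate can be done with a patch like this; expect a brief outage while all Pods are replaced:

```bash
# Sketch: switch the example Deployment to the Recreate strategy.
# Setting rollingUpdate to null removes the now-irrelevant rolling-update parameters.
kubectl patch deployment web-deployment \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'
```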
Triggering a Rolling Update
```bash
# Update image
kubectl set image deployment/web-deployment nginx=nginx:1.25

# Update with edit
kubectl edit deployment/web-deployment

# Apply changes from YAML
kubectl apply -f deployment.yaml

# Update with patch
kubectl patch deployment web-deployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25"}]}}}}'
```
Monitoring Rolling Updates
```bash
# Watch rollout status
kubectl rollout status deployment/web-deployment

# Watch Pods during the update
kubectl get pods --watch

# Describe Deployment events
kubectl describe deployment/web-deployment
```
Part 4: Rollbacks
What is a Rollback?
A rollback reverts a Deployment to a previous version. When a new version causes problems, you can quickly restore the last working version.
Rollback Commands
```bash
# View rollout history
kubectl rollout history deployment/web-deployment
# deployment.apps/web-deployment
# REVISION  CHANGE-CAUSE
# 1         nginx:1.24
# 2         nginx:1.25

# View specific revision details
kubectl rollout history deployment/web-deployment --revision=2

# Rollback to the previous revision
kubectl rollout undo deployment/web-deployment

# Rollback to a specific revision
kubectl rollout undo deployment/web-deployment --to-revision=1

# Check status after the rollback
kubectl rollout status deployment/web-deployment
```
Tracking Change Causes
```bash
# Use --record to record the command that triggered the change (deprecated in newer versions)
kubectl set image deployment/web-deployment nginx=nginx:1.25 --record

# Better: use the change-cause annotation
kubectl annotate deployment/web-deployment kubernetes.io/change-cause="Update to nginx:1.25"
```
Rollback Process
Current version: v1.25 (broken). Previous version: v1.24 (working).
Rollback:
1. The Deployment rolls back to the ReplicaSet running v1.24
2. The rolling update process reverses direction
3. Broken Pods are replaced with v1.24 Pods
4. The original working version is restored
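Under the hood, each revision in the history is a ReplicaSet the Deployment keeps around (how many is controlled by spec.revisionHistoryLimit, 10 by default); a rollback simply scales the old ReplicaSet back up and the current one down. You can watch this happen:

```bash
# Each revision maps to a ReplicaSet owned by the Deployment; after a rollback the
# previous ReplicaSet's DESIRED count goes back up while the broken one drops to 0.
kubectl get replicasets -l app=web
kubectl get deployment web-deployment -o jsonpath='{.spec.revisionHistoryLimit}'
```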
Part 5: Readiness and Liveness Probes
Why Probes Matter for Rolling Updates
Probes tell Kubernetes when a Pod is ready to receive traffic and whether it is still healthy. Without proper probes, rolling updates can send traffic to Pods that aren't ready yet.
Readiness Probe
Determines if a Pod is ready to serve traffic.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
```
Liveness Probe
Determines if a Pod is healthy. If it fails, Kubernetes restarts the Pod.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3
```
Types of Probes
| Type | Method | Example |
|---|---|---|
| httpGet | HTTP request | httpGet: {path: /health, port: 8080} |
| tcpSocket | TCP connection | tcpSocket: {port: 3306} |
| exec | Command execution | exec: {command: ["cat", "/tmp/healthy"]} |
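Whichever probe type you use, failures show up as Events on the Pod (for example "Readiness probe failed" or "Liveness probe failed"), which is usually the fastest place to look when a rollout stalls:

```bash
# Probe failures are recorded as Pod events; <pod-name> is a placeholder.
kubectl describe pod <pod-name> | grep -iA2 probe
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp
```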
Complete Example with Probes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
```
During a rolling update:
New Pod starts
Readiness probe checks if it's ready
Only after successThreshold (default 1) successful checks does the Pod receive traffic
maxUnavailable: 0 ensures old Pods aren't removed until the new ones are ready
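A practical way to watch that gating in action during a rollout is to keep an eye on Pod readiness and the Service's endpoints, since only ready Pods are added as endpoints (names below match the earlier examples):

```bash
# READY shows ready/total containers per Pod; a Pod joins the Service's endpoints
# only once its readiness probe passes.
kubectl get pods -l app=web --watch
kubectl get endpoints web-service
```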
Real-World Scenarios
Scenario 1: Web Application Auto Scaling
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        image: webapp:latest
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```
Scenario 2: Zero-Downtime Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: api
        image: myapi:2.0.0
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
```
Scenario 3: Canary Deployment with Two Deployments
```yaml
# Stable version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web
      version: stable
  template:
    metadata:
      labels:
        app: web
        version: stable
    spec:
      containers:
      - name: app
        image: webapp:1.0
---
# Canary version (~10% of traffic: 1 of 10 Pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      version: canary
  template:
    metadata:
      labels:
        app: web
        version: canary
    spec:
      containers:
      - name: app
        image: webapp:2.0
---
# Service routing to both versions
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```
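Because the Service selects only app: web, traffic is spread roughly evenly across all matching Pods, so the split follows the replica ratio (here about 1 in 10 requests hits the canary). A rough way to confirm what is behind the Service:

```bash
# The Service should list 10 endpoints: 9 Pods from web-stable and 1 from web-canary.
kubectl get pods -l app=web -L version
kubectl get endpoints web-service
```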
Best Practices
For Autoscaling
Set resource requests and limits: HPA needs these to calculate utilization
Use appropriate metrics: CPU for CPU-bound workloads, memory for memory-bound
Set minReplicas for high availability: At least 2 for critical applications
Set maxReplicas for cost control: Prevent runaway scaling
Use stabilization windows: Prevent flapping during scaling
Test scaling with load: Verify HPA works before production
For Rolling Updates
Use readiness probes: Ensure Pods are ready before receiving traffic
Set maxUnavailable to 0 for critical apps: Prevents downtime
Set maxSurge appropriately: Balance speed and resource usage
Use health checks: Detect failed deployments automatically
Implement rollback procedures: Know how to revert problematic updates
Monitor during updates: Watch rollout status
For Rollbacks
Document change causes: Track what changed and why
Keep revision history: Don't prune deployment history
Test rollback procedures: Practice recovering from bad deployments
Automate rollbacks: Use automation for common failure patterns
Summary
| Feature | Purpose | Key Settings |
|---|---|---|
| HPA (CPU) | Scale on CPU | averageUtilization |
| HPA (Memory) | Scale on memory | averageUtilization or averageValue |
| HPA (Custom) | Scale on app metrics | pods or object metric |
| Rolling Update | Zero-downtime deploy | maxSurge, maxUnavailable |
| Rollback | Revert bad deploy | kubectl rollout undo |
| Readiness Probe | Traffic readiness | httpGet, initialDelaySeconds |
| Liveness Probe | Pod health | httpGet, periodSeconds |
Practice Questions
How does the Horizontal Pod Autoscaler decide when to add more Pods?
What is the difference between maxSurge and maxUnavailable in a rolling update?
Why are readiness probes important for rolling updates?
How do you roll back a deployment to a previous version?
What happens when you set maxUnavailable to 0 and maxSurge to 1?
Learn More
Practice scaling and updates with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/