Scaling & Updates: Horizontal Pod Autoscaler, Rolling Updates, and Rollbacks

📅 Published: May 2026
⏱️ Estimated Reading Time: 18 minutes
🏷️ Tags: Kubernetes, HPA, Auto Scaling, Rolling Updates, Rollbacks, Deployments


Introduction: The Scaling Challenge

Applications experience varying levels of traffic. Your website might be quiet at 3 AM and overwhelmed at noon. Running the same number of Pods 24/7 wastes resources during quiet times and frustrates users during peak times.

Kubernetes solves this with two powerful features:

  • Horizontal Pod Autoscaler (HPA) : Automatically adjusts the number of Pods based on CPU, memory, or custom metrics

  • Rolling Updates: Updates applications without downtime, replacing old Pods with new ones gradually

This guide covers how to scale applications automatically and update them safely.


Part 1: Horizontal Pod Autoscaler (HPA)

What is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.

Think of HPA as a thermostat for your application. When it gets "hot" (high CPU usage), it turns on more Pods. When it "cools down" (low CPU usage), it turns off extra Pods.

HPA Based on CPU

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

This HPA configuration means:

  • Maintain at least 2 Pods, at most 10 Pods

  • Target average CPU utilization of 50% across all Pods

  • If CPU exceeds 50%, add Pods

  • If CPU drops below 50%, remove Pods

How the Algorithm Works

text
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]

Example:
- Current replicas: 4
- Average CPU: 75%
- Desired CPU: 50%
- Calculation: ceil[4 * (75/50)] = ceil[4 * 1.5] = 6

HPA will scale from 4 to 6 replicas.
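The formula is easy to verify in a few lines of Python. This is a sketch of the arithmetic only (the function name is illustrative); the real controller also applies a tolerance, typically 10%, before acting:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    # ceil[currentReplicas * (currentMetric / desiredMetric)]
    return math.ceil(current_replicas * (current_metric / target_metric))

# The worked example above: 4 replicas at 75% CPU against a 50% target
print(desired_replicas(4, 75, 50))  # 6
```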

HPA Based on Memory

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

HPA with Multiple Metrics

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi

When multiple metrics are specified, HPA scales based on the metric that requires the largest number of replicas.
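This "largest wins" rule can be sketched as follows. The function names and the (current, target) metric pairs are illustrative, not Kubernetes API objects:

```python
import math

def desired_replicas(current: int, current_metric: float, target: float) -> int:
    return math.ceil(current * current_metric / target)

def replicas_for_metrics(current: int, metrics) -> int:
    # metrics: list of (current_value, target_value) pairs;
    # HPA follows whichever metric demands the most replicas
    return max(desired_replicas(current, c, t) for c, t in metrics)

# 4 replicas: CPU at 90% vs a 60% target, memory at 400Mi vs a 500Mi target.
# CPU alone wants 6 replicas, memory alone wants 4, so HPA picks 6.
print(replicas_for_metrics(4, [(90, 60), (400, 500)]))  # 6
```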

HPA with Custom Metrics (Prometheus)

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

HPA Commands

bash
# Create HPA
kubectl autoscale deployment web-deployment --cpu-percent=50 --min=2 --max=10

# Create HPA from YAML
kubectl apply -f hpa.yaml

# List HPAs
kubectl get hpa
kubectl get hpa -n my-namespace

# Describe HPA
kubectl describe hpa web-hpa

# Delete HPA
kubectl delete hpa web-hpa

Watching HPA in Action

bash
# Watch HPA status
kubectl get hpa web-hpa --watch

# Generate load to test scaling
kubectl run load-generator --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://web-service; done"

# Check HPA events
kubectl describe hpa web-hpa

HPA Behavior Configuration

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      # At most 50% of Pods per minute
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

This configuration:

  • Scales up quickly (no stabilization window)

  • Scales down slowly (5 minute stabilization window)

  • Max 50% scale down per minute

  • Max double the Pod count when scaling up
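The rate limits these policies impose can be approximated in Python. This is a simplification (the integer rounding here glosses over the controller's exact math), but it shows how the numbers above interact:

```python
def max_scale_down(current_replicas: int, percent: int) -> int:
    # Percent policy: at most this many Pods may be removed per period
    return current_replicas * percent // 100

def max_scale_up(current_replicas: int, percent: int, pods: int) -> int:
    # With selectPolicy: Max, the more permissive policy wins
    return max(current_replicas * percent // 100, pods)

# With the behavior block above: from 10 replicas, at most 5 Pods can be
# removed per 60s window, and up to 10 Pods added per 15s window
print(max_scale_down(10, 50))    # 5
print(max_scale_up(10, 100, 4))  # 10
```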


Part 2: Cluster Autoscaler

What is Cluster Autoscaler?

While HPA scales Pods, Cluster Autoscaler scales nodes. When Pods cannot be scheduled due to insufficient resources, Cluster Autoscaler adds new nodes. When nodes are underutilized, it removes them.

text
Cluster Autoscaler + HPA working together:

1. CPU increases → HPA adds Pods
2. No room for new Pods → Cluster Autoscaler adds node
3. CPU decreases → HPA removes Pods
4. Node underutilized → Cluster Autoscaler removes node

Cluster Autoscaler on AWS EKS

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

Cluster Autoscaler on GCP GKE

bash
# Enable Cluster Autoscaler on node pool
gcloud container clusters update my-cluster \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --zone=us-central1-a

Part 3: Rolling Updates

What is a Rolling Update?

A rolling update gradually replaces old Pods with new ones. It ensures zero downtime by keeping some Pods running while others are being updated.

Instead of:

  • Delete all old Pods → Create all new Pods (downtime)

Rolling update does:

  • Create 1 new Pod → Delete 1 old Pod → Repeat

Rolling Update Strategy

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Can create 1 extra Pod
      maxUnavailable: 0  # Cannot delete any Pods before new ones are ready
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80

Update Process Visualization

text
Initial state: 4 Pods running version 1.24

Step 1: maxSurge=1 allows 1 extra Pod
  [v1.24] [v1.24] [v1.24] [v1.24] [creating v1.25]

Step 2: New Pod becomes ready (v1.25)
  [v1.24] [v1.24] [v1.24] [v1.24] [v1.25 ✓]

Step 3: maxUnavailable=0 allows deleting an old Pod only now
  [v1.24] [v1.24] [v1.24] [terminating] [v1.25 ✓]

Step 4: Continue until all Pods run v1.25
  [v1.25] [v1.25] [v1.25] [v1.25]

Update Strategies Comparison

Strategy      | Description            | Downtime | Use Case
------------- | ---------------------- | -------- | ------------------------
RollingUpdate | Gradual replacement    | Zero     | Production applications
Recreate      | All old, then all new  | Yes      | Development, batch jobs
Blue-Green    | Switch traffic         | Zero     | Critical applications
Canary        | Gradual traffic shift  | Zero     | Progressive delivery

Triggering a Rolling Update

bash
# Update image
kubectl set image deployment/web-deployment nginx=nginx:1.25

# Update with edit
kubectl edit deployment/web-deployment

# Apply changes from YAML
kubectl apply -f deployment.yaml

# Update with patch
kubectl patch deployment web-deployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25"}]}}}}'

Monitoring Rolling Updates

bash
# Watch rollout status
kubectl rollout status deployment/web-deployment

# Watch pods during update
kubectl get pods --watch

# Describe deployment events
kubectl describe deployment/web-deployment

Part 4: Rollbacks

What is a Rollback?

A rollback reverts a Deployment to a previous version. When a new version causes problems, you can quickly restore the last working version.

Rollback Commands

bash
# View rollout history
kubectl rollout history deployment/web-deployment
# deployment.apps/web-deployment
# REVISION  CHANGE-CAUSE
# 1         nginx:1.24
# 2         nginx:1.25

# View specific revision details
kubectl rollout history deployment/web-deployment --revision=2

# Rollback to previous revision
kubectl rollout undo deployment/web-deployment

# Rollback to specific revision
kubectl rollout undo deployment/web-deployment --to-revision=1

# Check status after rollback
kubectl rollout status deployment/web-deployment

Tracking Change Causes

bash
# Use --record to record the command that triggered the change (deprecated in newer versions)
kubectl set image deployment/web-deployment nginx=nginx:1.25 --record

# Better: Use annotations
kubectl annotate deployment/web-deployment kubernetes.io/change-cause="Update to nginx:1.25"

Rollback Process

text
Current version: v1.25 (broken)
Previous version: v1.24 (working)

Rollback:
1. Deployment rolls back to ReplicaSet with v1.24
2. Rolling update process reverses direction
3. Old Pods replaced with v1.24 Pods
4. Original working version restored

Part 5: Readiness and Liveness Probes

Why Probes Matter for Rolling Updates

Probes tell Kubernetes when a Pod is ready to receive traffic and whether it is still healthy. Without proper probes, rolling updates can send traffic to Pods that aren't ready yet.

Readiness Probe

Determines if a Pod is ready to serve traffic.

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3

Liveness Probe

Determines if a container is still healthy. If the probe fails, Kubernetes restarts the container.

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3

Types of Probes

Type      | Method            | Example
--------- | ----------------- | -----------------------------------------
httpGet   | HTTP request      | httpGet: {path: /health, port: 8080}
tcpSocket | TCP connection    | tcpSocket: {port: 3306}
exec      | Command execution | exec: {command: ["cat", "/tmp/healthy"]}

Complete Example with Probes

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20

During a rolling update:

  1. New Pod starts

  2. Readiness probe checks if it's ready

  3. Only after successThreshold (default 1) does the Pod receive traffic

  4. maxUnavailable: 0 ensures old Pods aren't removed until new ones are ready
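The sequencing above can be sketched as a toy simulation. This is not the real controller logic (which tracks ReplicaSets and Pod conditions); it assumes the surge-first case, where each batch of new Pods passes readiness before any old Pods terminate:

```python
def rolling_update_steps(replicas: int, max_surge: int, max_unavailable: int):
    """Toy model of a surge-first RollingUpdate.

    Returns a list of (action, old_count, new_count) tuples, assuming
    every new Pod passes readiness before old Pods are removed.
    """
    batch = max_surge + max_unavailable
    assert batch > 0, "maxSurge and maxUnavailable cannot both be 0"
    old, new, steps = replicas, 0, []
    while old > 0:
        created = min(batch, old)
        new += created
        steps.append((f"create {created} new, wait for readiness", old, new))
        old -= created
        steps.append((f"terminate {created} old", old, new))
    return steps

# replicas=4, maxSurge=1, maxUnavailable=0: one Pod swapped per round
for action, old, new in rolling_update_steps(4, 1, 0):
    print(f"{action}: old={old} new={new}")
```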


Real-World Scenarios

Scenario 1: Web Application Auto Scaling

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        image: webapp:latest
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15

Scenario 2: Zero-Downtime Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: api
        image: myapi:2.0.0
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20

Scenario 3: Canary Deployment with Two Deployments

yaml
# Stable version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web
      version: stable
  template:
    metadata:
      labels:
        app: web
        version: stable
    spec:
      containers:
      - name: app
        image: webapp:1.0
---
# Canary version (~10% of traffic: 1 of 10 replicas)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      version: canary
  template:
    metadata:
      labels:
        app: web
        version: canary
    spec:
      containers:
      - name: app
        image: webapp:2.0
---
# Service routing to both versions
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
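Because a plain Service load-balances roughly evenly across all ready endpoints matching its selector, the canary's traffic share is set by its replica count. A quick sanity check (illustrative function, not a Kubernetes API):

```python
def canary_traffic_share(stable_replicas: int, canary_replicas: int) -> float:
    # A Service spreads requests roughly evenly over ready endpoints,
    # so the canary's share equals its fraction of total replicas
    return canary_replicas / (stable_replicas + canary_replicas)

# 9 stable + 1 canary replica: roughly 10% of requests hit the canary
print(canary_traffic_share(9, 1))  # 0.1
```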

Best Practices

For Autoscaling

  • Set resource requests and limits: HPA needs these to calculate utilization

  • Use appropriate metrics: CPU for CPU-bound workloads, memory for memory-bound

  • Set minReplicas for high availability: At least 2 for critical applications

  • Set maxReplicas for cost control: Prevent runaway scaling

  • Use stabilization windows: Prevent flapping during scaling

  • Test scaling with load: Verify HPA works before production

For Rolling Updates

  • Use readiness probes: Ensure Pods are ready before receiving traffic

  • Set maxUnavailable to 0 for critical apps: Prevents downtime

  • Set maxSurge appropriately: Balance speed and resource usage

  • Use health checks: Detect failed deployments automatically

  • Implement rollback procedures: Know how to revert problematic updates

  • Monitor during updates: Watch rollout status

For Rollbacks

  • Document change causes: Track what changed and why

  • Keep revision history: Don't prune deployment history

  • Test rollback procedures: Practice recovering from bad deployments

  • Automate rollbacks: Use automation for common failure patterns


Summary

Feature         | Purpose              | Key Settings
--------------- | -------------------- | -----------------------------
HPA (CPU)       | Scale on CPU         | averageUtilization
HPA (Memory)    | Scale on memory      | averageValue
HPA (Custom)    | Scale on app metrics | Pods or Object metric
Rolling Update  | Zero-downtime deploy | maxSurge, maxUnavailable
Rollback        | Revert bad deploy    | kubectl rollout undo
Readiness Probe | Traffic readiness    | httpGet, initialDelaySeconds
Liveness Probe  | Pod health           | httpGet, periodSeconds

Practice Questions

  1. How does the Horizontal Pod Autoscaler decide when to add more Pods?

  2. What is the difference between maxSurge and maxUnavailable in a rolling update?

  3. Why are readiness probes important for rolling updates?

  4. How do you roll back a deployment to a previous version?

  5. What happens when you set maxUnavailable to 0 and maxSurge to 1?


Learn More

Practice scaling and updates with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/
