
Kubernetes Interview & Scenarios: Real-world Troubleshooting, Interview Questions, and Hands-on Practice Tasks

📅 Published: May 2026
⏱️ Estimated Reading Time: 22 minutes
🏷️ Tags: Kubernetes Interview, K8s Troubleshooting, DevOps Interview, Container Orchestration


Introduction: What Kubernetes Interviewers Look For

Kubernetes interviews test more than your knowledge of YAML syntax. Interviewers want to see that you understand how Kubernetes works under the hood, how to troubleshoot problems, and how to design resilient systems.

The most valued Kubernetes skills in interviews are:

  • Understanding of control plane components and their interactions

  • Ability to debug failing Pods, misconfigured networking, and resource issues

  • Knowledge of security best practices (RBAC, Network Policies, Pod Security)

  • Experience with rolling updates, rollbacks, and autoscaling

  • Troubleshooting methodology and systematic problem-solving

This guide covers the questions you are likely to face and the scenarios that test your Kubernetes problem-solving skills.


Part 1: Kubernetes Interview Questions

Foundational Questions

Q1: What is Kubernetes and why is it needed?

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation.

Kubernetes solves several problems that arise when running containers at scale:

  • Placement: Where to run each container

  • Scaling: How to add or remove containers based on demand

  • Health management: What to do when a container fails

  • Networking: How containers discover and communicate with each other

  • Storage: How data persists beyond container lifecycles

  • Rollouts: How to update applications without downtime

Without Kubernetes, teams manage these concerns manually or with custom scripts, leading to inconsistency and operational overhead.

Q2: Explain the architecture of a Kubernetes cluster.

A Kubernetes cluster has two main components: the Control Plane and Worker Nodes.

The Control Plane manages the cluster. It includes:

  • API Server: The front door for all administrative tasks. All internal and external communication goes through the API server.

  • etcd: A distributed key-value store that holds the entire cluster configuration and state.

  • Scheduler: Decides which worker node runs each new Pod based on resource requirements and affinity rules.

  • Controller Manager: Runs the controllers that reconcile actual state with desired state (the Deployment, ReplicaSet, and Node controllers, among others).

Worker Nodes run the applications. Each worker node contains:

  • Kubelet: The node agent that ensures containers are running in Pods.

  • Container Runtime: The software that runs containers (containerd, CRI-O).

  • Kube-proxy: Maintains network rules for service discovery and load balancing.
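
The control plane itself can be inspected with kubectl. As an illustrative check (assuming a kubeadm-style cluster, where the control-plane components run as static Pods in the kube-system namespace):

```shell
# List the control-plane components (API server, scheduler, controller manager, etcd)
kubectl get pods -n kube-system -o wide

# Per-node agents such as kube-proxy typically run as DaemonSets
kubectl get daemonset -n kube-system

# On a kubeadm control-plane node, the static Pod manifests live here
ls /etc/kubernetes/manifests/
```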

Q3: What is a Pod and why does Kubernetes use Pods instead of running containers directly?

A Pod is the smallest deployable unit in Kubernetes. It represents one or more containers that share the same network namespace, storage volumes, and lifecycle.

Kubernetes uses Pods instead of direct containers because containers often need to work together closely. A Pod allows:

  • Localhost communication: Containers in the same Pod can communicate over localhost, simplifying configuration.

  • Shared storage: Multiple containers can share volumes.

  • Same lifecycle: Sidecar containers (logging, monitoring, proxies) can be deployed and scaled with the main container.

Common Pod patterns include sidecar (helper container alongside main app), ambassador (proxy to external services), and adapter (transforms output for monitoring).
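
To make the sidecar pattern concrete, here is a minimal sketch (the container names and log path are illustrative): two containers share a Pod-level emptyDir volume and the Pod's network namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
  - name: logs
    emptyDir: {}             # shared scratch volume, lives as long as the Pod
  containers:
  - name: web                # main application container
    image: nginx:1.25
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper        # sidecar: reads logs written by the main container
    image: busybox:1.36
    command: ["/bin/sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /logs
```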

Q4: What is the difference between a Deployment and a StatefulSet?

Deployments are designed for stateless applications. All Pods are interchangeable, can be scaled up and down arbitrarily, and can be updated with rolling updates. Pods have random names and are not preserved when rescheduled.

StatefulSets are designed for stateful applications where each Pod has a unique identity. Pods have stable, predictable names (web-0, web-1, web-2). Pods are created and deleted in order. Pods retain their identity across restarts.

StatefulSets are used for databases (MySQL, PostgreSQL), message queues (Kafka, RabbitMQ), and any application where each instance has its own persistent storage.
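
A minimal StatefulSet sketch (assuming a headless Service named web already exists) shows the two features Deployments lack: stable DNS identity via serviceName and per-Pod storage via volumeClaimTemplates:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web           # headless Service providing stable DNS (web-0.web, web-1.web, ...)
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:      # each Pod gets its own PVC (data-web-0, data-web-1, ...)
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```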

Q5: How does a Service work and what are the different types?

A Service provides a stable network endpoint for a set of Pods. It load-balances traffic across healthy Pods and provides a stable IP address and DNS name.

The four Service types are:

  • ClusterIP: Default. Provides an internal IP address accessible only within the cluster. Used for internal communication between services.

  • NodePort: Exposes the Service on a static port on each node. Traffic to NodeIP:NodePort is forwarded to the Service. Used for basic external access or when a load balancer is not available.

  • LoadBalancer: Provisions a cloud load balancer (AWS ELB, GCP LB) that forwards traffic to the Service. This is the standard way to expose services to the internet in production.

  • ExternalName: Maps the Service to an external DNS name. Used to access external services using Kubernetes naming conventions.
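
A minimal ClusterIP Service sketch (the name, labels, and ports are illustrative) ties these pieces together; changing the type field is all that distinguishes the first three variants:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP            # switch to NodePort or LoadBalancer for external access
  selector:
    app: web                 # traffic is load-balanced across healthy Pods with this label
  ports:
  - port: 8080               # port the Service listens on
    targetPort: 80           # port the container listens on
```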


Intermediate Questions

Q6: Explain the rolling update process.

A rolling update gradually replaces old Pods with new Pods to achieve zero downtime.

The update is controlled by two parameters:

  • maxSurge: How many extra Pods can be created during the update.

  • maxUnavailable: How many old Pods can be unavailable during the update.

For a Deployment with 4 replicas and strategy maxSurge: 1, maxUnavailable: 0:

  1. A new ReplicaSet is created with the updated image

  2. One new Pod is created and becomes ready

  3. One old Pod is terminated

  4. Steps 2-3 repeat until all Pods are updated

  5. The old ReplicaSet is scaled to zero

This ensures there are always at least 4 Pods serving traffic during the update.
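
The strategy described above can be expressed directly in the Deployment spec; this is a sketch with illustrative names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 5 Pods exist during the update
      maxUnavailable: 0      # never drop below 4 ready Pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
```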

Q7: How does the Horizontal Pod Autoscaler (HPA) work?

The HPA automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (CPU, memory, or custom metrics).

The algorithm is:

text
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]

Example: If current replicas is 4, average CPU is 75%, and desired CPU is 50%, then ceil[4 * (75/50)] = ceil[4 * 1.5] = 6. HPA will scale to 6 replicas.

HPA fetches metrics from the Metrics Server (for CPU/memory) or Prometheus (for custom metrics). It scales up immediately when metrics exceed the target but scales down slowly (stabilization window) to avoid flapping.
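
A minimal CPU-based HPA using the autoscaling/v2 API (names are illustrative; the optional behavior section tunes the scale-down stabilization window mentioned above):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50     # target 50% of the Pods' CPU requests
  behavior:                        # optional: slow scale-down to avoid flapping
    scaleDown:
      stabilizationWindowSeconds: 300
```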

Q8: What is a ConfigMap and how is it different from a Secret?

A ConfigMap stores non-sensitive configuration data in key-value pairs. It is used for environment-specific settings, feature flags, and application configuration.

A Secret stores sensitive information: passwords, API keys, TLS certificates. Secret values are base64 encoded (not encrypted by default). Secrets have more restrictive RBAC controls and can be encrypted at rest in etcd.

Both can be consumed as environment variables or mounted as volume files.
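
A minimal sketch of both objects (names and keys are illustrative); note that the Secret's stringData field accepts plain text, which the API server base64-encodes on write:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"          # consumable as an env var via envFrom/configMapKeyRef
  config.yaml: |             # or mounted as a file in a volume
    featureFlag: true
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:                  # plain text here; stored base64-encoded in etcd
  DB_PASSWORD: "s3cr3t"
```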

Q9: Explain how Ingress works.

Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. It provides advanced routing based on hostnames and URL paths.

Ingress is only an API object that declares routing rules. An Ingress Controller (NGINX, AWS ALB, Traefik, GCE) watches those objects and implements the actual routing.

Example Ingress configuration:

  • example.com/api/* routes to api-service

  • example.com/* routes to web-service

  • admin.example.com/* routes to admin-service

Ingress also handles TLS termination, reducing the need for individual services to manage certificates.

Q10: What are the different ways to manage application configuration?

There are four primary ways to manage configuration in Kubernetes:

  1. ConfigMaps: Non-sensitive configuration as environment variables or files

  2. Secrets: Sensitive configuration, base64 encoded

  3. External configuration services: HashiCorp Vault, AWS Secrets Manager

  4. Environment variables: Hardcoded in Pod spec (least flexible)

The recommended approach is to use ConfigMaps and Secrets mounted as volumes. Files from a mounted ConfigMap or Secret are refreshed automatically when the object changes (eventually consistent, typically within a minute or so), whereas values injected as environment variables require a Pod restart to pick up updates.


Advanced Questions

Q11: How would you debug a Pod stuck in Pending state?

A Pod in Pending state means the scheduler cannot place it on a node. I would check:

bash
# Describe the Pod to see events
kubectl describe pod my-pod

# Common causes:
# - Insufficient CPU/memory resources (check node capacity)
# - Node selector or affinity rules preventing placement
# - PersistentVolumeClaim cannot be bound
# - Taints preventing scheduling on specific nodes

# Check node resources
kubectl top nodes
kubectl describe nodes

# Check PVC status
kubectl get pvc

# Check taints
kubectl describe nodes | grep Taints

Q12: A Pod is crashing repeatedly. How would you investigate?

bash
# Check Pod status
kubectl get pods

# Look for CrashLoopBackOff status

# Check logs
kubectl logs my-pod
kubectl logs my-pod --previous  # Logs from previous crashed instance

# Describe for events
kubectl describe pod my-pod

# Common causes:
# - Application error on startup
# - Missing environment variables or ConfigMaps
# - Incorrect command or arguments
# - OOM kill (exit code 137)
# - Liveness/readiness probe failing

# For OOM, check memory limits
kubectl describe pod my-pod | grep -A5 Limits

# For probe failures
kubectl describe pod my-pod | grep -A10 Liveness

Q13: How does network policy work?

Network Policy controls traffic flow between Pods and external endpoints. By default, all Pods can communicate with all other Pods. Network Policies restrict this.

A Network Policy defines:

  • podSelector: Which Pods the policy applies to

  • policyTypes: Ingress, Egress, or both

  • ingress: Allowed incoming traffic sources

  • egress: Allowed outgoing traffic destinations

Network Policies are implemented by CNI plugins (Calico, Cilium, Weave). Flannel does not support Network Policies.
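
A common starting point is a default-deny policy. This sketch (the namespace name is illustrative) selects every Pod in the namespace and, because it lists no ingress or egress rules, blocks all traffic until more specific policies allow it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # empty selector matches every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress                   # no rules listed for either type, so all traffic is denied
```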

Q14: How do you secure a Kubernetes cluster?

Kubernetes security has three primary layers:

RBAC (Who can do what)

  • Use namespaces to isolate resources

  • Assign least privilege roles to users and service accounts

  • Audit RBAC permissions regularly

  • Disable default service account auto-mounting

Network Policies (Traffic control)

  • Implement default deny all Network Policy

  • Only allow necessary ingress/egress traffic

  • Isolate production and development namespaces

Pod Security (How Pods run)

  • Enforce Pod Security Standards (Restricted for production)

  • Run containers as non-root user

  • Use read-only root filesystem

  • Drop all capabilities, add only needed ones

  • Set resource limits
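
As a sketch of the RBAC layer (the namespace, role, and service account names are illustrative), a least-privilege Role and RoleBinding might look like:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]            # "" is the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

The binding can then be checked with kubectl auth can-i get pods --as=system:serviceaccount:production:app-sa -n production.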

Q15: What is etcd and why is it important?

etcd is a distributed key-value store that holds the entire configuration and state of the Kubernetes cluster. It stores:

  • Cluster state

  • Node information

  • Pod definitions

  • ConfigMaps and Secrets

  • Service definitions

  • Deployment states

etcd is the source of truth. If etcd fails, the cluster loses its state. Always back up etcd.
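
A backup is usually taken with etcdctl; the endpoint and certificate paths below are typical kubeadm defaults and may differ on your cluster:

```shell
# Take a snapshot of etcd (run on a control-plane node)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot metadata
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table
```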


Part 2: Real-world Troubleshooting Scenarios

Scenario 1: Pod Stuck in ImagePullBackOff

Problem: A Pod cannot start because it cannot pull the container image.

Troubleshooting steps:

bash
# Check Pod status
kubectl get pods
# NAME      READY   STATUS             RESTARTS   AGE
# my-pod    0/1     ImagePullBackOff   0          5m

# Describe Pod for details
kubectl describe pod my-pod
# Events:
#   Failed to pull image "myapp:latest": rpc error: code = NotFound

# Check image name
kubectl get pod my-pod -o yaml | grep image

# Fix: Correct image name or ensure image exists in registry
# For private registry, create image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.io \
  --docker-username=user \
  --docker-password=pass

# Then reference the secret in the Pod spec:

yaml
spec:
  imagePullSecrets:
  - name: regcred

Resolution: Correct image name, push image to registry, or add image pull secret for private registry.


Scenario 2: Service Not Accessible

Problem: A Service is created but cannot be accessed from another Pod.

Troubleshooting steps:

bash
# Check Service exists
kubectl get svc
# NAME         TYPE        CLUSTER-IP     PORT(S)
# my-service   ClusterIP   10.96.0.1      8080/TCP

# Check Endpoints (should have Pod IPs)
kubectl get endpoints my-service
# NAME         ENDPOINTS
# my-service   10.244.1.5:8080,10.244.2.3:8080

# If endpoints are empty, selector doesn't match any Pod
kubectl describe svc my-service | grep Selector

# Check Pod labels
kubectl get pods --show-labels

# Test connectivity from another Pod
kubectl run test --image=busybox -it --rm -- /bin/sh
# Then, from the shell inside the test Pod:
wget -O- http://my-service:8080

# Check kube-proxy rules (run on a node; in iptables mode, Service rules are in the nat table)
sudo iptables -t nat -L -n | grep my-service

Resolution: Fix Service selector to match Pod labels, or ensure Pods are running and have correct labels.


Scenario 3: Node Not Ready

Problem: A worker node is in NotReady state.

Troubleshooting steps:

bash
# Check node status
kubectl get nodes
# NAME         STATUS     ROLES    AGE
# worker-1     NotReady   <none>   10d

# Describe node for details
kubectl describe node worker-1
# Conditions:
#   Ready: Unknown
#   MemoryPressure: False
#   DiskPressure: False
#   PIDPressure: False

# Check node conditions for specific issues

# SSH into node
ssh worker-1

# Check kubelet status
systemctl status kubelet
journalctl -u kubelet -n 50

# Check Docker/containerd status
systemctl status containerd

# Check disk space
df -h

# Check node reachability
ping worker-1

Resolution: Restart kubelet, free disk space, or investigate node resource issues.


Scenario 4: HPA Not Scaling

Problem: Horizontal Pod Autoscaler is not scaling despite high CPU usage.

Troubleshooting steps:

bash
# Check HPA status
kubectl get hpa
# NAME       REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS
# web-hpa    Deployment/web      250%/50%  2         10        2

# Describe HPA for events
kubectl describe hpa web-hpa
# Events:
#   FailedGetResourceMetric: missing request for cpu

# Check if Metrics Server is installed
kubectl top pods
# error: Metrics API not available

# Metrics Server is missing!

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify Metrics API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

# Check Pod resource requests (required for HPA)
kubectl get deployment web -o yaml | grep -A5 resources

Resolution: Install Metrics Server and ensure Pods have resource requests defined.


Scenario 5: Persistent Volume Not Bound

Problem: PVC is stuck in Pending state.

Troubleshooting steps:

bash
# Check PVC status
kubectl get pvc
# NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES
# data-pvc     Pending                                     5m

# Describe PVC for events
kubectl describe pvc data-pvc
# Events:
#   FailedBinding: no persistent volumes available for this claim

# Check available PVs
kubectl get pv

# If no PV, check Storage Class provisioner
kubectl get storageclass

# For dynamic provisioning, verify Storage Class has a provisioner
kubectl get storageclass standard -o yaml | grep provisioner

# For static provisioning, create a matching PV:

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data    # hostPath is only suitable for single-node test clusters

Resolution: Create matching PV or ensure Storage Class provisioner is correctly configured.


Part 3: Hands-on Practice Tasks

Task 1: Debug a Failing Deployment

Objective: A Deployment is failing. Identify and fix the issue.

Given YAML:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broken-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: broken
  template:
    metadata:
      labels:
        app: broken
    spec:
      containers:
      - name: app
        image: nginx:latest
        command: ["/bin/sh"]
        args: ["-c", "exit 1"]

Steps to debug:

bash
# 1. Check Pod status
kubectl get pods
# broken-app-xxx   0/1     CrashLoopBackOff

# 2. Check logs
kubectl logs broken-app-xxx
# (no output - container exits immediately)

# 3. Check previous logs
kubectl logs broken-app-xxx --previous
# (no output)

# 4. Describe Pod for events
kubectl describe pod broken-app-xxx
# Events: Container exited with code 1

# 5. Issue: the command "/bin/sh -c 'exit 1'" exits immediately with code 1
# Fix: remove command and args so the image's default CMD starts nginx

# Corrected container spec:

yaml
      containers:
      - name: app
        image: nginx:latest
        # command and args removed - the image's default CMD runs nginx in the foreground

Task 2: Configure Ingress for Two Services

Objective: Route example.com to web-service and api.example.com to api-service.

Solution:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

Task 3: Implement HPA with Custom Metrics

Objective: Configure HPA to scale based on HTTP requests per second using Prometheus.

Prerequisites: Prometheus and Prometheus Adapter installed.

Solution:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Task 4: Create Network Policy for Database

Objective: Database Pod should only accept traffic from the app Pod.

Solution:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
spec:
  podSelector:
    matchLabels:
      app: database
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: webapp
    ports:
    - protocol: TCP
      port: 5432
  policyTypes:
  - Ingress

Task 5: Rollback a Bad Deployment

Objective: A bad deployment was pushed. Roll back to the previous version.

Steps:

bash
# Check rollout history
kubectl rollout history deployment web-deployment
# REVISION  CHANGE-CAUSE
# 1         nginx:1.24
# 2         nginx:1.25 (bad)
# (CHANGE-CAUSE appears only if the kubernetes.io/change-cause annotation was set)

# Check specific revision
kubectl rollout history deployment web-deployment --revision=2

# Rollback to previous revision
kubectl rollout undo deployment web-deployment

# Verify rollback
kubectl rollout status deployment web-deployment

# Rollback to specific revision (if needed)
kubectl rollout undo deployment web-deployment --to-revision=1

Kubernetes Interview Preparation Checklist

Fundamentals

  • Explain Kubernetes architecture (Control Plane, Worker Nodes)

  • Describe Pod, Deployment, Service, Ingress

  • Differentiate between Deployment and StatefulSet

  • Explain rolling update process

  • Understand ConfigMaps and Secrets

Networking

  • Explain Service types (ClusterIP, NodePort, LoadBalancer)

  • Describe Ingress and Ingress Controller

  • Configure Network Policies

  • Explain kube-proxy role

Storage

  • Differentiate PV, PVC, StorageClass

  • Explain static vs dynamic provisioning

  • Understand StatefulSet volumeClaimTemplates

Scaling & Updates

  • Configure HPA with CPU metrics

  • Understand HPA algorithm

  • Perform rolling updates and rollbacks

Security

  • Configure RBAC roles and bindings

  • Understand Pod Security Standards

  • Apply SecurityContext to Pods

Troubleshooting

  • Debug CrashLoopBackOff

  • Debug ImagePullBackOff

  • Debug Pending Pods

  • Debug Service connectivity

  • Use kubectl logs, describe, exec


Practice Exercises

  1. Deploy a simple web application with 3 replicas. Configure HPA to scale on CPU at 50%. Generate load and observe scaling.

  2. Create a ConfigMap with application properties. Mount it as a volume in a Pod. Update the ConfigMap and observe how long it takes for the Pod to see the change.

  3. Deploy a WordPress application with MySQL using StatefulSet and PersistentVolumeClaims. Delete the MySQL Pod and verify data persists.

  4. Create a Network Policy that allows traffic only from Pods with label app: frontend to a backend Service.

  5. Simulate a node failure (shutdown a worker node) and observe how Kubernetes reschedules Pods.


Summary

Interview Topic   | Key Concepts                        | Commands to Know
------------------|-------------------------------------|------------------------------
Architecture      | Control Plane, Worker Nodes, Pods   | kubectl get componentstatuses
Workloads         | Deployment, StatefulSet, DaemonSet  | kubectl get deploy,sts,ds
Networking        | Service, Ingress, NetworkPolicy     | kubectl get svc,ing,netpol
Storage           | PV, PVC, StorageClass               | kubectl get pv,pvc,sc
Configuration     | ConfigMap, Secret                   | kubectl get cm,secret
Scaling           | HPA, VPA                            | kubectl get hpa
Troubleshooting   | logs, describe, exec                | kubectl logs, describe, exec
Security          | RBAC, PodSecurity, NetworkPolicy    | kubectl auth can-i

Learn More

Practice Kubernetes with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/
