Kubernetes Interview & Scenarios: Real-world Troubleshooting, Interview Questions, and Hands-on Practice Tasks
📅 Published: May 2026
⏱️ Estimated Reading Time: 22 minutes
🏷️ Tags: Kubernetes Interview, K8s Troubleshooting, DevOps Interview, Container Orchestration
Introduction: What Kubernetes Interviewers Look For
Kubernetes interviews test more than your knowledge of YAML syntax. Interviewers want to see that you understand how Kubernetes works under the hood, how to troubleshoot problems, and how to design resilient systems.
The most valued Kubernetes skills in interviews are:
Understanding of control plane components and their interactions
Ability to debug failing Pods, misconfigured networking, and resource issues
Knowledge of security best practices (RBAC, Network Policies, Pod Security)
Experience with rolling updates, rollbacks, and autoscaling
Troubleshooting methodology and systematic problem-solving
This guide covers the questions you are likely to face and the scenarios that test your Kubernetes problem-solving skills.
Part 1: Kubernetes Interview Questions
Foundational Questions
Q1: What is Kubernetes and why is it needed?
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation.
Kubernetes solves several problems that arise when running containers at scale:
Placement: Where to run each container
Scaling: How to add or remove containers based on demand
Health management: What to do when a container fails
Networking: How containers discover and communicate with each other
Storage: How data persists beyond container lifecycles
Rollouts: How to update applications without downtime
Without Kubernetes, teams manage these concerns manually or with custom scripts, leading to inconsistency and operational overhead.
Q2: Explain the architecture of a Kubernetes cluster.
A Kubernetes cluster has two main components: the Control Plane and Worker Nodes.
The Control Plane manages the cluster. It includes:
API Server: The front door for all administrative tasks. All internal and external communication goes through the API server.
etcd: A distributed key-value store that holds the entire cluster configuration and state.
Scheduler: Decides which worker node runs each new Pod based on resource requirements and affinity rules.
Controller Manager: Runs controllers that maintain desired state (Deployment controller, Node controller, Replication controller).
Worker Nodes run the applications. Each worker node contains:
Kubelet: The node agent that ensures containers are running in Pods.
Container Runtime: The software that runs containers (containerd, CRI-O).
Kube-proxy: Maintains network rules for service discovery and load balancing.
Q3: What is a Pod and why does Kubernetes use Pods instead of running containers directly?
A Pod is the smallest deployable unit in Kubernetes. It represents one or more containers that share the same network namespace, storage volumes, and lifecycle.
Kubernetes uses Pods instead of direct containers because containers often need to work together closely. A Pod allows:
Localhost communication: Containers in the same Pod can communicate over localhost, simplifying configuration.
Shared storage: Multiple containers can share volumes.
Same lifecycle: Sidecar containers (logging, monitoring, proxies) can be deployed and scaled with the main container.
Common Pod patterns include sidecar (helper container alongside main app), ambassador (proxy to external services), and adapter (transforms output for monitoring).
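As a minimal sidecar sketch (the images, names, and paths are illustrative), both containers share the Pod's network namespace and an emptyDir volume:

```yaml
# Hypothetical sidecar Pod: the log-shipper container reads the files
# the app container writes into the shared "logs" volume
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /logs
  volumes:
  - name: logs
    emptyDir: {}
```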
Q4: What is the difference between a Deployment and a StatefulSet?
Deployments are designed for stateless applications. All Pods are interchangeable, can be scaled up and down arbitrarily, and can be updated with rolling updates. Pods get randomly generated names, and no identity is preserved when they are rescheduled.
StatefulSets are designed for stateful applications where each Pod has a unique identity. Pods have stable, predictable names (web-0, web-1, web-2). Pods are created and deleted in order. Pods retain their identity across restarts.
StatefulSets are used for databases (MySQL, PostgreSQL), message queues (Kafka, RabbitMQ), and any application where each instance has its own persistent storage.
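A sketch of the stable identity and per-replica storage described above (all names are illustrative): each replica (web-0, web-1, web-2) gets its own PersistentVolumeClaim from volumeClaimTemplates.

```yaml
# Hypothetical StatefulSet sketch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web        # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```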
Q5: How does a Service work and what are the different types?
A Service provides a stable network endpoint for a set of Pods. It load-balances traffic across healthy Pods and provides a stable IP address and DNS name.
The four Service types are:
ClusterIP: Default. Provides an internal IP address accessible only within the cluster. Used for internal communication between services.
NodePort: Exposes the Service on a static port on each node. Traffic to NodeIP:NodePort is forwarded to the Service. Used for basic external access or when a load balancer is not available.
LoadBalancer: Provisions a cloud load balancer (AWS ELB, GCP LB) that forwards traffic to the Service. This is the standard way to expose services to the internet in production.
ExternalName: Maps the Service to an external DNS name. Used to access external services using Kubernetes naming conventions.
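For reference, a minimal ClusterIP Service might look like this (the name, labels, and ports are illustrative): traffic to my-service:80 is load-balanced across the Pods matching the selector.

```yaml
# Minimal ClusterIP Service sketch
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP        # the default; omitting it has the same effect
  selector:
    app: web
  ports:
  - port: 80             # Service port
    targetPort: 8080     # container port on the Pods
```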
Intermediate Questions
Q6: Explain the rolling update process.
A rolling update gradually replaces old Pods with new Pods to achieve zero downtime.
The update is controlled by two parameters:
maxSurge: How many extra Pods can be created during the update.
maxUnavailable: How many old Pods can be unavailable during the update.
For a Deployment with 4 replicas and strategy maxSurge: 1, maxUnavailable: 0:
1. A new ReplicaSet is created with the updated image
2. One new Pod is created and becomes ready
3. One old Pod is terminated
4. Steps 2-3 repeat until all Pods are updated
5. The old ReplicaSet is scaled to zero
This ensures there are always at least 4 Pods serving traffic during the update.
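The example above corresponds to a strategy block like this (a sketch, not tied to any particular application):

```yaml
# Never drop below 4 ready Pods; allow 1 extra Pod during the update
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```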
Q7: How does the Horizontal Pod Autoscaler (HPA) work?
The HPA automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (CPU, memory, or custom metrics).
The algorithm is:
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]
Example: If current replicas is 4, average CPU is 75%, and desired CPU is 50%, then ceil[4 * (75/50)] = ceil[4 * 1.5] = 6. HPA will scale to 6 replicas.
HPA fetches metrics from the Metrics Server (for CPU/memory) or Prometheus (for custom metrics). It scales up immediately when metrics exceed the target but scales down slowly (stabilization window) to avoid flapping.
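The arithmetic above can be checked in plain shell, using (a + b - 1) / b as an integer ceiling (the values mirror the worked example):

```shell
# HPA formula sketch: desired = ceil(current * currentMetric / targetMetric)
current=4      # current replicas
cpu=75         # observed average CPU (%)
target=50      # target average CPU (%)
desired=$(( (current * cpu + target - 1) / target ))   # integer ceiling
echo "$desired"   # prints 6
```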
Q8: What is a ConfigMap and how is it different from a Secret?
A ConfigMap stores non-sensitive configuration data in key-value pairs. It is used for environment-specific settings, feature flags, and application configuration.
A Secret stores sensitive information: passwords, API keys, TLS certificates. Secret values are base64 encoded, which is an encoding, not encryption. Secrets can be given more restrictive RBAC access than ConfigMaps and can be encrypted at rest in etcd.
Both can be consumed as environment variables or mounted as volume files.
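A point worth demonstrating in an interview: base64 is trivially reversible, so anyone who can read a Secret can decode it. A quick shell check (the password here is made up):

```shell
# base64 encodes, it does not encrypt
encoded=$(printf 'S3cretP@ss' | base64)
echo "$encoded"                       # prints UzNjcmV0UEBzcw==
printf '%s' "$encoded" | base64 -d    # prints the original: S3cretP@ss
```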
Q9: Explain how Ingress works.
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. It provides advanced routing based on hostnames and URL paths.
Ingress is just a specification. An Ingress Controller (NGINX, AWS ALB, Traefik, GCE) implements it.
Example Ingress configuration:
example.com/api/* routes to api-service
example.com/* routes to web-service
admin.example.com/* routes to admin-service
Ingress also handles TLS termination, reducing the need for individual services to manage certificates.
Q10: What are the different ways to manage application configuration?
There are four primary ways to manage configuration in Kubernetes:
ConfigMaps: Non-sensitive configuration as environment variables or files
Secrets: Sensitive configuration, base64 encoded
External configuration services: HashiCorp Vault, AWS Secrets Manager
Environment variables: Hardcoded in Pod spec (least flexible)
The recommended approach is to use ConfigMaps and Secrets with volume mounts. Updates to the underlying objects propagate to the mounted files without a Pod restart, though only eventually (after the kubelet sync delay); values injected as environment variables require a restart.
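As a sketch of the volume-mount approach (all names and values are illustrative):

```yaml
# ConfigMap consumed as files under /etc/app; edits to the ConfigMap
# appear in the mounted files after a short delay
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    feature.flag=true
    log.level=info
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: config
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: app-config
```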
Advanced Questions
Q11: How would you debug a Pod stuck in Pending state?
A Pod in Pending state means the scheduler cannot place it on a node. I would check:
```shell
# Describe the Pod to see events
kubectl describe pod my-pod

# Common causes:
# - Insufficient CPU/memory resources (check node capacity)
# - Node selector or affinity rules preventing placement
# - PersistentVolumeClaim cannot be bound
# - Taints preventing scheduling on specific nodes

# Check node resources
kubectl top nodes
kubectl describe nodes

# Check PVC status
kubectl get pvc

# Check taints
kubectl describe nodes | grep Taints
```
Q12: A Pod is crashing repeatedly. How would you investigate?
```shell
# Check Pod status
kubectl get pods
# Look for CrashLoopBackOff status

# Check logs
kubectl logs my-pod
kubectl logs my-pod --previous   # Logs from the previous crashed instance

# Describe for events
kubectl describe pod my-pod

# Common causes:
# - Application error on startup
# - Missing environment variables or ConfigMaps
# - Incorrect command or arguments
# - OOM kill (exit code 137)
# - Liveness/readiness probe failing

# For OOM, check memory limits
kubectl describe pod my-pod | grep -A5 Limits

# For probe failures
kubectl describe pod my-pod | grep -A10 Liveness
```
Q13: How does network policy work?
Network Policy controls traffic flow between Pods and external endpoints. By default, all Pods can communicate with all other Pods. Network Policies restrict this.
A Network Policy defines:
podSelector: Which Pods the policy applies to
policyTypes: Ingress, Egress, or both
ingress: Allowed incoming traffic sources
egress: Allowed outgoing traffic destinations
Network Policies are implemented by CNI plugins (Calico, Cilium, Weave). Flannel does not support Network Policies.
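A common starting point is a default-deny ingress policy; this sketch uses an empty podSelector to match every Pod in its namespace, after which specific allow rules are layered on top:

```yaml
# Default-deny all ingress for the namespace this policy lives in
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```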
Q14: How do you secure a Kubernetes cluster?
Kubernetes security has three primary layers:
RBAC (Who can do what)
Use namespaces to isolate resources
Assign least privilege roles to users and service accounts
Audit RBAC permissions regularly
Disable default service account auto-mounting
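A least-privilege sketch tying these bullets together (the namespace, user, and names are hypothetical): a Role grants read-only Pod access, and a RoleBinding attaches it to a user.

```yaml
# Read-only access to Pods in one namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```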
Network Policies (Traffic control)
Implement default deny all Network Policy
Only allow necessary ingress/egress traffic
Isolate production and development namespaces
Pod Security (How Pods run)
Enforce Pod Security Standards (Restricted for production)
Run containers as non-root user
Use read-only root filesystem
Drop all capabilities, add only needed ones
Set resource limits
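These bullets map onto a Pod securityContext roughly like this (the image and values are illustrative):

```yaml
# Hardened Pod sketch: non-root, read-only filesystem, no capabilities,
# resource requests and limits set
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
  containers:
  - name: app
    image: myapp:1.0        # hypothetical image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
```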
Q15: What is etcd and why is it important?
etcd is a distributed key-value store that holds the entire configuration and state of the Kubernetes cluster. It stores:
Cluster state
Node information
Pod definitions
ConfigMaps and Secrets
Service definitions
Deployment states
etcd is the source of truth. If etcd fails, the cluster loses its state. Always back up etcd.
Part 2: Real-world Troubleshooting Scenarios
Scenario 1: Pod Stuck in ImagePullBackOff
Problem: A Pod cannot start because it cannot pull the container image.
Troubleshooting steps:
```shell
# Check Pod status
kubectl get pods
# NAME     READY   STATUS             RESTARTS   AGE
# my-pod   0/1     ImagePullBackOff   0          5m

# Describe Pod for details
kubectl describe pod my-pod
# Events:
#   Failed to pull image "myapp:latest": rpc error: code = NotFound

# Check the image name
kubectl get pod my-pod -o yaml | grep image

# Fix: correct the image name or ensure the image exists in the registry
# For a private registry, create an image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.io \
  --docker-username=user \
  --docker-password=pass
```

```yaml
# Add the secret to the Pod spec
spec:
  imagePullSecrets:
  - name: regcred
```
Resolution: Correct image name, push image to registry, or add image pull secret for private registry.
Scenario 2: Service Not Accessible
Problem: A Service is created but cannot be accessed from another Pod.
Troubleshooting steps:
```shell
# Check the Service exists
kubectl get svc
# NAME         TYPE        CLUSTER-IP   PORT(S)
# my-service   ClusterIP   10.96.0.1    8080/TCP

# Check Endpoints (should list Pod IPs)
kubectl get endpoints my-service
# NAME         ENDPOINTS
# my-service   10.244.1.5:8080,10.244.2.3:8080

# If the endpoints are empty, the selector doesn't match any Pod
kubectl describe svc my-service | grep Selector

# Check Pod labels
kubectl get pods --show-labels

# Test connectivity from another Pod
kubectl run test --image=busybox -it --rm -- /bin/sh
wget -O- http://my-service:8080

# Check kube-proxy rules (run on a node; Service rules live in the nat table)
iptables -t nat -L -n | grep my-service
```
Resolution: Fix Service selector to match Pod labels, or ensure Pods are running and have correct labels.
Scenario 3: Node Not Ready
Problem: A worker node is in NotReady state.
Troubleshooting steps:
```shell
# Check node status
kubectl get nodes
# NAME       STATUS     ROLES    AGE
# worker-1   NotReady   <none>   10d

# Describe the node for details
kubectl describe node worker-1
# Conditions:
#   Ready            Unknown
#   MemoryPressure   False
#   DiskPressure     False
#   PIDPressure      False

# Check node conditions for specific issues, then SSH into the node
ssh worker-1

# Check kubelet status
systemctl status kubelet
journalctl -u kubelet -n 50

# Check container runtime status
systemctl status containerd

# Check disk space
df -h

# Check node reachability
ping worker-1
```
Resolution: Restart kubelet, free disk space, or investigate node resource issues.
Scenario 4: HPA Not Scaling
Problem: Horizontal Pod Autoscaler is not scaling despite high CPU usage.
Troubleshooting steps:
```shell
# Check HPA status
kubectl get hpa
# NAME      REFERENCE        TARGETS    MINPODS   MAXPODS   REPLICAS
# web-hpa   Deployment/web   250%/50%   2         10        2

# Describe the HPA for events
kubectl describe hpa web-hpa
# Events:
#   FailedGetResourceMetric: missing request for cpu

# Check if the Metrics Server is installed
kubectl top pods
# error: Metrics API not available
# Metrics Server is missing!

# Install the Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify the Metrics API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

# Check Pod resource requests (required for HPA)
kubectl get deployment web -o yaml | grep -A5 resources
```
Resolution: Install Metrics Server and ensure Pods have resource requests defined.
Scenario 5: Persistent Volume Not Bound
Problem: PVC is stuck in Pending state.
Troubleshooting steps:
```shell
# Check PVC status
kubectl get pvc
# NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   AGE
# data-pvc   Pending                                      5m

# Describe the PVC for events
kubectl describe pvc data-pvc
# Events:
#   FailedBinding: no persistent volumes available for this claim

# Check available PVs
kubectl get pv

# If there is no PV, check the StorageClass
kubectl get storageclass

# For dynamic provisioning, verify the StorageClass has a provisioner
kubectl get storageclass standard -o yaml | grep provisioner
```

```yaml
# For static provisioning, create a PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/data
```
Resolution: Create matching PV or ensure Storage Class provisioner is correctly configured.
Part 3: Hands-on Practice Tasks
Task 1: Debug a Failing Deployment
Objective: A Deployment is failing. Identify and fix the issue.
Given YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broken-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: broken
  template:
    metadata:
      labels:
        app: broken
    spec:
      containers:
      - name: app
        image: nginx:latest
        command: ["/bin/sh"]
        args: ["-c", "exit 1"]
```
Steps to debug:
```shell
# 1. Check Pod status
kubectl get pods
# broken-app-xxx   0/1   CrashLoopBackOff

# 2. Check logs
kubectl logs broken-app-xxx
# (no output - the container exits immediately)

# 3. Check previous logs
kubectl logs broken-app-xxx --previous
# (no output)

# 4. Describe the Pod for events
kubectl describe pod broken-app-xxx
# Events: Container exited with code 1

# 5. Issue: the overridden command exits with an error code
# Fix: remove command and args so nginx starts with its default CMD
```

```yaml
# Corrected container spec
containers:
- name: app
  image: nginx:latest
  # command and args removed - use the image's default CMD
```
Task 2: Configure Ingress for Two Services
Objective: Route example.com to web-service and api.example.com to api-service.
Solution:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
```
Task 3: Implement HPA with Custom Metrics
Objective: Configure HPA to scale based on HTTP requests per second using Prometheus.
Prerequisites: Prometheus and Prometheus Adapter installed.
Solution:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```
Task 4: Create Network Policy for Database
Objective: Database Pod should only accept traffic from the app Pod.
Solution:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: webapp
    ports:
    - protocol: TCP
      port: 5432
```
Task 5: Rollback a Bad Deployment
Objective: A bad deployment was pushed. Roll back to the previous version.
Steps:
```shell
# Check rollout history
kubectl rollout history deployment web-deployment
# REVISION   CHANGE-CAUSE
# 1          nginx:1.24
# 2          nginx:1.25 (bad)

# Check a specific revision
kubectl rollout history deployment web-deployment --revision=2

# Roll back to the previous revision
kubectl rollout undo deployment web-deployment

# Verify the rollback
kubectl rollout status deployment web-deployment

# Roll back to a specific revision (if needed)
kubectl rollout undo deployment web-deployment --to-revision=1
```
Kubernetes Interview Preparation Checklist
Fundamentals
Explain Kubernetes architecture (Control Plane, Worker Nodes)
Describe Pod, Deployment, Service, Ingress
Differentiate between Deployment and StatefulSet
Explain rolling update process
Understand ConfigMaps and Secrets
Networking
Explain Service types (ClusterIP, NodePort, LoadBalancer)
Describe Ingress and Ingress Controller
Configure Network Policies
Explain kube-proxy role
Storage
Differentiate PV, PVC, StorageClass
Explain static vs dynamic provisioning
Understand StatefulSet volumeClaimTemplates
Scaling & Updates
Configure HPA with CPU metrics
Understand HPA algorithm
Perform rolling updates and rollbacks
Security
Configure RBAC roles and bindings
Understand Pod Security Standards
Apply SecurityContext to Pods
Troubleshooting
Debug CrashLoopBackOff
Debug ImagePullBackOff
Debug Pending Pods
Debug Service connectivity
Use kubectl logs, describe, exec
Practice Exercises
Deploy a simple web application with 3 replicas. Configure HPA to scale on CPU at 50%. Generate load and observe scaling.
Create a ConfigMap with application properties. Mount it as a volume in a Pod. Update the ConfigMap and observe how long it takes for the Pod to see the change.
Deploy a WordPress application with MySQL using StatefulSet and PersistentVolumeClaims. Delete the MySQL Pod and verify data persists.
Create a Network Policy that allows traffic only from Pods with label app: frontend to a backend Service.
Simulate a node failure (shut down a worker node) and observe how Kubernetes reschedules Pods.
Summary
| Interview Topic | Key Concepts | Commands to Know |
|---|---|---|
| Architecture | Control Plane, Worker Nodes, Pods | kubectl cluster-info, kubectl get nodes |
| Workloads | Deployment, StatefulSet, DaemonSet | kubectl get deploy,sts,ds |
| Networking | Service, Ingress, NetworkPolicy | kubectl get svc,ing,netpol |
| Storage | PV, PVC, StorageClass | kubectl get pv,pvc,sc |
| Configuration | ConfigMap, Secret | kubectl get cm,secret |
| Scaling | HPA, VPA | kubectl get hpa |
| Troubleshooting | logs, describe, exec | kubectl logs, describe, exec |
| Security | RBAC, PodSecurity, NetworkPolicy | kubectl auth can-i |
Learn More
Practice Kubernetes with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/