Kubernetes Interview & Scenarios: Real-world Troubleshooting, Interview Questions, and Hands-on Practice Tasks
📅 Published: May 2026
⏱️ Estimated Reading Time: 22 minutes
🏷️ Tags: Kubernetes Interview, K8s Troubleshooting, DevOps Interview, Container Orchestration
Introduction: What Kubernetes Interviewers Look For
Kubernetes interviews test more than your knowledge of YAML syntax. Interviewers want to see that you understand how Kubernetes works under the hood, how to troubleshoot problems, and how to design resilient systems.
The most valued Kubernetes skills in interviews are:
Understanding of control plane components and their interactions
Ability to debug failing Pods, misconfigured networking, and resource issues
Knowledge of security best practices (RBAC, Network Policies, Pod Security)
Experience with rolling updates, rollbacks, and autoscaling
Troubleshooting methodology and systematic problem-solving
This guide covers the questions you are likely to face and the scenarios that test your Kubernetes problem-solving skills.
Part 1: Kubernetes Interview Questions
Foundational Questions
Q1: What is Kubernetes and why is it needed?
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation.
Kubernetes solves several problems that arise when running containers at scale:
Placement: Where to run each container
Scaling: How to add or remove containers based on demand
Health management: What to do when a container fails
Networking: How containers discover and communicate with each other
Storage: How data persists beyond container lifecycles
Rollouts: How to update applications without downtime
Without Kubernetes, teams manage these concerns manually or with custom scripts, leading to inconsistency and operational overhead.
Q2: Explain the architecture of a Kubernetes cluster.
A Kubernetes cluster has two main components: the Control Plane and Worker Nodes.
The Control Plane manages the cluster. It includes:
API Server: The front door for all administrative tasks. All internal and external communication goes through the API server.
etcd: A distributed key-value store that holds the entire cluster configuration and state.
Scheduler: Decides which worker node runs each new Pod based on resource requirements and affinity rules.
Controller Manager: Runs controllers that maintain desired state (Deployment controller, Node controller, Replication controller).
Worker Nodes run the applications. Each worker node contains:
Kubelet: The node agent that ensures containers are running in Pods.
Container Runtime: The software that runs containers (containerd, CRI-O).
Kube-proxy: Maintains network rules for service discovery and load balancing.
Q3: What is a Pod and why does Kubernetes use Pods instead of running containers directly?
A Pod is the smallest deployable unit in Kubernetes. It represents one or more containers that share the same network namespace, storage volumes, and lifecycle.
Kubernetes uses Pods instead of direct containers because containers often need to work together closely. A Pod allows:
Localhost communication: Containers in the same Pod can communicate over localhost, simplifying configuration.
Shared storage: Multiple containers can share volumes.
Same lifecycle: Sidecar containers (logging, monitoring, proxies) can be deployed and scaled with the main container.
Common Pod patterns include sidecar (helper container alongside main app), ambassador (proxy to external services), and adapter (transforms output for monitoring).
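As a minimal sidecar sketch (the images, names, and paths are illustrative), both containers share the Pod's network namespace and an emptyDir volume:

```yaml
# Hypothetical sidecar Pod: the log-shipper container reads the files
# the app container writes into the shared "logs" volume
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /logs
  volumes:
  - name: logs
    emptyDir: {}
```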
Q4: What is the difference between a Deployment and a StatefulSet?
Deployments are designed for stateless applications. All Pods are interchangeable, can be scaled up and down arbitrarily, and can be updated with rolling updates. Pods get randomly generated names, and no identity is preserved when they are rescheduled.
StatefulSets are designed for stateful applications where each Pod has a unique identity. Pods have stable, predictable names (web-0, web-1, web-2). Pods are created and deleted in order. Pods retain their identity across restarts.
StatefulSets are used for databases (MySQL, PostgreSQL), message queues (Kafka, RabbitMQ), and any application where each instance has its own persistent storage.
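A sketch of the stable identity and per-replica storage described above (all names are illustrative): each replica (web-0, web-1, web-2) gets its own PersistentVolumeClaim from volumeClaimTemplates.

```yaml
# Hypothetical StatefulSet sketch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web        # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```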
Q5: How does a Service work and what are the different types?
A Service provides a stable network endpoint for a set of Pods. It load-balances traffic across healthy Pods and provides a stable IP address and DNS name.
The four Service types are:
ClusterIP: Default. Provides an internal IP address accessible only within the cluster. Used for internal communication between services.
NodePort: Exposes the Service on a static port on each node. Traffic to NodeIP:NodePort is forwarded to the Service. Used for basic external access or when a load balancer is not available.
LoadBalancer: Provisions a cloud load balancer (AWS ELB, GCP LB) that forwards traffic to the Service. This is the standard way to expose services to the internet in production.
ExternalName: Maps the Service to an external DNS name. Used to access external services using Kubernetes naming conventions.
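For reference, a minimal ClusterIP Service might look like this (the name, labels, and ports are illustrative): traffic to my-service:80 is load-balanced across the Pods matching the selector.

```yaml
# Minimal ClusterIP Service sketch
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP        # the default; omitting it has the same effect
  selector:
    app: web
  ports:
  - port: 80             # Service port
    targetPort: 8080     # container port on the Pods
```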
Intermediate Questions
Q6: Explain the rolling update process.
A rolling update gradually replaces old Pods with new Pods to achieve zero downtime.
The update is controlled by two parameters:
maxSurge: How many extra Pods can be created during the update.
maxUnavailable: How many old Pods can be unavailable during the update.
For a Deployment with 4 replicas and strategy maxSurge: 1, maxUnavailable: 0:
1. A new ReplicaSet is created with the updated image
2. One new Pod is created and becomes ready
3. One old Pod is terminated
4. Steps 2-3 repeat until all Pods are updated
5. The old ReplicaSet is scaled to zero
This ensures there are always at least 4 Pods serving traffic during the update.
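The example above corresponds to a strategy block like this (a sketch, not tied to any particular application):

```yaml
# Never drop below 4 ready Pods; allow 1 extra Pod during the update
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```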
Q7: How does the Horizontal Pod Autoscaler (HPA) work?
The HPA automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (CPU, memory, or custom metrics).
The algorithm is:
desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]
Example: If current replicas is 4, average CPU is 75%, and desired CPU is 50%, then ceil[4 * (75/50)] = ceil[4 * 1.5] = 6. HPA will scale to 6 replicas.
HPA fetches metrics from the Metrics Server (for CPU/memory) or Prometheus (for custom metrics). It scales up immediately when metrics exceed the target but scales down slowly (stabilization window) to avoid flapping.
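The arithmetic above can be checked in plain shell, using (a + b - 1) / b as an integer ceiling (the values mirror the worked example):

```shell
# HPA formula sketch: desired = ceil(current * currentMetric / targetMetric)
current=4      # current replicas
cpu=75         # observed average CPU (%)
target=50      # target average CPU (%)
desired=$(( (current * cpu + target - 1) / target ))   # integer ceiling
echo "$desired"   # prints 6
```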
Q8: What is a ConfigMap and how is it different from a Secret?
A ConfigMap stores non-sensitive configuration data in key-value pairs. It is used for environment-specific settings, feature flags, and application configuration.
A Secret stores sensitive information: passwords, API keys, TLS certificates. Secret values are base64 encoded, which is an encoding, not encryption. Secrets can be given more restrictive RBAC access than ConfigMaps and can be encrypted at rest in etcd.
Both can be consumed as environment variables or mounted as volume files.
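A point worth demonstrating in an interview: base64 is trivially reversible, so anyone who can read a Secret can decode it. A quick shell check (the password here is made up):

```shell
# base64 encodes, it does not encrypt
encoded=$(printf 'S3cretP@ss' | base64)
echo "$encoded"                       # prints UzNjcmV0UEBzcw==
printf '%s' "$encoded" | base64 -d    # prints the original: S3cretP@ss
```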
Q9: Explain how Ingress works.
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. It provides advanced routing based on hostnames and URL paths.
Ingress is just a specification. An Ingress Controller (NGINX, AWS ALB, Traefik, GCE) implements it.
Example Ingress configuration:
example.com/api/* routes to api-service
example.com/* routes to web-service
admin.example.com/* routes to admin-service
Ingress also handles TLS termination, reducing the need for individual services to manage certificates.
Q10: What are the different ways to manage application configuration?
There are four primary ways to manage configuration in Kubernetes:
ConfigMaps: Non-sensitive configuration as environment variables or files
Secrets: Sensitive configuration, base64 encoded
External configuration services: HashiCorp Vault, AWS Secrets Manager
Environment variables: Hardcoded in Pod spec (least flexible)
The recommended approach is to use ConfigMaps and Secrets with volume mounts. Updates to the underlying objects propagate to the mounted files without a Pod restart, though only eventually (after the kubelet sync delay); values injected as environment variables require a restart.
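As a sketch of the volume-mount approach (all names and values are illustrative):

```yaml
# ConfigMap consumed as files under /etc/app; edits to the ConfigMap
# appear in the mounted files after a short delay
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    feature.flag=true
    log.level=info
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: config
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: app-config
```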
Advanced Questions
Q11: How would you debug a Pod stuck in Pending state?
A Pod in Pending state means the scheduler cannot place it on a node. I would check:
```shell
# Describe the Pod to see events
kubectl describe pod my-pod

# Common causes:
# - Insufficient CPU/memory resources (check node capacity)
# - Node selector or affinity rules preventing placement
# - PersistentVolumeClaim cannot be bound
# - Taints preventing scheduling on specific nodes

# Check node resources
kubectl top nodes
kubectl describe nodes

# Check PVC status
kubectl get pvc

# Check taints
kubectl describe nodes | grep Taints
```
Q12: A Pod is crashing repeatedly. How would you investigate?
```shell
# Check Pod status
kubectl get pods
# Look for CrashLoopBackOff status

# Check logs
kubectl logs my-pod
kubectl logs my-pod --previous   # Logs from the previous crashed instance

# Describe for events
kubectl describe pod my-pod

# Common causes:
# - Application error on startup
# - Missing environment variables or ConfigMaps
# - Incorrect command or arguments
# - OOM kill (exit code 137)
# - Liveness/readiness probe failing

# For OOM, check memory limits
kubectl describe pod my-pod | grep -A5 Limits

# For probe failures
kubectl describe pod my-pod | grep -A10 Liveness
```
Q13: How does network policy work?
Network Policy controls traffic flow between Pods and external endpoints. By default, all Pods can communicate with all other Pods. Network Policies restrict this.
A Network Policy defines:
podSelector: Which Pods the policy applies to
policyTypes: Ingress, Egress, or both
ingress: Allowed incoming traffic sources
egress: Allowed outgoing traffic destinations
Network Policies are implemented by CNI plugins (Calico, Cilium, Weave). Flannel does not support Network Policies.
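A common starting point is a default-deny ingress policy; this sketch uses an empty podSelector to match every Pod in its namespace, after which specific allow rules are layered on top:

```yaml
# Default-deny all ingress for the namespace this policy lives in
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```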
Q14: How do you secure a Kubernetes cluster?
Kubernetes security has three primary layers:
RBAC (Who can do what)
Use namespaces to isolate resources
Assign least privilege roles to users and service accounts
Audit RBAC permissions regularly
Disable default service account auto-mounting
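A least-privilege sketch tying these bullets together (the namespace, user, and names are hypothetical): a Role grants read-only Pod access, and a RoleBinding attaches it to a user.

```yaml
# Read-only access to Pods in one namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```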
Network Policies (Traffic control)
Implement default deny all Network Policy
Only allow necessary ingress/egress traffic
Isolate production and development namespaces
Pod Security (How Pods run)
Enforce Pod Security Standards (Restricted for production)
Run containers as non-root user
Use read-only root filesystem
Drop all capabilities, add only needed ones
Set resource limits
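These bullets map onto a Pod securityContext roughly like this (the image and values are illustrative):

```yaml
# Hardened Pod sketch: non-root, read-only filesystem, no capabilities,
# resource requests and limits set
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
  containers:
  - name: app
    image: myapp:1.0        # hypothetical image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
```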
Q15: What is etcd and why is it important?
etcd is a distributed key-value store that holds the entire configuration and state of the Kubernetes cluster. It stores:
Cluster state
Node information
Pod definitions
ConfigMaps and Secrets
Service definitions
Deployment states
etcd is the source of truth. If etcd fails, the cluster loses its state. Always back up etcd.
Part 2: Real-world Troubleshooting Scenarios
Scenario 1: Pod Stuck in ImagePullBackOff
Problem: A Pod cannot start because it cannot pull the container image.
Troubleshooting steps:
```shell
# Check Pod status
kubectl get pods
# NAME     READY   STATUS             RESTARTS   AGE
# my-pod   0/1     ImagePullBackOff   0          5m

# Describe Pod for details
kubectl describe pod my-pod
# Events:
#   Failed to pull image "myapp:latest": rpc error: code = NotFound

# Check the image name
kubectl get pod my-pod -o yaml | grep image

# Fix: correct the image name or ensure the image exists in the registry
# For a private registry, create an image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.io \
  --docker-username=user \
  --docker-password=pass
```

```yaml
# Add the secret to the Pod spec
spec:
  imagePullSecrets:
  - name: regcred
```
Resolution: Correct image name, push image to registry, or add image pull secret for private registry.
Scenario 2: Service Not Accessible
Problem: A Service is created but cannot be accessed from another Pod.
Troubleshooting steps:
```shell
# Check the Service exists
kubectl get svc
# NAME         TYPE        CLUSTER-IP   PORT(S)
# my-service   ClusterIP   10.96.0.1    8080/TCP

# Check Endpoints (should list Pod IPs)
kubectl get endpoints my-service
# NAME         ENDPOINTS
# my-service   10.244.1.5:8080,10.244.2.3:8080

# If the endpoints are empty, the selector doesn't match any Pod
kubectl describe svc my-service | grep Selector

# Check Pod labels
kubectl get pods --show-labels

# Test connectivity from another Pod
kubectl run test --image=busybox -it --rm -- /bin/sh
wget -O- http://my-service:8080

# Check kube-proxy rules (run on a node; Service rules live in the nat table)
iptables -t nat -L -n | grep my-service
```
Resolution: Fix Service selector to match Pod labels, or ensure Pods are running and have correct labels.
Scenario 3: Node Not Ready
Problem: A worker node is in NotReady state.
Troubleshooting steps:
```shell
# Check node status
kubectl get nodes
# NAME       STATUS     ROLES    AGE
# worker-1   NotReady   <none>   10d

# Describe the node for details
kubectl describe node worker-1
# Conditions:
#   Ready            Unknown
#   MemoryPressure   False
#   DiskPressure     False
#   PIDPressure      False

# Check node conditions for specific issues, then SSH into the node
ssh worker-1

# Check kubelet status
systemctl status kubelet
journalctl -u kubelet -n 50

# Check container runtime status
systemctl status containerd

# Check disk space
df -h

# Check node reachability
ping worker-1
```
Resolution: Restart kubelet, free disk space, or investigate node resource issues.
Scenario 4: HPA Not Scaling
Problem: Horizontal Pod Autoscaler is not scaling despite high CPU usage.
Troubleshooting steps:
```shell
# Check HPA status
kubectl get hpa
# NAME      REFERENCE        TARGETS    MINPODS   MAXPODS   REPLICAS
# web-hpa   Deployment/web   250%/50%   2         10        2

# Describe the HPA for events
kubectl describe hpa web-hpa
# Events:
#   FailedGetResourceMetric: missing request for cpu

# Check if the Metrics Server is installed
kubectl top pods
# error: Metrics API not available
# Metrics Server is missing!

# Install the Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify the Metrics API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

# Check Pod resource requests (required for HPA)
kubectl get deployment web -o yaml | grep -A5 resources
```
Resolution: Install Metrics Server and ensure Pods have resource requests defined.
Scenario 5: Persistent Volume Not Bound
Problem: PVC is stuck in Pending state.
Troubleshooting steps:
```shell
# Check PVC status
kubectl get pvc
# NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   AGE
# data-pvc   Pending                                      5m

# Describe the PVC for events
kubectl describe pvc data-pvc
# Events:
#   FailedBinding: no persistent volumes available for this claim

# Check available PVs
kubectl get pv

# If there is no PV, check the StorageClass
kubectl get storageclass

# For dynamic provisioning, verify the StorageClass has a provisioner
kubectl get storageclass standard -o yaml | grep provisioner
```

```yaml
# For static provisioning, create a PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/data
```
Resolution: Create matching PV or ensure Storage Class provisioner is correctly configured.
Part 3: Hands-on Practice Tasks
Task 1: Debug a Failing Deployment
Objective: A Deployment is failing. Identify and fix the issue.
Given YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broken-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: broken
  template:
    metadata:
      labels:
        app: broken
    spec:
      containers:
      - name: app
        image: nginx:latest
        command: ["/bin/sh"]
        args: ["-c", "exit 1"]
```
Steps to debug:
```shell
# 1. Check Pod status
kubectl get pods
# broken-app-xxx   0/1   CrashLoopBackOff

# 2. Check logs
kubectl logs broken-app-xxx
# (no output - the container exits immediately)

# 3. Check previous logs
kubectl logs broken-app-xxx --previous
# (no output)

# 4. Describe the Pod for events
kubectl describe pod broken-app-xxx
# Events: Container exited with code 1

# 5. Issue: the overridden command exits with an error code
# Fix: remove command and args so nginx starts with its default CMD
```

```yaml
# Corrected container spec
containers:
- name: app
  image: nginx:latest
  # command and args removed - use the image's default CMD
```
Task 2: Configure Ingress for Two Services
Objective: Route example.com to web-service and api.example.com to api-service.
Solution:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
```
Task 3: Implement HPA with Custom Metrics
Objective: Configure HPA to scale based on HTTP requests per second using Prometheus.
Prerequisites: Prometheus and Prometheus Adapter installed.
Solution:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```
Task 4: Create Network Policy for Database
Objective: Database Pod should only accept traffic from the app Pod.
Solution:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: webapp
    ports:
    - protocol: TCP
      port: 5432
```
Task 5: Rollback a Bad Deployment
Objective: A bad deployment was pushed. Roll back to the previous version.
Steps:
```shell
# Check rollout history
kubectl rollout history deployment web-deployment
# REVISION   CHANGE-CAUSE
# 1          nginx:1.24
# 2          nginx:1.25 (bad)

# Check a specific revision
kubectl rollout history deployment web-deployment --revision=2

# Roll back to the previous revision
kubectl rollout undo deployment web-deployment

# Verify the rollback
kubectl rollout status deployment web-deployment

# Roll back to a specific revision (if needed)
kubectl rollout undo deployment web-deployment --to-revision=1
```
Kubernetes Interview Preparation Checklist
Fundamentals
Explain Kubernetes architecture (Control Plane, Worker Nodes)
Describe Pod, Deployment, Service, Ingress
Differentiate between Deployment and StatefulSet
Explain rolling update process
Understand ConfigMaps and Secrets
Networking
Explain Service types (ClusterIP, NodePort, LoadBalancer)
Describe Ingress and Ingress Controller
Configure Network Policies
Explain kube-proxy role
Storage
Differentiate PV, PVC, StorageClass
Explain static vs dynamic provisioning
Understand StatefulSet volumeClaimTemplates
Scaling & Updates
Configure HPA with CPU metrics
Understand HPA algorithm
Perform rolling updates and rollbacks
Security
Configure RBAC roles and bindings
Understand Pod Security Standards
Apply SecurityContext to Pods
Troubleshooting
Debug CrashLoopBackOff
Debug ImagePullBackOff
Debug Pending Pods
Debug Service connectivity
Use kubectl logs, describe, exec
Practice Exercises
Deploy a simple web application with 3 replicas. Configure HPA to scale on CPU at 50%. Generate load and observe scaling.
Create a ConfigMap with application properties. Mount it as a volume in a Pod. Update the ConfigMap and observe how long it takes for the Pod to see the change.
Deploy a WordPress application with MySQL using StatefulSet and PersistentVolumeClaims. Delete the MySQL Pod and verify data persists.
Create a Network Policy that allows traffic only from Pods with label app: frontend to a backend Service.
Simulate a node failure (shut down a worker node) and observe how Kubernetes reschedules Pods.
Summary
| Interview Topic | Key Concepts | Commands to Know |
|---|---|---|
| Architecture | Control Plane, Worker Nodes, Pods | kubectl cluster-info, kubectl get nodes |
| Workloads | Deployment, StatefulSet, DaemonSet | kubectl get deploy,sts,ds |
| Networking | Service, Ingress, NetworkPolicy | kubectl get svc,ing,netpol |
| Storage | PV, PVC, StorageClass | kubectl get pv,pvc,sc |
| Configuration | ConfigMap, Secret | kubectl get cm,secret |
| Scaling | HPA, VPA | kubectl get hpa |
| Troubleshooting | logs, describe, exec | kubectl logs, describe, exec |
| Security | RBAC, PodSecurity, NetworkPolicy | kubectl auth can-i |
Learn More
Practice Kubernetes with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/