Kubernetes for DevOps Engineers: Logs, Events, Metrics, and Troubleshooting Made Simple
Learn Kubernetes troubleshooting the simple way: logs, events, metrics, kubectl describe, kubectl logs, kubectl exec, kubectl debug, and the practical workflow every DevOps engineer should know.

Once your workloads are deployed, connected, secured, and storing data correctly, the next question is:
What do you do when something still does not work?
This is where observability and troubleshooting become essential.
A solid DevOps engineer does not guess blindly. They follow signals, inspect the right layer, and narrow the problem down step by step.
In this guide, we will keep things simple and practical.
The Big Idea
Troubleshooting in Kubernetes usually starts with four kinds of signals:
- Logs: what the application is saying
- Events: what Kubernetes is saying
- Metrics: what resource usage and performance look like
- Object state: what Kubernetes believes should be happening
A simple mental model helps:
Logs explain symptoms, events explain cluster actions, metrics explain pressure, and object state explains intent.
Start With the Simplest Question
Before you run many commands, ask one thing:
Is this an application problem, a Kubernetes problem, or a networking/configuration problem?
That question saves a lot of wasted time.
Many Kubernetes incidents are not mysterious. They are usually one of these:
- the Pod never started correctly
- the container started but crashed
- the Pod is running but not ready
- the Service is pointing to the wrong Pods
- the app is alive but resource-starved
- the configuration or secret is wrong
kubectl describe: The First High-Value Command
One of the best first commands in Kubernetes troubleshooting is kubectl describe.
It gives you object details, status information, recent conditions, and related events.
kubectl describe pod demo-api-7d8b6c9f7d-abcde
kubectl describe deployment demo-api
kubectl describe service demo-api
If a Pod is Pending, CrashLoopBackOff, ImagePullBackOff, or failing readiness checks, describe often gives you the first real clue.
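When the describe output is long, the Events section at the bottom usually holds the first clue. One quick way to jump straight to it is a simple grep (the Pod name below is hypothetical, and `-A 15` is just a rough number of context lines):

```shell
# Print the Events section (and up to 15 lines after it)
# from the describe output of a hypothetical Pod.
kubectl describe pod demo-api-7d8b6c9f7d-abcde | grep -A 15 '^Events:'
```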
Logs: What the Application Is Saying
Logs are usually the next place to look.
If the application is starting, crashing, refusing connections, or failing to load configuration, the logs often show it directly.
Common Log Commands
kubectl logs pod/demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde -c api
kubectl logs deployment/demo-api
kubectl logs pod/demo-api-7d8b6c9f7d-abcde --previous
The --previous flag is especially useful when a container is restarting and you want the logs from the last failed run.
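Before reaching for `--previous`, it can help to confirm that the container really is restarting and see why the last run ended. A small sketch, using a hypothetical Pod name and assuming the container of interest is the first one in the Pod:

```shell
# Hypothetical Pod name; substitute your own.
POD=demo-api-7d8b6c9f7d-abcde

# How many times has the first container restarted?
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].restartCount}'

# Why did the last run terminate? (e.g. OOMKilled, Error)
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Logs from that failed run.
kubectl logs "$POD" --previous
```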
Events: What Kubernetes Is Telling You
Events are short messages from Kubernetes about what happened to an object.
They often explain scheduling failures, image pull problems, failed mounts, readiness failures, and admission rejections.
Common Event Commands
kubectl events
kubectl events --for pod/demo-api-7d8b6c9f7d-abcde
kubectl events --types=Warning
kubectl get events -n staging --sort-by=.metadata.creationTimestamp
A simple rule:
If Kubernetes is blocking or restarting something, events often say why.
Metrics: Resource Pressure and Usage
Metrics answer a different kind of question:
Is this workload under CPU or memory pressure?
Metrics are useful when applications are slow, nodes are busy, or Pods are being killed because of resource limits.
The easiest built-in way to view quick CPU and memory usage is kubectl top.
kubectl top pod
kubectl top pod -n staging
kubectl top node
One important detail: kubectl top depends on Metrics Server.
If Metrics Server is not installed, these commands will not work.
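You can check whether Metrics Server is present before relying on `kubectl top`. A sketch, assuming a standard installation (the namespace and Deployment name can differ between distributions):

```shell
# Metrics Server registers this APIService name; if it is missing
# or unavailable, kubectl top will fail.
kubectl get apiservice v1beta1.metrics.k8s.io

# Or look for the Deployment itself (kube-system is the usual namespace).
kubectl get deployment metrics-server -n kube-system
```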
Logs Tell You What Happened. Metrics Tell You How Hard the System Is Working.
This distinction matters.
An application log might show timeout errors. Metrics might show the Pod is out of memory or running hot on CPU.
Looking at only one of those signals can lead you to the wrong conclusion.
kubectl exec: Inspect a Running Container
Sometimes the Pod is running, but the app still is not behaving correctly.
In that case, you may need to look from inside the container.
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -- sh
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -c api -- sh
This is useful for checking files, environment variables, DNS resolution, network connectivity, mounted secrets, or whether the process is actually listening on the expected port.
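Once inside, a few quick checks cover most of those cases. What is actually available depends on the image; the port and path below are assumptions for illustration:

```shell
# Inside the container (tool availability depends on the image):

env | sort                      # environment variables the process sees
cat /etc/resolv.conf            # DNS configuration the Pod uses
nslookup demo-api               # can we resolve the Service name?
wget -qO- http://localhost:8080/healthz   # hypothetical port and path
ls /var/run/secrets/kubernetes.io/serviceaccount   # mounted secret files
```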
kubectl debug: When Normal Inspection Is Not Enough
Sometimes a container image is too minimal to debug comfortably.
Maybe it has no shell, no curl, no dig, and no useful troubleshooting tools.
That is where kubectl debug becomes useful.
It helps you add or launch a debugging container so you can inspect the environment without rebuilding the application image first.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox
This is one of the most practical modern Kubernetes debugging tools to know.
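A few common variants are worth knowing. The container name `api` and node name `worker-1` below are hypothetical; substitute your own:

```shell
# Attach an ephemeral debug container that shares the process
# namespace of the 'api' container, so you can see its processes.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox --target=api

# Copy the Pod under a new name; useful when the original keeps crashing.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox --copy-to=demo-api-debug

# Start a debug Pod on a specific node (host filesystem mounted at /host).
kubectl debug node/worker-1 -it --image=busybox
```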
A Simple Troubleshooting Flow for Pods
When a Pod is not working, this is a good beginner-safe order:
kubectl get pods
kubectl describe pod demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde --previous
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -- sh
Ask these questions in order:
- Did the Pod get scheduled?
- Did the image pull correctly?
- Did the container start?
- Did it crash?
- Is readiness failing?
- Is the application logging an obvious error?
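Several of those questions map directly to fields in the Pod status. A sketch with JSONPath, using a hypothetical Pod name:

```shell
POD=demo-api-7d8b6c9f7d-abcde

# Scheduled and running? Check the phase (Pending, Running, Failed...).
kubectl get pod "$POD" -o jsonpath='{.status.phase}'

# Ready? Check the Ready condition.
kubectl get pod "$POD" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

# Stuck or crashing? The waiting reason often says why
# (e.g. CrashLoopBackOff, ImagePullBackOff).
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
```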
A Simple Troubleshooting Flow for Services
If the Pod is healthy but traffic still fails, switch from Pod debugging to Service debugging.
kubectl get svc
kubectl describe svc demo-api
kubectl get endpointslices -l kubernetes.io/service-name=demo-api
kubectl get pods -l app=demo-api
kubectl exec -it pod/client-pod -- sh
Then check:
- Does the Service selector match the right Pods?
- Are the Pods Ready?
- Are there endpoints for the Service?
- Is the target port correct?
- Can another Pod resolve and reach the Service name?
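The checks above can be sketched as concrete commands. The label `app=demo-api`, the client Pod name, and port 80 are assumptions; use your Service's actual selector and port:

```shell
# Does the Service selector match the Pod labels?
kubectl get svc demo-api -o jsonpath='{.spec.selector}'
kubectl get pods -l app=demo-api --show-labels

# Are there endpoints behind the Service, and are they ready?
kubectl get endpointslices -l kubernetes.io/service-name=demo-api -o wide

# From another Pod, resolve and call the Service by name.
kubectl exec -it pod/client-pod -- sh -c 'wget -qO- http://demo-api:80/'
```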
Where Traces Fit In
Kubernetes observability is often described through three pillars: logs, metrics, and traces.
For a beginner DevOps engineer, logs and metrics usually come first.
Traces become especially valuable when requests move across many services and you need to understand latency across the full path.
You do not need to master tracing on day one, but you should know where it fits.
Cluster-Level Logging Matters
Pod logs are useful, but they are not a complete logging strategy by themselves.
In real environments, logs should usually be collected into a centralized system so they do not disappear just because a Pod or node disappears.
A simple takeaway:
kubectl logs is great for immediate debugging. Centralized logging is what production operations depend on.
A Practical Troubleshooting Starter Pattern
For many common incidents, a good first-response pattern looks like this:
- start with kubectl get to see what exists
- use kubectl describe to inspect status and events
- read logs from the current and previous container runs
- check metrics if you suspect CPU or memory pressure
- use exec or debug only when you need in-container investigation
- verify selectors, endpoints, ports, and readiness before blaming networking
Common Beginner Mistakes
Starting With Random Guesses
Troubleshooting is much faster when you inspect facts in order instead of jumping between theories.
Reading Only Logs and Ignoring Events
The app logs may be silent while Kubernetes events clearly show a scheduling or mounting failure.
Using kubectl top Without Metrics Server
If Metrics Server is not installed, kubectl top cannot help you.
Skipping the Previous Logs on Restarting Containers
The most useful crash clue is often in the previous container logs, not the current one.
Assuming a Service Problem Is Always Networking
Many so-called networking issues are actually readiness, labels, ports, or endpoint problems.
Using Exec Too Early
Often, describe, logs, and events already tell you enough without entering the container.
What a DevOps Engineer Must Remember
- kubectl describe is one of the best first troubleshooting commands.
- Logs explain application behavior.
- Events explain Kubernetes actions and failures.
- Metrics explain resource pressure and recent usage.
- kubectl top needs Metrics Server.
- kubectl exec helps inspect a running container.
- kubectl debug is useful when a normal container is too minimal to debug.
- Good troubleshooting is a workflow, not a guess.
Final Thought
Kubernetes troubleshooting gets much easier when you stop asking vague questions like:
Why is this broken?
And start asking smaller questions like:
Is the Pod running? Is it ready? What do the events say? What do the logs say? Are the metrics normal?
Once you work in that order, most problems become much easier to isolate.