Kubernetes for DevOps Engineers: Logs, Events, Metrics, and Troubleshooting Made Simple
Learn Kubernetes troubleshooting the simple way: logs, events, metrics, kubectl describe, kubectl logs, kubectl exec, kubectl debug, and the practical workflow every DevOps engineer should know.

Once your workloads are deployed, connected, secured, and storing data correctly, the next question is:
What do you do when something still does not work?
This is where observability and troubleshooting become essential.
A solid DevOps engineer does not guess blindly. They follow signals, inspect the right layer, and narrow the problem down step by step.
In this guide, we will keep things simple and practical.
The Big Idea
Troubleshooting in Kubernetes usually starts with four kinds of signals:
- Logs: what the application is saying
- Events: what Kubernetes is saying
- Metrics: what resource usage and performance look like
- Object state: what Kubernetes believes should be happening
A simple mental model helps:
Logs explain symptoms, events explain cluster actions, metrics explain pressure, and object state explains intent.
Start With the Simplest Question
Before you run many commands, ask one thing:
Is this an application problem, a Kubernetes problem, or a networking/configuration problem?
That question saves a lot of wasted time.
Many Kubernetes incidents are not mysterious. They are usually one of these:
- the Pod never started correctly
- the container started but crashed
- the Pod is running but not ready
- the Service is pointing to the wrong Pods
- the app is alive but resource-starved
- the configuration or secret is wrong
kubectl describe: The First High-Value Command
One of the best first commands in Kubernetes troubleshooting is kubectl describe.
It gives you object details, status information, recent conditions, and related events.
kubectl describe pod demo-api-7d8b6c9f7d-abcde
kubectl describe deployment demo-api
kubectl describe service demo-api
If a Pod is Pending, CrashLoopBackOff, ImagePullBackOff, or failing readiness checks, describe often gives you the first real clue.
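When the describe output is long, the Events section at the bottom usually holds the first clue. One quick way to jump straight to it is a simple grep (the Pod name below is hypothetical, and `-A 15` is just a rough number of context lines):

```shell
# Print the Events section (and up to 15 lines after it)
# from the describe output of a hypothetical Pod.
kubectl describe pod demo-api-7d8b6c9f7d-abcde | grep -A 15 '^Events:'
```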
Logs: What the Application Is Saying
Logs are usually the next place to look.
If the application is starting, crashing, refusing connections, or failing to load configuration, the logs often show it directly.
Common Log Commands
kubectl logs pod/demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde -c api
kubectl logs deployment/demo-api
kubectl logs pod/demo-api-7d8b6c9f7d-abcde --previous
The --previous flag is especially useful when a container is restarting and you want the logs from the last failed run.
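Before reaching for `--previous`, it can help to confirm that the container really is restarting and see why the last run ended. A small sketch, using a hypothetical Pod name and assuming the container of interest is the first one in the Pod:

```shell
# Hypothetical Pod name; substitute your own.
POD=demo-api-7d8b6c9f7d-abcde

# How many times has the first container restarted?
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].restartCount}'

# Why did the last run terminate? (e.g. OOMKilled, Error)
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Logs from that failed run.
kubectl logs "$POD" --previous
```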
Events: What Kubernetes Is Telling You
Events are short messages from Kubernetes about what happened to an object.
They often explain scheduling failures, image pull problems, failed mounts, readiness failures, and admission rejections.
Common Event Commands
kubectl events
kubectl events --for pod/demo-api-7d8b6c9f7d-abcde
kubectl events --types=Warning
kubectl get events -n staging --sort-by=.metadata.creationTimestamp
A simple rule:
If Kubernetes is blocking or restarting something, events often say why.
Metrics: Resource Pressure and Usage
Metrics answer a different kind of question:
Is this workload under CPU or memory pressure?
Metrics are useful when applications are slow, nodes are busy, or Pods are being killed because of resource limits.
The easiest built-in way to view quick CPU and memory usage is kubectl top.
kubectl top pod
kubectl top pod -n staging
kubectl top node
One important detail: kubectl top depends on Metrics Server.
If Metrics Server is not installed, these commands will not work.
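You can check whether Metrics Server is present before relying on `kubectl top`. A sketch, assuming a standard installation (the namespace and Deployment name can differ between distributions):

```shell
# Metrics Server registers this APIService name; if it is missing
# or unavailable, kubectl top will fail.
kubectl get apiservice v1beta1.metrics.k8s.io

# Or look for the Deployment itself (kube-system is the usual namespace).
kubectl get deployment metrics-server -n kube-system
```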
Logs Tell You What Happened. Metrics Tell You How Hard the System Is Working.
This distinction matters.
An application log might show timeout errors. Metrics might show the Pod is out of memory or running hot on CPU.
Looking at only one of those signals can lead you to the wrong conclusion.
kubectl exec: Inspect a Running Container
Sometimes the Pod is running, but the app still is not behaving correctly.
In that case, you may need to look from inside the container.
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -- sh
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -c api -- sh
This is useful for checking files, environment variables, DNS resolution, network connectivity, mounted secrets, or whether the process is actually listening on the expected port.
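Once inside, a few quick checks cover most of those cases. What is actually available depends on the image; the port and path below are assumptions for illustration:

```shell
# Inside the container (tool availability depends on the image):

env | sort                      # environment variables the process sees
cat /etc/resolv.conf            # DNS configuration the Pod uses
nslookup demo-api               # can we resolve the Service name?
wget -qO- http://localhost:8080/healthz   # hypothetical port and path
ls /var/run/secrets/kubernetes.io/serviceaccount   # mounted secret files
```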
kubectl debug: When Normal Inspection Is Not Enough
Sometimes a container image is too minimal to debug comfortably.
Maybe it has no shell, no curl, no dig, and no useful troubleshooting tools.
That is where kubectl debug becomes useful.
It helps you add or launch a debugging container so you can inspect the environment without rebuilding the application image first.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox
This is one of the most practical modern Kubernetes debugging tools to know.
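A few common variants are worth knowing. The container name `api` and node name `worker-1` below are hypothetical; substitute your own:

```shell
# Attach an ephemeral debug container that shares the process
# namespace of the 'api' container, so you can see its processes.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox --target=api

# Copy the Pod under a new name; useful when the original keeps crashing.
kubectl debug pod/demo-api-7d8b6c9f7d-abcde -it --image=busybox --copy-to=demo-api-debug

# Start a debug Pod on a specific node (host filesystem mounted at /host).
kubectl debug node/worker-1 -it --image=busybox
```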
A Simple Troubleshooting Flow for Pods
When a Pod is not working, this is a good beginner-safe order:
kubectl get pods
kubectl describe pod demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde
kubectl logs pod/demo-api-7d8b6c9f7d-abcde --previous
kubectl exec -it pod/demo-api-7d8b6c9f7d-abcde -- sh
Ask these questions in order:
- Did the Pod get scheduled?
- Did the image pull correctly?
- Did the container start?
- Did it crash?
- Is readiness failing?
- Is the application logging an obvious error?
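Several of those questions map directly to fields in the Pod status. A sketch with JSONPath, using a hypothetical Pod name:

```shell
POD=demo-api-7d8b6c9f7d-abcde

# Scheduled and running? Check the phase (Pending, Running, Failed...).
kubectl get pod "$POD" -o jsonpath='{.status.phase}'

# Ready? Check the Ready condition.
kubectl get pod "$POD" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

# Stuck or crashing? The waiting reason often says why
# (e.g. CrashLoopBackOff, ImagePullBackOff).
kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
```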
A Simple Troubleshooting Flow for Services
If the Pod is healthy but traffic still fails, switch from Pod debugging to Service debugging.
kubectl get svc
kubectl describe svc demo-api
kubectl get endpointslices -l kubernetes.io/service-name=demo-api
kubectl get pods -l app=demo-api
kubectl exec -it pod/client-pod -- sh
Then check:
- Does the Service selector match the right Pods?
- Are the Pods Ready?
- Are there endpoints for the Service?
- Is the target port correct?
- Can another Pod resolve and reach the Service name?
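The checks above can be sketched as concrete commands. The label `app=demo-api`, the client Pod name, and port 80 are assumptions; use your Service's actual selector and port:

```shell
# Does the Service selector match the Pod labels?
kubectl get svc demo-api -o jsonpath='{.spec.selector}'
kubectl get pods -l app=demo-api --show-labels

# Are there endpoints behind the Service, and are they ready?
kubectl get endpointslices -l kubernetes.io/service-name=demo-api -o wide

# From another Pod, resolve and call the Service by name.
kubectl exec -it pod/client-pod -- sh -c 'wget -qO- http://demo-api:80/'
```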
Where Traces Fit In
Kubernetes observability is often described through three pillars: logs, metrics, and traces.
For a beginner DevOps engineer, logs and metrics usually come first.
Traces become especially valuable when requests move across many services and you need to understand latency across the full path.
You do not need to master tracing on day one, but you should know where it fits.
Cluster-Level Logging Matters
Pod logs are useful, but they are not a complete logging strategy by themselves.
In real environments, logs should usually be collected into a centralized system so they do not disappear just because a Pod or node disappears.
A simple takeaway:
kubectl logs is great for immediate debugging. Centralized logging is what production operations depend on.
A Practical Troubleshooting Starter Pattern
For many common incidents, a good first-response pattern looks like this:
- start with kubectl get to see what exists
- use kubectl describe to inspect status and events
- read logs from the current and previous container runs
- check metrics if you suspect CPU or memory pressure
- use exec or debug only when you need in-container investigation
- verify selectors, endpoints, ports, and readiness before blaming networking
Common Beginner Mistakes
Starting With Random Guesses
Troubleshooting is much faster when you inspect facts in order instead of jumping between theories.
Reading Only Logs and Ignoring Events
The app logs may be silent while Kubernetes events clearly show a scheduling or mounting failure.
Using kubectl top Without Metrics Server
If Metrics Server is not installed, kubectl top cannot help you.
Skipping the Previous Logs on Restarting Containers
The most useful crash clue is often in the previous container logs, not the current one.
Assuming a Service Problem Is Always Networking
Many so-called networking issues are actually readiness, labels, ports, or endpoint problems.
Using Exec Too Early
Often, describe, logs, and events already tell you enough without entering the container.
What a DevOps Engineer Must Remember
- kubectl describe is one of the best first troubleshooting commands.
- Logs explain application behavior.
- Events explain Kubernetes actions and failures.
- Metrics explain resource pressure and recent usage.
- kubectl top needs Metrics Server.
- kubectl exec helps inspect a running container.
- kubectl debug is useful when a normal container is too minimal to debug.
- Good troubleshooting is a workflow, not a guess.
Final Thought
Kubernetes troubleshooting gets much easier when you stop asking vague questions like:
Why is this broken?
And start asking smaller questions like:
Is the Pod running? Is it ready? What do the events say? What do the logs say? Are the metrics normal?
Once you work in that order, most problems become much easier to isolate.