Debugging Kubernetes with Events and Logs
Use `kubectl get events` with timestamp sorting and container logs to quickly diagnose Kubernetes issues.
Why Events First?
Events are the first place to check when troubleshooting. They often contain direct clues about failures, scaling issues, or misconfigurations. Before reaching for `describe` or `logs`, run:
```bash
kubectl get events --sort-by=.metadata.creationTimestamp -o wide
```
The `-o wide` flag adds columns such as the reporting source and the event count on top of the default type, reason, and object columns, which is useful context. Add `-A` to include events from all namespaces.
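To cut straight to problems, you can also filter on the event type with a standard field selector:

```bash
# Show only Warning events, oldest first
kubectl get events --field-selector type=Warning --sort-by=.metadata.creationTimestamp
```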
Actionable Workflow
- Check Events:
  - Run the command above.
  - Filter by namespace if needed: `kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp`
  - Look for `Warning` events or repeated failures.
- Inspect Logs with `--previous`:
  - For crashed pods, get logs from the previous container instance: `kubectl logs <pod-name> --previous`
  - Combine with `tail` for recent entries: `kubectl logs <pod-name> --previous | tail -n 100`
- Describe as Last Resort:
  - If events and logs don’t reveal the issue, use `describe`: `kubectl describe pod <pod-name>`
  - Focus on the `Events` and `Conditions` sections.
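To make the workflow concrete, here is a minimal bash sketch; the namespace and pod name are placeholders you would supply:

```bash
#!/usr/bin/env bash
# Minimal triage sketch: events first, then previous-container logs,
# then describe only if the first two steps are inconclusive.
NS="${1:?usage: triage.sh <namespace> <pod>}"
POD="${2:?usage: triage.sh <namespace> <pod>}"

# 1. Recent Warning events in the namespace, oldest first
kubectl get events -n "$NS" --field-selector type=Warning \
  --sort-by=.metadata.creationTimestamp

# 2. Last 100 lines from the previous container instance
#    (kubectl prints an error if the container never restarted)
kubectl logs -n "$NS" "$POD" --previous | tail -n 100

# 3. Last resort: full describe, paged for readability
kubectl describe pod -n "$NS" "$POD" | less
```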
Tooling
- kubectl: Master `get events`, `logs --previous`, and `describe`.
- k9s: TUI for browsing events and logs in real time.
- Lens: IDE with event filtering and log aggregation.
Tradeoffs and Caveats
- Events Are Rate-Limited: Kubernetes drops events under heavy load. Critical issues might not appear if the control plane is overwhelmed.
- Logs Are Transient: If a pod is deleted, `--previous` logs disappear. Always pair with centralized logging (e.g., Loki, ELK).
- Timestamps Can Lie: Clock skew between nodes can misorder events. Use `--sort-by` but cross-check with other data.
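A cheap cross-check for the clock-skew caveat is to sort on a second field and compare the two orderings; `lastTimestamp` is a standard field on core events:

```bash
# Compare orderings: creation time vs. last observation time
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --sort-by=.lastTimestamp
```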
Example Policy: Log Retention
Enforce a logging policy in your cluster:
```yaml
# Example Fluentd config to forward logs to Loki
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type systemd
      tag kubernetes.*
    </source>
    <match kubernetes.**>
      @type loki
      url http://loki:3100/loki/api/v1/push
    </match>
```
Retain logs for 30 days and alert on persistent crash loops.
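The `loki` output above assumes the fluent-plugin-grafana-loki plugin is baked into your Fluentd image. For the alerting half of the policy, here is a sketch of a Prometheus rule, assuming kube-state-metrics is installed; the alert name and thresholds are illustrative:

```yaml
groups:
  - name: crash-loops
    rules:
      - alert: PodCrashLooping   # illustrative name
        # kube-state-metrics exposes per-container restart counters
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```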
Troubleshooting Common Issues
- No Events Showing:
  - Check whether the `kube-apiserver` is overloaded.
  - Verify the event TTL (the `--event-ttl` API server flag; events expire after one hour by default).
- Pod Gone, Logs Missing:
  - Without centralized logging, logs are lost. Push for a logging stack in such cases.
  - Check node-level logs (e.g., `journalctl`) as a last resort.
- Describe Overwhelm:
  - Pipe output to `less` for readability: `kubectl describe pod <pod> | less`
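When you do fall back to node-level logs, the kubelet's systemd journal is the usual starting point:

```bash
# On the affected node: recent kubelet logs from the systemd journal
journalctl -u kubelet --since "1 hour ago" | tail -n 200
```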
Prevention
- Monitor Event Streams: Use Prometheus to alert on event rates or specific warning types.
- Review Terminations: Regularly check `kubectl get events` for patterns in pod failures.
- Train Teams: Teach engineers to default to events and logs before `describe`. This reduces mean time to diagnose by ~50% in my experience.
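Until Prometheus alerting is in place, a live watch of warnings is a cheap stand-in:

```bash
# Stream Warning events cluster-wide as they arrive
kubectl get events -A --field-selector type=Warning -w
```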
Source thread: What K8s debugging trick would you have wished you knew on day one?
