Three Critical Signals After Kubernetes Deployments

Verify application health, resource utilization, and event logs immediately post-deploy to catch issues early.

JR


What to Check First

1. Application Health

Why: A broken deployment often manifests as failing pods or unreachable services.
How:

  • Run kubectl get pods -n <namespace> to check pod statuses. Look for CrashLoopBackOff or ImagePullBackOff.
  • Test endpoints: kubectl exec -it <pod> -- curl -v http://localhost:<port>.
  • Validate ingress/routing: Use curl or browser tests against external URLs.
    Caveat: Readiness/liveness probes might pass even if the app is misconfigured. Always test actual functionality.
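The pod-status check above can be sketched as a small script. The failure-state list and the namespace argument are assumptions, and the kubectl call requires a live cluster:

```shell
#!/bin/sh
# Sketch of the pod-status check above. The failure-state list is an
# assumption; extend it with whatever states matter in your cluster.

# Succeed only if no status line in the input matches a known failure state.
no_bad_pods() {
  ! echo "$1" | grep -qE 'CrashLoopBackOff|ImagePullBackOff|ErrImagePull'
}

# Pull the STATUS column for every pod in a namespace and flag failures.
check_pods() {
  ns="$1"
  statuses=$(kubectl get pods -n "$ns" --no-headers | awk '{print $3}')
  if no_bad_pods "$statuses"; then
    echo "pods look healthy in $ns"
  else
    echo "unhealthy pods in $ns" >&2
    return 1
  fi
}

# Usage (against a real cluster): check_pods my-namespace
```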

2. Resource Utilization

Why: Deployments can spike CPU/memory usage, causing evictions or performance degradation.
How:

  • kubectl top pods -n <namespace> for real-time metrics.
  • Check historical data in Prometheus/Grafana if available.
  • Compare current usage to requests/limits: kubectl describe pod <pod> | grep -A2 Requests (plain grep Requests only prints the header line).
    Tradeoff: Over-monitoring resources can create alert fatigue. Focus on anomalies exceeding 80% of limits.
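The 80%-of-limit rule of thumb above can be expressed as a tiny helper; the integer arithmetic and the kubectl usage shown in the comment are a sketch, not a full metrics pipeline:

```shell
#!/bin/sh
# Helper for the 80%-of-limit rule of thumb above; integer math only.

# pct USAGE LIMIT -> integer percentage of the limit consumed.
pct() { echo $(( $1 * 100 / $2 )); }

# over_threshold USAGE LIMIT THRESHOLD -> exit 0 if usage >= threshold% of limit.
over_threshold() { [ "$(pct "$1" "$2")" -ge "$3" ]; }

# Live usage (needs metrics-server; strip the m/Mi suffixes before comparing):
#   kubectl top pods -n my-namespace --no-headers
```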

3. Event Logs

Why: Kubernetes events often reveal deployment misconfigurations or infrastructure issues.
How:

  • kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp.
  • Filter for Warning events (Kubernetes events are only Normal or Warning): kubectl get events -n <namespace> --field-selector type=Warning.
  • Cross-reference with pod logs: kubectl logs <pod> --tail=50.
    Caveat: Event logs can be noisy. Prioritize persistent errors over transient issues.
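Separating persistent errors from one-off noise can be sketched by counting repeated event reasons. The minimum-repeat threshold and the kubectl invocation in the comment are assumptions:

```shell
#!/bin/sh
# Sketch: surface event reasons that repeat, per the "persistent over
# transient" advice above.

# recurring_reasons "<one reason per line>" MIN -> reasons seen >= MIN times.
recurring_reasons() {
  echo "$1" | awk -v min="$2" \
    '{ count[$1]++ } END { for (r in count) if (count[r] >= min) print r, count[r] }'
}

# Live usage (placeholder namespace):
#   kubectl get events -n my-namespace --field-selector type=Warning \
#     -o custom-columns=REASON:.reason --no-headers
```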

Actionable Workflow

  1. Immediate Post-Deploy:
    • Run kubectl get pods,svc,ingress -n <namespace> to confirm resources exist.
    • Check application health via synthetic requests.
    • Review resource metrics for spikes.
  2. First 5 Minutes:
    • Monitor event logs for recurring errors.
    • Validate traffic routing (e.g., blue/green switchovers).
  3. First 15 Minutes:
    • Compare error rates and latency against baseline metrics.
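The 15-minute baseline comparison in step 3 reduces to a small numeric check; the 50%-over-baseline tolerance in the example is arbitrary, and wiring in real error rates is left to your metrics stack:

```shell
#!/bin/sh
# Helper for step 3 above: is the current error rate meaningfully above
# baseline? Tolerance is a percentage; the values shown are examples.

# exceeds_baseline CURRENT BASELINE TOLERANCE_PCT -> exit 0 if
# current > baseline * (1 + tolerance/100). awk handles the float math.
exceeds_baseline() {
  awk -v c="$1" -v b="$2" -v t="$3" 'BEGIN { exit !(c > b * (1 + t / 100)) }'
}

# Example: flag a post-deploy 5xx rate 50% over baseline:
#   exceeds_baseline "$current_rate" "$baseline_rate" 50 && echo "regression"
```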

Policy Example

Monitoring Baseline:

  • Alert on:
    • Pod restarts > 3x in 5 minutes.
    • CPU > 80% for 10 minutes.
    • HTTP 5xx errors > 1% of requests.
  • Automate rollbacks via kubectl rollout undo deploy/<name> if critical alerts fire.
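The restart-count trigger could be wired to kubectl rollout undo roughly as follows. The threshold, namespace, and deployment name are placeholders, and the restart count is assumed to sit in column 4 of default kubectl output:

```shell
#!/bin/sh
# Sketch of the rollback policy above; thresholds and names are placeholders.

# too_many_restarts "<one restart count per line>" MAX -> exit 0 if any exceeds MAX.
too_many_restarts() {
  echo "$1" | awk -v max="$2" '$1 + 0 > max + 0 { found = 1 } END { exit !found }'
}

rollback_if_unhealthy() {
  ns="$1"; deploy="$2"
  # RESTARTS is column 4 of default `kubectl get pods` output.
  restarts=$(kubectl get pods -n "$ns" --no-headers | awk '{print $4}')
  if too_many_restarts "$restarts" 3; then
    kubectl rollout undo "deploy/$deploy" -n "$ns"
  fi
}

# Usage (against a real cluster): rollback_if_unhealthy my-namespace my-app
```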

Tooling

  • CLI: kubectl, oc (OpenShift).
  • Metrics: Prometheus + Grafana, OpenShift Monitoring Stack.
  • Logs: Elasticsearch + Kibana, OpenShift Logging (EFK).
  • Synthetics: Use curl scripts or tools like Locust for load testing.
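A minimal curl-based synthetic check, per the tooling note above. The URL and probe count are placeholders, and anything outside 2xx is treated as a failure:

```shell
#!/bin/sh
# Minimal synthetic probe sketch; URL and probe count are placeholders.

# fail_pct FAILURES TOTAL -> integer failure percentage.
fail_pct() { echo $(( $1 * 100 / $2 )); }

# probe URL N -> number of failed requests out of N probes.
probe() {
  url="$1"; n="$2"; failures=0; i=0
  while [ "$i" -lt "$n" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
    case "$code" in 2*) ;; *) failures=$((failures + 1)) ;; esac
    i=$((i + 1))
  done
  echo "$failures"
}

# Usage: f=$(probe https://my-app.example.com/healthz 100); fail_pct "$f" 100
```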

Troubleshooting Common Failures

  • Pod Not Ready:
    • Check readiness probe paths and timeouts.
    • Verify dependencies (e.g., databases) are reachable.
  • High Resource Usage:
    • Review HPA settings; ensure scaling is enabled.
    • Optimize container images for memory/CPU efficiency.
  • Event Log Noise:
    • Filter events by severity or component.
    • Use structured logging to separate debug/info messages.

Final Note

These signals are not a checklist but a starting point. Adapt based on your app’s failure modes—what kills your system isn’t always what the dashboard shows.

Source thread: What 3 signals do you check first after a Kubernetes deploy?
