Understanding Kubernetes Controller Manager in Production
The Kubernetes controller manager maintains cluster state by reconciling desired and actual configurations through control loops.
Core Functionality
The controller manager runs control loops that monitor cluster state and drive it toward the desired configuration. It manages:
- Node controllers (tracking node health)
- ReplicaSet controllers (managing pod replicas)
- ServiceAccount controllers (creating the default ServiceAccount in each new namespace)
- Other built-in controllers for endpoints, namespaces, etc.
Each controller runs in a loop, comparing actual state (observed through the API server) with desired state (the object specs submitted as manifests or API calls) and taking corrective action whenever the two differ.
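The loop described above can be sketched in a few lines of shell. This is a toy model, not how the real controllers are implemented: the `reconcile` function and the replica counts are hypothetical, and the "corrective action" is simulated by an assignment.

```shell
#!/bin/sh
# Toy reconcile step: compare desired and actual replica counts and
# print the corrective action a controller would take.
# The function body runs in a subshell so its variables stay local.
reconcile() (
    desired=$1
    actual=$2
    if [ "$actual" -lt "$desired" ]; then
        echo "create $((desired - actual))"
    elif [ "$actual" -gt "$desired" ]; then
        echo "delete $((actual - desired))"
    else
        echo "noop"
    fi
)

# Simulated control loop: keep reconciling until the state converges.
desired=3
actual=1
while [ "$(reconcile "$desired" "$actual")" != "noop" ]; do
    action=$(reconcile "$desired" "$actual")
    echo "actual=$actual desired=$desired -> $action"
    actual=$desired   # pretend the corrective action succeeded
done
echo "converged at $actual replicas"
```

Real controllers react to watch events from the API server rather than polling in a tight loop, but the compare-then-correct shape is the same.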
Diagnosing Controller Issues
If controllers fail, replacement pods may never be created, Service endpoints may go stale, and node conditions may stop being updated. Use this workflow:
- Check controller status:
    kubectl get componentstatuses        # deprecated since v1.19
    kubectl get --raw='/readyz?verbose'  # API server readiness checks only
  For the controller manager itself, query its health endpoint from the control plane node (by default https://127.0.0.1:10257/healthz). Look for components reported as Unhealthy or checks that are not ok.
- Review logs:
    kubectl logs -n kube-system kube-controller-manager-<node-name>
  Focus on errors like “failed to refresh cache” or “connection refused”.
- Inspect events:
    kubectl get events --sort-by=.metadata.creationTimestamp
  Filter for controller-related events (e.g., “Error calling api”).
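For the log review step, a small filter can narrow the output to the error patterns mentioned. This is a minimal sketch: `filter_controller_errors` is a hypothetical helper name, and the pattern list covers only the two log messages quoted in this post, so extend it for your own cluster.

```shell
#!/bin/sh
# Keep only log lines that match the error patterns quoted in this post.
# Reads log text on stdin, so it works with `kubectl logs` or a saved file.
# `|| true` keeps the exit status at 0 when nothing matches.
filter_controller_errors() {
    grep -i -E 'failed to refresh cache|connection refused' || true
}

# Usage (requires cluster access, so it is left commented out):
# kubectl logs -n kube-system kube-controller-manager-<node-name> | filter_controller_errors
```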
Prevention: Policy Example
Enforce health checks and resource limits in production:
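Policy:
- Run every controller manager replica with --leader-elect=true (the default) so a standby can take over if the active instance fails.
- Set node health timing explicitly (e.g., --node-monitor-grace-period=15s), and set CPU and memory limits in the kube-controller-manager pod spec.
- Alert on controller manager metrics such as workqueue_depth and leader_election_master_status (confirm the exact names against the /metrics endpoint of your version).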
Example command-line flag configuration:
--concurrent-deployment-syncs=20 \
--leader-elect=true \
--node-monitor-period=10s \
--node-monitor-grace-period=15s
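When choosing a grace period, it can help to check how many kubelet heartbeats fit inside it, since a node that misses that many in a row is marked unhealthy. The sketch below is plain arithmetic under an assumption: the 10-second heartbeat interval is the commonly documented kubelet default, not something read from your cluster, and `status_updates_in_grace` is a hypothetical helper.

```shell
#!/bin/sh
# How many kubelet heartbeats fit inside the node monitor grace period?
# Arguments are plain seconds. The second argument (heartbeat interval)
# defaults to 10, an assumed kubelet default; pass your own value if it differs.
status_updates_in_grace() (
    grace=$1
    update_interval=${2:-10}
    echo $((grace / update_interval))
)

status_updates_in_grace 15   # the example value above: prints 1
status_updates_in_grace 40   # a more forgiving value: prints 4
```

A result of 1 means a single delayed heartbeat is enough to mark a node unhealthy, which may be too aggressive on a busy or lossy network.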
Tooling
Use these tools to validate and troubleshoot:
- Metrics (exact names vary by Kubernetes version; confirm against the controller manager's /metrics endpoint):
  - workqueue_depth (are controller work queues backing up?)
  - rest_client_request_duration_seconds (API server request latency)
- Debugging:
  - kubectl describe pod -n kube-system kube-controller-manager-<node-name> (check events and config)
  - kubectl get cs (simplified component status; deprecated since v1.19)
- Recovery:
  - Restart the controller manager process (via systemd or container restart).
  - If etcd is unhealthy, fix connectivity or latency issues first.
Tradeoffs
Resource Usage vs Responsiveness:
- Raising the per-controller concurrency flags (e.g., --concurrent-deployment-syncs) speeds up reconciliation but raises memory usage.
- A shorter --node-monitor-period catches failures faster but increases API load.
Default settings are conservative; tune only after benchmarking.
Troubleshooting Common Failures
Symptom: Controller manager crashes repeatedly
- Check:
  - Resource exhaustion (memory/CPU limits in the kube-controller-manager pod spec).
  - Misconfigured flags (e.g., an invalid --service-account-private-key-file path).
- Fix:
  - Increase resource limits.
  - If custom flags are used, check them against kube-controller-manager --help for the running version; an unknown flag stops the process at startup.
Symptom: Controllers not reconciling changes
- Check:
  - Leader election issues (check kubectl get events -n kube-system and the controller manager logs for messages such as “lost leader lock”).
  - Network partitions between controller manager and API server.
- Fix:
  - Ensure --leader-elect=true is set on every replica.
  - Test API server connectivity from controller manager nodes.
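One way to check for leader election trouble is to read the holder recorded in the controller manager's Lease and see whether it changes between two samples. This is a sketch under assumptions: it relies on the default Lease name and namespace (kube-controller-manager in kube-system), and `current_leader` and `leader_is_stable` are hypothetical helper names.

```shell
#!/bin/sh
# Print the identity of the instance currently holding the leader Lease.
# Assumes the default Lease name and namespace.
current_leader() {
    kubectl -n kube-system get lease kube-controller-manager \
        -o jsonpath='{.spec.holderIdentity}'
}

# Sample the holder twice, INTERVAL seconds apart (default 30).
# Succeeds only if a holder exists and did not change between samples.
leader_is_stable() (
    interval=${1:-30}
    first=$(current_leader)
    sleep "$interval"
    second=$(current_leader)
    [ -n "$first" ] && [ "$first" = "$second" ]
)

# Usage (requires cluster access, so it is left commented out):
# leader_is_stable 30 && echo "leader stable" || echo "leader changed or missing"
```

A holder that changes repeatedly over short intervals suggests the active instance keeps losing its lock, which usually points back to API server connectivity or latency.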
Symptom: High API server load
- Check:
  - Controller concurrency set too high (e.g., --concurrent-deployment-syncs or --concurrent-replicaset-syncs).
  - Inefficient client libraries in custom controllers.
- Fix:
  - Reduce concurrency or rate-limit requests.
  - Use the --kube-api-qps and --kube-api-burst flags to throttle the controller manager's requests to the API server.
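To see whether a client is already being throttled, you can count its requests by HTTP status code from a metrics dump. The sketch below parses Prometheus text format on stdin; it assumes the `rest_client_requests_total` metric with a `code` label, which Kubernetes components built on client-go normally expose, and `count_requests_by_code` is a hypothetical helper.

```shell
#!/bin/sh
# Sum rest_client_requests_total samples whose labels include the given HTTP code.
# Reads Prometheus text-format metrics on stdin; the sample value is the last field.
count_requests_by_code() {
    awk -v code="$1" '
        index($0, "rest_client_requests_total{") == 1 &&
        index($0, "code=\"" code "\"") > 0 { sum += $NF }
        END { print sum + 0 }
    '
}

# Usage with a saved metrics dump (metrics.txt is a placeholder file name):
# count_requests_by_code 429 < metrics.txt
```

A growing count of 429 responses means the API server is rejecting requests under load, which is a signal to lower concurrency or tighten the client-side QPS and burst settings.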
Final Note
The controller manager is resilient but not faultless. Monitor its health as rigorously as the workloads it manages. When in doubt, start with logs and metrics; assumptions about “working” controllers often hide subtle configuration drift.
Source thread: How does the Kubernetes controller manager work?
