Understanding Kubernetes Controller Manager in Production

The Kubernetes controller manager maintains cluster state by reconciling desired and actual configurations through control loops.

May 19, 2026 JR

Core Functionality

The controller manager runs control loops that monitor cluster state and drive it toward the desired configuration. It manages:

Node controllers (tracking node health)
ReplicaSet controllers (managing pod replicas)
ServiceAccount controllers (creating the default ServiceAccount in each namespace)
Other built-in controllers for endpoints, namespaces, etc.

Each controller runs in a loop. It reads both the desired state (object specs) and the actual state (object status and observed resources) through the API server, then takes corrective action when the two differ.
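The loop described above can be sketched in a few lines. This is a minimal illustration, not how the real ReplicaSet controller is written: `ReplicaSetState` and `reconcile` are hypothetical names, and the "actions" are plain tuples rather than API calls.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetState:
    """Hypothetical, simplified view of one ReplicaSet."""
    desired_replicas: int  # what the object's spec asks for
    running_pods: list     # names of pods currently observed


def reconcile(state: ReplicaSetState) -> list:
    """Return the actions that move actual state toward desired state."""
    diff = state.desired_replicas - len(state.running_pods)
    if diff > 0:
        # Too few pods: request one creation per missing replica.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: delete the surplus (here, the last ones listed).
        return [("delete", name) for name in state.running_pods[diff:]]
    # Already converged: nothing to do.
    return []
```

Calling `reconcile` repeatedly on fresh observations is what makes the design self-healing: the function never remembers what it did last time, it only compares the two states it is handed.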

Diagnosing Controller Issues

If controllers fail, ReplicaSets may stop creating pods, Service endpoints may go stale, and failed nodes may never be marked NotReady. Use this workflow:

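Check controller status:

kubectl get componentstatuses        # Legacy; deprecated since Kubernetes 1.19
kubectl get --raw='/readyz?verbose'  # API server readiness, listed per check

Look for components reported as Unhealthy or checks marked “[-]”. The /readyz endpoint describes the API server; the controller manager serves its own /healthz on its secure port (10257 by default).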

Review logs:
```
kubectl logs -n kube-system kube-controller-manager-<node-name>
```
Focus on errors like “failed to refresh cache” or “connection refused”.

Inspect events:
```
kubectl get events -A --sort-by=.metadata.creationTimestamp
```
Filter for controller-related events (e.g., FailedCreate warnings emitted by the ReplicaSet controller).
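For the status check in this workflow, Kubernetes health endpoints return one line per check when queried with `?verbose`, prefixed `[+]` for passing and `[-]` for failing. The sketch below pulls out the failing check names; `failed_checks` and the sample text are illustrative, and the sample is hand-written rather than captured from a cluster.

```python
def failed_checks(verbose_output: str) -> list:
    """Return names of checks marked as failed ("[-]") in verbose health output."""
    failed = []
    for line in verbose_output.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # "[-]etcd failed: reason withheld" -> "etcd"
            failed.append(line[3:].split(" ", 1)[0])
    return failed


# Hand-written sample in the verbose health format (not real cluster output).
SAMPLE_OUTPUT = """\
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-apiextensions-informers ok
readyz check failed
"""
```

One way to use it would be to pipe the output of `kubectl get --raw='/readyz?verbose'` into a script that calls `failed_checks` and alerts when the list is non-empty.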

Prevention: Policy Example

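Enforce high availability, failure-detection timing, and monitoring in production:

Policy:

Run every controller manager replica with --leader-elect=true (the default) so that one instance is active and the others stand by.
Set node failure-detection timing deliberately with --node-monitor-grace-period. This is a timing setting, not a resource limit; CPU and memory limits belong in the kube-controller-manager pod spec.
Alert on controller manager metrics such as workqueue_depth and leader_election_master_status.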

Example command-line flag configuration (concurrency is set per controller; values are illustrative):

--concurrent-replicaset-syncs=10 \
--leader-elect=true \
--node-monitor-period=5s \
--node-monitor-grace-period=40s
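When choosing a grace period, it helps to check how many kubelet status updates fit inside it, since the upstream flag documentation asks for the grace period to be a multiple of the kubelet's status update frequency. The sketch below assumes the kubelet's default update frequency of 10s; `updates_within_grace`, `grace_period_ok`, and the minimum of 2 updates are hypothetical choices for illustration.

```python
def updates_within_grace(grace_period_s: float, update_frequency_s: float = 10.0) -> float:
    """How many kubelet status updates fit inside the grace period."""
    if update_frequency_s <= 0:
        raise ValueError("update frequency must be positive")
    return grace_period_s / update_frequency_s


def grace_period_ok(grace_period_s: float, update_frequency_s: float = 10.0,
                    min_updates: int = 2) -> bool:
    """True if a node may miss at least `min_updates` updates before being marked unhealthy.

    `min_updates` is an illustrative threshold, not an upstream rule.
    """
    return updates_within_grace(grace_period_s, update_frequency_s) >= min_updates
```

With a 10s update frequency, a 15s grace period leaves room for only 1.5 updates, so a single delayed heartbeat can flip a node to NotReady, while 40s leaves room for 4.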

Tooling

Use these tools to validate and troubleshoot:

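Metrics:
- workqueue_depth (items waiting in each controller's work queue; sustained growth means a controller is falling behind)
- rest_client_request_duration_seconds (API server request latency as seen by the controller manager)
Debugging:
- kubectl describe pod -n kube-system kube-controller-manager-<node-name> (check events and config)
- kubectl get cs (short for componentstatuses; deprecated since Kubernetes 1.19, so treat its output as a hint only)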
Recovery:
- Restart the controller manager process (via systemd or container restart).
- If etcd is unhealthy, fix connectivity or latency issues first.
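The controller manager serves its metrics in the Prometheus text format, so a quick check does not need a full monitoring stack. The sketch below extracts one metric keyed by one label; `metric_by_label` is a hypothetical helper, the parser is deliberately simplified (it does not handle escaped quotes or braces inside label values), and the sample lines are hand-written.

```python
import re

# metric_name{label="value",...} 123
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def metric_by_label(text: str, metric: str, label: str) -> dict:
    """Map each value of `label` to the sample value of `metric`."""
    result = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        match = _SAMPLE_LINE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        if label in labels:
            result[labels[label]] = float(match.group("value"))
    return result


# Hand-written sample; queue names and numbers are illustrative.
SAMPLE_METRICS = '''\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="deployment"} 0
workqueue_depth{name="replicaset"} 12
rest_client_requests_total{code="200",method="GET"} 1042
'''
```

A queue whose depth keeps growing between scrapes is a more direct sign of a struggling controller than any single log line.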

Tradeoffs

Resource Usage vs Responsiveness:

Increasing per-controller concurrency (e.g., --concurrent-replicaset-syncs) speeds up reconciliation but raises CPU, memory, and API server load.
Shorter --node-monitor-period catches failures faster but increases API load.

Default settings are conservative; tune only after benchmarking.

Troubleshooting Common Failures

Symptom: Controller manager crashes repeatedly

Check:
- Resource exhaustion (memory/CPU limits in kube-controller-manager pod spec).
- Misconfigured flags (e.g., an invalid --service-account-private-key-file path).
Fix:
- Increase resource limits.
- Check custom flags against kube-controller-manager --help for the running version; an unknown flag makes the process exit at startup.

Symptom: Controllers not reconciling changes

Check:
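- Leader election issues (inspect the kube-controller-manager Lease in kube-system and look for lost-lease errors in the logs).
- Network partitions between controller manager and API server.
Fix:
- Ensure every replica runs with --leader-elect=true (the default) and can reach the API server to renew the lease.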
- Test API server connectivity from controller manager nodes.
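Leader election is lease-based: the active instance keeps renewing a Lease object, and a standby may take over once the lease has gone unrenewed for longer than its duration. The check below is a simplified sketch of that rule; `lease_expired` is a hypothetical helper, and real clients are more careful about clock skew than a direct timestamp comparison like this one.

```python
from datetime import datetime, timedelta, timezone


def lease_expired(renew_time: datetime, lease_duration_s: int, now: datetime) -> bool:
    """True if the lease has not been renewed within its duration."""
    return now > renew_time + timedelta(seconds=lease_duration_s)


# Illustrative values: a lease last renewed at noon with a 15-second duration.
LAST_RENEWAL = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
LEASE_DURATION_S = 15
```

If the renewal time shown by `kubectl get lease -n kube-system kube-controller-manager -o yaml` stops advancing, the current leader is either down or unable to reach the API server.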

Symptom: High API server load

Check:
- Per-controller concurrency set too high (e.g., --concurrent-replicaset-syncs).
- Inefficient client libraries in custom controllers.
Fix:
- Reduce concurrency or rate-limit requests.
- Use --kube-api-qps and --kube-api-burst to throttle the controller manager's own requests.
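A QPS-plus-burst limit of this kind behaves like a token bucket: tokens refill at the QPS rate, the bucket holds at most the burst size, and each request spends one token. The sketch below is a minimal illustration with an explicit clock so it is deterministic; `TokenBucket` is a hypothetical class, not the client library's implementation.

```python
class TokenBucket:
    """Minimal token bucket: refills at `qps` tokens/second, capped at `burst`."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Spend one token if available; `now` is a timestamp in seconds."""
        elapsed = now - self.last
        # Refill for the time that passed, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The two settings pull in different directions: a larger burst lets the controller manager catch up quickly after a restart, while a lower QPS caps the sustained load it can place on the API server.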

Final Note

The controller manager is resilient but not faultless. Monitor its health as rigorously as the workloads it manages. When in doubt, start with logs and metrics: assumptions about “working” controllers often hide subtle configuration drift.

Source thread: How does the Kubernetes controller manager work?
