Choosing Master and Worker Nodes for Production Kubernetes
Use dedicated, high-availability master nodes with isolated resources and standardized worker nodes sized for workload demands.
Master Nodes: Stability Over Convenience
Master nodes run the control plane (API server, etcd, scheduler, controller manager) and need dedicated resources. Avoid colocating them with workloads. This isn’t theoretical; I’ve seen clusters destabilized by noisy neighbors during peak loads.
Action Steps:
- Deploy at least three master nodes for HA (odd number for etcd quorum).
- Use static IPs and FQDNs for all master components.
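- Isolate master traffic at the network layer with firewall rules or security groups (e.g., block non-API ports). Kubernetes NetworkPolicy objects only govern pod traffic, so they cannot do this on their own.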
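The "odd number for etcd quorum" advice comes down to simple majority arithmetic. The sketch below is illustrative only (the function names are made up for this post): it shows that a fourth member raises the quorum size without letting the cluster survive any more failures than three members do.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still has quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerates {fault_tolerance(n)} failure(s)")
```

Three and four members both tolerate a single failure, which is why even-sized control planes add cost without adding resilience.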
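Policy Example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: master-node-policy
data:
  taints: "node-role.kubernetes.io/master:NoSchedule"
This ConfigMap only records the intended taint; it does not taint anything by itself. Apply the taint to each master with kubectl taint nodes <node-name> node-role.kubernetes.io/master:NoSchedule to prevent accidental workloads on masters. Recent Kubernetes releases use the key node-role.kubernetes.io/control-plane instead, and kubeadm applies that taint to control-plane nodes automatically.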
Tooling:
- kubeadm: For bootstrapping self-managed clusters with clear separation.
- Managed Kubernetes: AWS EKS, GCP GKE, or Azure AKS run the masters as a managed service (tradeoff: less control).
- Prometheus + Grafana: Monitor master component health (etcd latency, API server errors).
Tradeoff: Dedicated masters increase cost but reduce blast radius during failures. Managed services reduce operational burden but may limit customization.
Troubleshooting:
- etcd Issues: Check logs for leader elections or network partitions. Use etcdctl to verify cluster health.
- API Server Downtime: Rotate certificates proactively; expired certs have tanked clusters during peak hours.
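Proactive rotation needs an early warning. On kubeadm clusters, kubeadm certs check-expiration reports the dates; the sketch below shows one way to turn a certificate's notAfter string (in the format openssl x509 -enddate prints) into a rotate-or-not decision. The function names and the 30-day threshold are assumptions for illustration, not a recommendation from any tool.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a cert's notAfter, e.g. 'Jan 31 00:00:00 2025 GMT'."""
    # ssl.cert_time_to_seconds parses the OpenSSL date format into epoch seconds.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_rotation(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when fewer than `threshold_days` remain (threshold is an assumption)."""
    return days_until_expiry(not_after, now) < threshold_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_rotation("Jan 31 00:00:00 2025 GMT", now))
```

Wiring a check like this into the same alerting path as the Prometheus metrics means an expiring certificate pages someone weeks ahead instead of during an outage.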
Worker Nodes: Fit for Purpose
Worker nodes run your apps. Size them for the actual workloads, not theoretical maxima.
Action Steps:
- Profile application resource usage (CPU, memory, storage I/O).
- Use node pools for different workloads (e.g., GPU nodes for ML, standard nodes for web apps).
- Enable auto-scaling but set realistic bounds (too aggressive = thrash, too conservative = waste).
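Once you have profiled requests, a back-of-the-envelope node count is a useful sanity check on autoscaler bounds. This is a rough sketch under stated assumptions: the function name is hypothetical, CPU is in millicores and memory in MiB, and the 10% reservation for the kubelet and system daemons is a placeholder you should replace with your nodes' real allocatable values.

```python
import math


def nodes_needed(pod_cpu_m: int, pod_mem_mi: int, replicas: int,
                 node_cpu_m: int, node_mem_mi: int,
                 reserved_percent: int = 10) -> int:
    """Estimate nodes required to fit `replicas` identical pods.

    reserved_percent is an assumed share of each node held back for
    system components; check `kubectl describe nodes` for real figures.
    """
    usable_cpu = node_cpu_m * (100 - reserved_percent) // 100
    usable_mem = node_mem_mi * (100 - reserved_percent) // 100
    # The scarcer resource decides how many pods fit on one node.
    pods_per_node = min(usable_cpu // pod_cpu_m, usable_mem // pod_mem_mi)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(replicas / pods_per_node)


if __name__ == "__main__":
    # 10 replicas of a 500m / 1024Mi pod on 4-core, 16 GiB nodes.
    print(nodes_needed(500, 1024, 10, 4000, 16384))
```

It ignores DaemonSets, pod overhead, and topology constraints, so treat the result as a lower bound when setting the minimum and maximum node counts.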
Policy Example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
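To reason about how this HPA will behave, it helps to work through the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the min/max bounds. The sketch below is a simplification (the function name is mine, and it ignores stabilization windows, missing metrics, and not-ready pods); the 0.1 tolerance mirrors the controller's documented default.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the controller skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the bounds set in the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 120% average CPU against the 80% target above.
    print(desired_replicas(4, 120, 80, min_replicas=3, max_replicas=10))
```

Running a few of your own peak numbers through this shows quickly whether maxReplicas: 10 is a real ceiling or a limit you will hit on an ordinary busy day.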
Tooling:
- Karpenter or Cluster Autoscaler: Dynamically adjust node count based on pod requirements.
- Load Testing: Use kube-burner to simulate workloads and validate node performance.
Tradeoff: Over-provisioning workers wastes money; under-provisioning causes evictions and OOM kills. Balance with real metrics.
Troubleshooting:
- Node Not Ready: Check cloud provider console for stopped/terminated instances.
- Pod Scheduling Failures: Run kubectl describe nodes to inspect resource pressure or taints.
Final Checklist
- Masters: HA, isolated, monitored.
- Workers: Right-sized, auto-scaling, node pools for affinity/anti-affinity needs.
- Drain nodes during upgrades with kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data.
No single stack fits all, but these patterns have kept clusters stable across fleets of 10k+ nodes. Adjust based on your team’s capacity and workload reality.
Source thread: What do you use for Master and Workers?
