Blue/Green Cluster Upgrades in EKS with external-dns
Streamline EKS blue/green upgrades by orchestrating node groups, external-dns synchronization, and DNS propagation checks to minimize downtime.
Workflow
- Prepare Green Cluster
  - Create a new EKS cluster (green) with the updated Kubernetes version using eksctl or the AWS Console.
  - Mirror node groups from the blue cluster, ensuring identical IAM roles and security group configurations.
- Sync DNS Records
  - Deploy external-dns to the green cluster with --provider=aws and a --txt-owner-id that differs from blue's, and run blue's instance with --policy=upsert-only to avoid conflicts over shared records.
  - Confirm that green's Services have external addresses using kubectl get services -o wide, then check that the matching records exist in Route 53.
- Validate Services
  - Deploy test workloads to the green cluster and confirm endpoints are reachable.
  - Use dig or the AWS Route 53 dashboard to verify that DNS records point to the green cluster's IPs.
- Cutover DNS
  - Lower the DNS TTL to 60 seconds at least one full old-TTL interval before the cutover, so cached answers with the longer TTL have expired by then.
  - Flip DNS A/CNAME records to point to the green cluster's ingress controllers.
- Monitor and Rollback
  - Watch CloudWatch metrics (e.g., HTTP 5xx errors) and node health for 30 minutes post-cutover.
  - If issues arise, revert DNS and drain green cluster nodes using kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data.
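The cutover and rollback steps are the same Route 53 change with the target swapped, which is easy to sketch. The snippet below is a minimal illustration, not a definitive implementation: it only builds the ChangeBatch dictionaries in the shape that boto3's `change_resource_record_sets` accepts, and the record and ingress hostnames are hypothetical placeholders.

```python
def build_change_batch(record_name, target, record_type="CNAME", ttl=60, comment=""):
    """Return a Route 53 ChangeBatch that points record_name at target."""
    return {
        "Comment": comment,
        "Changes": [
            {
                # UPSERT creates the record if missing, otherwise overwrites it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": record_type,
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Hypothetical hostnames, for illustration only.
BLUE_INGRESS = "blue-ingress.example.com"
GREEN_INGRESS = "green-ingress.example.com"

cutover = build_change_batch("app.example.com", GREEN_INGRESS, comment="cut over to green")
rollback = build_change_batch("app.example.com", BLUE_INGRESS, comment="roll back to blue")
```

Either dictionary could then be passed as `ChangeBatch` to `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=cutover)`. Keeping the rollback batch prepared before the cutover means reverting is a single call. If external-dns owns the record, a manual change like this may be overwritten on its next sync, so this sketch assumes the cutover record is managed outside external-dns.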
Policy Example
Node Group Upgrade Policy
- Enforce labels kubernetes.io/role: node and eks.amazonaws.com/capacity-type: <value> on new node groups.
- Require matching taints and tolerations between blue and green clusters to prevent scheduling mismatches.
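A policy like this can be checked mechanically before any traffic moves. The following is a rough sketch, assuming you have already exported each node group's labels and taints into plain dictionaries. That input shape and the function name `diff_node_groups` are assumptions made for illustration, not part of eksctl or the EKS API.

```python
def diff_node_groups(blue, green):
    """Return human-readable mismatches between two node group specs.

    Each spec is a dict with "labels" (dict of str to str) and "taints"
    (list of (key, value, effect) string tuples). This shape is assumed.
    """
    problems = []
    # Compare every label key seen on either side.
    for key in sorted(set(blue["labels"]) | set(green["labels"])):
        b, g = blue["labels"].get(key), green["labels"].get(key)
        if b != g:
            problems.append(f"label {key}: blue={b!r} green={g!r}")
    blue_taints, green_taints = set(blue["taints"]), set(green["taints"])
    for taint in sorted(blue_taints - green_taints):
        problems.append(f"taint missing on green: {taint}")
    for taint in sorted(green_taints - blue_taints):
        problems.append(f"taint only on green: {taint}")
    return problems


# Example specs with one label drift and one missing taint.
blue_group = {
    "labels": {"eks.amazonaws.com/capacity-type": "ON_DEMAND"},
    "taints": [("dedicated", "batch", "NoSchedule")],
}
green_group = {
    "labels": {"eks.amazonaws.com/capacity-type": "SPOT"},
    "taints": [],
}
mismatches = diff_node_groups(blue_group, green_group)
```

An empty result means the green node group matches blue on the fields checked. Anything else is worth resolving before workloads are scheduled, since a missing taint or toleration tends to surface only as pods stuck in Pending.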
Tooling
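- eksctl: Manage cluster and node group lifecycle (eksctl create nodegroup --cluster <cluster-name> --nodes 3).
- external-dns: Sync services to Route 53 (external-dns --source=service --provider=aws --domain-filter=<domain> --txt-owner-id=<owner-id>). Record TTLs are set per resource with the external-dns.alpha.kubernetes.io/ttl annotation.
- AWS Route 53: Monitor DNS propagation via the dashboard or by querying one of the zone's authoritative name servers with dig @<route53-name-server> <domain>.
- Prometheus/Grafana: Alert on service latency or error rate spikes during cutover.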
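Since blue and green each run their own external-dns, it helps to generate both argument lists from one place so that only the owner ID and policy differ. The sketch below is one way to do that. The flags are existing external-dns options, while the helper name, owner IDs, and domain are hypothetical.

```python
def external_dns_args(owner_id, domain, policy="upsert-only"):
    """Build container args for an external-dns instance targeting Route 53."""
    if policy not in {"sync", "upsert-only"}:
        raise ValueError(f"unsupported policy for this sketch: {policy}")
    return [
        "--source=service",
        "--source=ingress",
        "--provider=aws",
        "--registry=txt",                # track ownership in TXT records
        f"--txt-owner-id={owner_id}",    # must differ between blue and green
        f"--domain-filter={domain}",
        f"--policy={policy}",            # upsert-only never deletes records
    ]


# Hypothetical owner IDs and domain.
blue_args = external_dns_args("eks-blue", "example.com", policy="upsert-only")
green_args = external_dns_args("eks-green", "example.com", policy="sync")
```

The resulting lists can be dropped into the `args` field of each cluster's external-dns Deployment. Giving green the sync policy here is a judgment call that assumes green will own the records after cutover. Keep both on upsert-only if you would rather clean up records by hand.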
Tradeoffs
- Resource Overhead: Green cluster requires ~2x node resources temporarily.
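- DNS Propagation: Even with a low TTL, some resolvers cache answers longer than the TTL allows, so a share of users may keep reaching the blue cluster for several minutes after cutover.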
- Sync Conflicts: Misconfigured external-dns RBAC or duplicate DNS entries can cause outages.
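The propagation tradeoff comes down to simple timing arithmetic: the old TTL decides how long you wait before cutting over, and the new TTL decides how long blue must keep serving afterwards. Here is a small sketch of that calculation. The safety factor of 5 is an arbitrary assumption to allow for resolvers that overstay the TTL, not a measured value.

```python
from datetime import datetime, timedelta, timezone


def earliest_cutover(ttl_lowered_at, old_ttl_seconds):
    """Caches filled just before the TTL change expire one old TTL later."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def blue_idle_after(cutover_at, new_ttl_seconds, safety_factor=5):
    """Estimate when blue should stop receiving traffic from well-behaved resolvers."""
    return cutover_at + timedelta(seconds=new_ttl_seconds * safety_factor)


# Example: TTL lowered from 300s to 60s at 12:00 UTC.
lowered_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cutover_at = earliest_cutover(lowered_at, old_ttl_seconds=300)
idle_at = blue_idle_after(cutover_at, new_ttl_seconds=60)
```

With these inputs the cutover should not start before 12:05 UTC, and blue should stay up until at least 12:10 UTC. In practice it is worth confirming with load balancer request counts that blue has actually gone quiet before scaling it down.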
Troubleshooting
- DNS Not Updating:
  - Check external-dns logs for AWS API errors (kubectl logs -n kube-system deployment/external-dns).
  - Verify IAM policy permissions for Route 53 (external-dns requires route53:ChangeResourceRecordSets, plus route53:ListHostedZones and route53:ListResourceRecordSets to discover zones and existing records).
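- Node Registration Failures:
  - EKS runs the cloud-controller-manager inside the managed control plane, so it is not visible as a pod in kube-system. Inspect kubelet logs on the affected instance instead (journalctl -u kubelet).
  - Confirm the instance role on green cluster nodes is mapped in the aws-auth ConfigMap or an EKS access entry and has the AmazonEKSWorkerNodePolicy managed policy attached.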
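- Service Endpoints Stale:
  - Kubernetes has no annotation that forces the endpoint controller to resync. Compare kubectl get endpoints <service-name> with kubectl get pods -l <selector> -o wide. Stale or empty endpoints usually point to a selector mismatch or failing readiness probes.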
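For the IAM check in particular, a quick script can confirm that a policy document covers what external-dns needs before you go digging through logs. This is a simplified sketch: it only looks at Allow statements and action names, and it ignores Deny statements, resources, and conditions. The required-action list reflects the three Route 53 actions commonly listed for external-dns and should be checked against the current external-dns documentation.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTIONS = (
    "route53:ChangeResourceRecordSets",
    "route53:ListHostedZones",
    "route53:ListResourceRecordSets",
)


def missing_route53_actions(policy_document):
    """Return required actions not covered by any Allow statement."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    allowed = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        allowed.extend(actions)
    # IAM action names are case-insensitive and may contain * wildcards.
    return [
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatchcase(required.lower(), pattern.lower()) for pattern in allowed)
    ]


# Example policy that grants listing but not record changes.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "route53:List*", "Resource": "*"}],
}
gaps = missing_route53_actions(read_only_policy)
```

Here `gaps` contains only `route53:ChangeResourceRecordSets`, which matches the typical symptom of external-dns seeing the zone but failing to write records. The policy JSON can be fetched with the AWS CLI or console and loaded with `json.load` before passing it in.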
Avoid overcomplicating with canary deployments unless you need granular traffic shifting—blue/green is simpler for most use cases. Always test upgrades in staging with production-like workloads first.
Source thread: Any tips on blue/green cluster upgrades in EKS while using external-dns?
