Blue/Green Cluster Upgrades in EKS with external-dns

June 14, 2026 JR

Streamline EKS blue/green upgrades by orchestrating node groups, external-dns synchronization, and DNS propagation checks to minimize downtime.

Workflow

Prepare Green Cluster
- Create a new EKS cluster (green) on the target Kubernetes version using eksctl or the AWS Console.
- Mirror the blue cluster's node groups, keeping IAM roles and security group rules equivalent.
Sync DNS Records
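- Deploy external-dns to the green cluster with --provider=aws and a --txt-owner-id that differs from the blue cluster's, so each instance only touches records it owns. Run the blue cluster's external-dns with --policy=upsert-only so it cannot delete records during the migration.
- Confirm green Services and Ingresses have load balancer hostnames (kubectl get services,ingress -A -o wide), then check that external-dns created the matching records, either in its logs or with aws route53 list-resource-record-sets --hosted-zone-id <zone-id>.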
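One way to keep the two external-dns instances from fighting over records is to give each its own TXT owner ID and an explicit policy. The Python helper below is a minimal sketch that builds the container arguments for each cluster. It assumes the AWS provider with the TXT registry; the owner IDs `blue` and `green`, the `example.com` domain, and the `external_dns_args` helper are hypothetical names for illustration, not part of external-dns.

```python
from typing import List

# Policies accepted by external-dns's --policy flag.
VALID_POLICIES = {"sync", "upsert-only", "create-only"}


def external_dns_args(owner_id: str, domain: str, policy: str = "upsert-only") -> List[str]:
    """Build container args for one external-dns instance (hypothetical helper)."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unsupported external-dns policy: {policy}")
    return [
        "--source=service",
        "--source=ingress",
        "--provider=aws",
        f"--domain-filter={domain}",
        "--registry=txt",
        f"--txt-owner-id={owner_id}",  # each cluster must use a different owner ID
        f"--policy={policy}",
    ]


# Assumed values: blue never deletes records, green manages its own records fully.
blue_args = external_dns_args("blue", "example.com", policy="upsert-only")
green_args = external_dns_args("green", "example.com", policy="sync")
```

The resulting lists can be pasted into the `args` field of each cluster's external-dns Deployment. With the TXT registry, an instance skips records whose ownership record names a different owner, which is what prevents the two clusters from overwriting each other.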
Validate Services
- Deploy test workloads to green cluster and confirm endpoints are reachable.
- Use dig or the Route 53 console to verify that the records created for the green cluster resolve to its load balancers.
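The DNS check in this step can also be scripted. The sketch below, using only the Python standard library, resolves a hostname through the local resolver and compares the answers with a set of addresses you expect for the green cluster. The function names and the idea of comparing against a known address set are assumptions for illustration; load balancer IPs can change, so for ALB or NLB targets comparing hostnames is usually more reliable.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def points_to_green(resolved: Iterable[str], green_ips: Iterable[str]) -> bool:
    """True only if at least one address resolved and all belong to green."""
    resolved_set = set(resolved)
    return bool(resolved_set) and resolved_set <= set(green_ips)
```

A typical call would be `points_to_green(resolve_ipv4("<hostname>"), green_ips)`, where `green_ips` comes from the green load balancer. The result reflects only the resolver of the machine running the check, not what other resolvers have cached.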
Cutover DNS
- Lower the record TTL to 60 seconds at least one full old-TTL period before cutover, so cached answers carrying the old TTL have expired by then.
- Flip the DNS A/CNAME records to point to the green cluster's ingress load balancers.
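If the cutover record is changed by hand rather than by external-dns, the flip can be expressed as a Route 53 change batch. The sketch below builds that JSON document in Python; the record name and target are placeholder values, and `cname_upsert_batch` is a hypothetical helper. It covers a plain CNAME only, not alias records.

```python
import json
from typing import Any, Dict


def cname_upsert_batch(record_name: str, target: str, ttl: int = 60) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch that points record_name at target."""
    if ttl <= 0:
        raise ValueError("ttl must be a positive number of seconds")
    return {
        "Comment": f"Cut over {record_name} to green",
        "Changes": [
            {
                "Action": "UPSERT",  # creates the record or replaces the existing one
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Placeholder names; substitute the real record and the green ingress hostname.
batch = cname_upsert_batch("app.example.com.", "green-ingress.example.com.")
print(json.dumps(batch, indent=2))
```

Saved to a file, the output can be applied with `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://batch.json`. If external-dns still owns the record, it may revert a manual change on its next sync, so this route fits records that external-dns does not manage.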
Monitor and Rollback
- Watch CloudWatch metrics (e.g., HTTP 5xx errors) and node health for 30 minutes post-cutover.
- If issues arise, revert the DNS records to the blue cluster first, then drain green nodes one at a time with kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data.
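The rollback decision is easier to make under pressure if the threshold is written down in advance. The sketch below shows one possible rule: roll back when the 5xx rate over the observation window exceeds a limit. The 1% threshold, the 100-request minimum, and the `Sample` type are assumptions for illustration, not values from this post; the samples themselves would come from whatever metrics source you use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One metrics datapoint, e.g. one minute of load balancer traffic."""
    requests: int
    errors_5xx: int


def should_roll_back(
    samples: Sequence[Sample],
    max_error_rate: float = 0.01,  # assumed threshold: 1% of requests
    min_requests: int = 100,       # assumed floor below which the rate is too noisy
) -> bool:
    """Return True if the 5xx rate across all samples exceeds max_error_rate."""
    total = sum(s.requests for s in samples)
    errors = sum(s.errors_5xx for s in samples)
    if total < min_requests:
        return False  # not enough traffic to judge either way
    return errors / total > max_error_rate
```

Evaluating the whole window rather than single datapoints avoids rolling back on one noisy minute, at the cost of reacting more slowly to a sudden failure.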

Policy Example

Node Group Upgrade Policy

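Enforce consistent labels on new node groups, including eks.amazonaws.com/capacity-type, which managed node groups set automatically to ON_DEMAND or SPOT. Kubelets cannot self-assign most labels under the reserved kubernetes.io/ prefix, so a label such as kubernetes.io/role: node has to be applied after registration (for example with kubectl label) rather than through the node group configuration.
Require taints and tolerations to match between the blue and green clusters to prevent scheduling mismatches.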
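A policy like this can be checked mechanically before cutover. The sketch below compares the labels and taints of a blue node group with its green counterpart and returns a list of differences. The input shape (plain dicts and tuples) and the `nodegroup_drift` helper are assumptions for illustration; in practice the data would be extracted from the node group definitions or from the nodes themselves.

```python
from typing import Dict, Iterable, List, Tuple

# A taint as (key, value, effect), e.g. ("dedicated", "batch", "NoSchedule").
Taint = Tuple[str, str, str]


def nodegroup_drift(
    blue_labels: Dict[str, str],
    green_labels: Dict[str, str],
    blue_taints: Iterable[Taint],
    green_taints: Iterable[Taint],
    required_label_keys: Iterable[str],
) -> List[str]:
    """List the ways the green node group deviates from blue."""
    problems: List[str] = []
    for key in sorted(required_label_keys):
        if key not in green_labels:
            problems.append(f"green is missing required label {key}")
        elif blue_labels.get(key) != green_labels[key]:
            problems.append(
                f"label {key} differs: blue={blue_labels.get(key)!r} "
                f"green={green_labels[key]!r}"
            )
    blue_set, green_set = set(blue_taints), set(green_taints)
    for taint in sorted(blue_set - green_set):
        problems.append(f"green is missing taint {taint}")
    for taint in sorted(green_set - blue_set):
        problems.append(f"green has extra taint {taint}")
    return problems
```

An empty list means the two node groups agree on every required label and on all taints; anything else is worth resolving before workloads move.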

Tooling

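eksctl: Manage cluster and node group lifecycle (eksctl create nodegroup --cluster <cluster-name> --nodes 3).
external-dns: Sync Services and Ingresses to Route 53 (external-dns --source=service --provider=aws --domain-filter=<domain> --txt-owner-id=<owner-id>). Record TTL is set per resource with the external-dns.alpha.kubernetes.io/ttl annotation.
AWS Route 53: Check records in the console, or query an authoritative name server directly with dig @<name-server> <domain>, taking <name-server> from the hosted zone's NS record.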
Prometheus/Grafana: Alert on service latency or error rate spikes during cutover.

Tradeoffs

Resource Overhead: Green cluster requires ~2x node resources temporarily.
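DNS Propagation: Even with a low TTL, some clients keep resolving to the blue cluster for several minutes, because resolvers and applications may cache answers longer than the TTL.
Sync Conflicts: Two external-dns instances sharing a TXT owner ID, misconfigured external-dns RBAC, or duplicate DNS entries can cause outages.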
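The DNS propagation tradeoff comes down to simple arithmetic on TTLs. The sketch below works it through: after lowering the TTL, caches may still hold the old answer for up to the old TTL, and after the cutover they may hold the blue answer for up to the new TTL. The function names and the example times are illustrative, and the calculation assumes resolvers honor TTLs, which not all do.

```python
from datetime import datetime, timedelta, timezone


def earliest_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """First moment when every TTL-honoring cache has picked up the low TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def last_stale_answer(cutover_at: datetime, new_ttl_seconds: int) -> datetime:
    """Latest moment a TTL-honoring cache can still return the blue record."""
    return cutover_at + timedelta(seconds=new_ttl_seconds)


# Example with assumed values: old TTL of 1 hour, lowered to 60 seconds at 12:00 UTC.
lowered = datetime(2026, 6, 14, 12, 0, tzinfo=timezone.utc)
cutover = earliest_cutover(lowered, old_ttl_seconds=3600)
stale_until = last_stale_answer(cutover, new_ttl_seconds=60)
```

With these numbers the cutover should not start before 13:00 UTC, and well-behaved resolvers stop returning the blue record by 13:01 UTC. Clients that ignore TTLs fall outside this bound, which is why the blue cluster should stay up for a while after the flip.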

Troubleshooting

DNS Not Updating:
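- Check external-dns logs for AWS API errors (kubectl logs -n kube-system deployment/external-dns; adjust the namespace to wherever it is installed).
- Verify IAM permissions for Route 53: external-dns needs route53:ChangeResourceRecordSets on the hosted zone, plus route53:ListHostedZones and route53:ListResourceRecordSets.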
Node Registration Failures:
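- Inspect kubelet logs on the affected instance (journalctl -u kubelet). EKS runs the control plane, including cloud-controller-manager, outside the cluster, so there is no pod to read logs from; control plane logs appear in CloudWatch only if logging is enabled on the cluster.
- Confirm the node instance role has the AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, and AmazonEC2ContainerRegistryReadOnly managed policies attached, and that the role is mapped in the green cluster's aws-auth ConfigMap or has an access entry.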
Service Endpoints Stale:
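- There is no supported annotation that forces an endpoint resync. Compare kubectl get endpointslices -l kubernetes.io/service-name=<service-name> with kubectl get pods -o wide; empty or stale endpoints usually point to a selector mismatch or failing readiness probes.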
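For the Route 53 permission check under DNS Not Updating, a quick script can flag obviously missing actions in an IAM policy document. The sketch below is deliberately simple: it only looks at Allow statements and at the exact action names or the `route53:*` and `*` wildcards, and it ignores Deny statements, conditions, and resource scoping. The `missing_route53_actions` helper and the sample policy are illustrative assumptions.

```python
from typing import Any, Dict, Set

# Actions external-dns needs to list zones and records and to change records.
REQUIRED_ACTIONS = frozenset({
    "route53:ChangeResourceRecordSets",
    "route53:ListHostedZones",
    "route53:ListResourceRecordSets",
})


def allowed_actions(policy: Dict[str, Any]) -> Set[str]:
    """Collect every action named in an Allow statement of the policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    actions: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_route53_actions(policy: Dict[str, Any]) -> Set[str]:
    """Return the required actions the policy does not explicitly allow."""
    allowed = allowed_actions(policy)
    if "*" in allowed or "route53:*" in allowed:
        return set()
    return set(REQUIRED_ACTIONS) - allowed


# Sample policy that can change records but cannot list them.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["route53:ChangeResourceRecordSets"],
            "Resource": ["arn:aws:route53:::hostedzone/*"],
        }
    ],
}
print(sorted(missing_route53_actions(sample_policy)))
```

The policy JSON can be taken from the IAM console or from `aws iam get-policy-version`. A clean result here does not prove access works, since the script skips Deny statements and conditions, but a non-empty result is a strong hint about why records are not updating.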

Avoid overcomplicating the upgrade with canary deployments unless you need granular traffic shifting; blue/green is simpler for most use cases. Always test upgrades in staging with production-like workloads first.

Source thread: Any tips on blue/green cluster upgrades in EKS while using external-dns?
