Upgrading Amazon EKS 1.36: Production-Tested Workflow
Upgrading to Amazon EKS 1.36 requires rigorous validation of add-ons, node group compatibility, and rollback readiness to avoid outages.
Pre-Upgrade Validation Checklist
- Review EKS release notes: Confirm deprecations (e.g., removed APIs, changed defaults) and required actions.
- Audit add-ons:
  - Check compatibility of VPC CNI, CoreDNS, and Karpenter with EKS 1.36.
  - Validate third-party tools (e.g., Fluentd, Istio) against the new Kubernetes version.
- Node group readiness:
  - Ensure node IAM roles include the latest EKS permissions.
  - Test node image updates (e.g., Amazon EKS-optimized AMI 2.1.0+) in staging.
- Back up critical data:
  - Dump etcd (if self-managed) or use Velero for cluster resource backups.
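The add-on audit in the checklist above can be partly scripted against the EKS API. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper names `list_addon_versions` and `audit_addons` are hypothetical, and the add-on names passed in should match what is actually installed in your cluster. Self-managed components such as Karpenter are not covered by this query and still need to be checked against their own compatibility notes.

```shell
#!/bin/sh
# Sketch: ask EKS which add-on versions it reports as compatible with a
# target Kubernetes version. Helper names are illustrative, not part of any tool.

list_addon_versions() {
    # $1 = add-on name, $2 = target Kubernetes version (e.g. 1.36)
    aws eks describe-addon-versions \
        --addon-name "$1" \
        --kubernetes-version "$2" \
        --query 'addons[].addonVersions[].addonVersion' \
        --output text
}

audit_addons() {
    # $1 = target Kubernetes version; remaining arguments = add-on names
    target="$1"; shift
    failed=0
    for addon in "$@"; do
        versions=$(list_addon_versions "$addon" "$target")
        # Treat an empty result (or the CLI's literal "None") as "nothing compatible"
        if [ -z "$versions" ] || [ "$versions" = "None" ]; then
            echo "MISSING $addon: no compatible version reported for $target"
            failed=1
        else
            echo "OK $addon: $versions"
        fi
    done
    return "$failed"
}

# Usage (requires AWS credentials):
#   audit_addons 1.36 vpc-cni coredns kube-proxy
```

A non-zero return from `audit_addons` can be used to stop a pipeline before the control plane is touched.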
Upgrade Workflow
- Staging cluster first:
    eksctl upgrade cluster --name <staging-cluster> --version 1.36 --approve
  Without --approve, eksctl only prints the planned changes. Validate workloads, ingress, and storage classes post-upgrade.
- Control plane upgrade:
    aws eks update-kubeconfig --region <region> --name <prod-cluster>
    aws eks update-cluster-version --region <region> --name <prod-cluster> --kubernetes-version 1.36
  Monitor CloudTrail and CloudWatch for API server errors.
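- Node group upgrades:
  - Create the replacement node group on 1.36 first, so drained pods have somewhere to land (it needs a new name while the old group still exists):
      eksctl create nodegroup --cluster <cluster-name> --name <new-nodegroup-name> --version 1.36
  - Drain the old nodes:
      kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  - Scale the old node group to zero once it is empty:
      eksctl scale nodegroup --cluster <cluster-name> --name <nodegroup-name> --nodes 0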
- Post-upgrade validation:
  - Check node status:
      kubectl get nodes -o wide
  - Test pod scheduling and network policies.
  - Verify add-on functionality (e.g., CoreDNS resolution, CNI plugin logs).
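The post-upgrade node check is easy to get wrong by eye on a large cluster. Here is a minimal sketch of a filter that flags nodes that are not Ready or are still on an older kubelet. It assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION); the helper name `check_nodes` is hypothetical.

```shell
#!/bin/sh
# Sketch: read `kubectl get nodes --no-headers` output on stdin and report
# nodes that are not plainly "Ready" or whose version lacks the expected prefix.

check_nodes() {
    # $1 = expected kubelet version prefix, e.g. "v1.36."
    awk -v want="$1" '
        {
            ok = 1
            # STATUS may also be e.g. "NotReady" or "Ready,SchedulingDisabled"
            if ($2 != "Ready") { ok = 0 }
            # VERSION must start with the expected prefix
            if (index($5, want) != 1) { ok = 0 }
            if (!ok) {
                bad++
                printf "CHECK %s status=%s version=%s\n", $1, $2, $5
            }
        }
        END { exit (bad > 0 ? 1 : 0) }
    '
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | check_nodes v1.36.
```

The function exits non-zero when any node needs attention, so it can gate the next node group in a rolling upgrade.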
Policy Example: Version Enforcement
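Enforce a hold on EKS version upgrades with an IAM policy (for example, attached as a service control policy) that denies upgrades to any version outside an approved list. The policy cannot measure 30 days on its own; add 1.36 to the list once the hold period has passed. Confirm that the eks:kubernetesVersion condition key is supported in your environment before relying on it.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "eks:UpdateClusterVersion",
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "eks:kubernetesVersion": ["1.35", "1.34"]
        }
      }
    }
  ]
}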
Tradeoff: Delays access to new features but reduces risk of untested breaking changes.
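A deny policy of this kind has no notion of release dates, so the 30-day window itself has to be tracked somewhere else, for example as a gate in the upgrade pipeline. The following is a minimal sketch under that assumption; the helper name `hold_elapsed` is hypothetical, and the release timestamp is something you supply yourself.

```shell
#!/bin/sh
# Sketch: succeed only if at least N days have passed since a release.
# Times are Unix epoch seconds so the arithmetic stays portable across shells.

hold_elapsed() {
    # $1 = release time (epoch seconds)
    # $2 = current time (epoch seconds)
    # $3 = hold period in days (defaults to 30)
    release="$1"
    now="$2"
    hold_days="${3:-30}"
    elapsed_days=$(( (now - release) / 86400 ))
    [ "$elapsed_days" -ge "$hold_days" ]
}

# Usage: RELEASE_EPOCH is a placeholder you set, not a real EKS release date.
#   if hold_elapsed "$RELEASE_EPOCH" "$(date +%s)"; then
#       echo "hold period over, upgrade may proceed"
#   fi
```

Keeping the clock outside the policy means the allowlist only changes through a reviewed commit, while the gate stops early upgrade attempts in automation.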
Tooling
- EKS-ops: For automated cluster lifecycle management and idempotent upgrades.
- kubectl check-aws: Diagnose node registration issues post-upgrade.
- AWS CloudFormation: Template-based node group and add-on updates.
Troubleshooting Common Failures
- Node group stuck in “creating”:
  - Check IAM role permissions for the eks.amazonaws.com service.
  - Verify subnet/SG capacity limits.
- Pods not scheduling:
  - Inspect node taints and tolerations:
      kubectl describe node <node-name>
  - Check Karpenter or cluster autoscaler logs.
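- API server latency:
  - Monitor CloudWatch metrics for APIServer_Latency.
  - If issues persist, note that EKS does not support downgrading a control plane to an earlier Kubernetes version, so there is no in-place rollback to 1.35. Restore workloads into a cluster still running 1.35 (for example, from the Velero backup taken before the upgrade) or open an AWS Support case.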
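For the "pods not scheduling" case, describing nodes one at a time gets slow on larger clusters. Here is a minimal sketch that lists every node's taint keys in one pass and prints only the tainted nodes. It assumes kubectl access to the cluster; the helper names `list_node_taints` and `tainted_nodes` are hypothetical.

```shell
#!/bin/sh
# Sketch: show which nodes carry taints, as a quick first look when pods
# stay Pending after an upgrade.

list_node_taints() {
    # One line per node: NAME followed by a comma-separated list of taint keys
    # (kubectl prints <none> when a node has no taints).
    kubectl get nodes --no-headers \
        -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

tainted_nodes() {
    # Reads "NAME TAINTS" lines on stdin; prints only nodes with at least one taint.
    awk '$2 != "" && $2 != "<none>" { print $1 ": " $2 }'
}

# Usage (requires cluster access):
#   list_node_taints | tainted_nodes
#   kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

Comparing the taint keys against the tolerations of the Pending pods usually shows whether the problem is a leftover cordon from the drain step or a taint applied by the new node group.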
Caveats
- Deprecation risks: EKS 1.36 may drop support for older API versions (e.g., extensions/v1beta1).
- Add-on drift: Some EKS-managed add-ons (e.g., VPC CNI) require manual manifest updates post-upgrade.
- Rollback complexity: Node groups using custom AMIs may not support downgrade paths.
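The deprecation risk can be checked before the upgrade by scanning the manifests you deploy from. The following is a minimal sketch, assuming your manifests live as YAML files in a local directory; the helper name `find_deprecated_apis` is hypothetical, and the default pattern is simply the example API version mentioned above. It only sees files on disk, not objects already stored in the cluster.

```shell
#!/bin/sh
# Sketch: list manifest files that still reference a given apiVersion.

find_deprecated_apis() {
    # $1 = directory containing YAML manifests
    # $2 = apiVersion to search for (defaults to extensions/v1beta1)
    dir="$1"
    api="${2:-extensions/v1beta1}"
    # grep -l prints matching file names only; -F treats the pattern as a fixed string
    find "$dir" -type f \( -name '*.yaml' -o -name '*.yml' \) \
        -exec grep -lF -e "apiVersion: $api" {} +
}

# Usage:
#   find_deprecated_apis ./manifests
#   find_deprecated_apis ./manifests policy/v1beta1
```

An empty result means no file in the directory references that API version; any printed path is a manifest to migrate before upgrading.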
Always test in non-production environments first. When in doubt, engage AWS Support with cluster IDs and error logs.
Source thread: Anyone already testing Amazon EKS 1.36? Here’s my upgrade experience so far.
