Upgrading Amazon EKS 1.36: Production-Tested Workflow

JR

Upgrading to Amazon EKS 1.36 requires rigorous validation of add-ons, node group compatibility, and rollback readiness to avoid outages.

Pre-Upgrade Validation Checklist

  1. Review EKS release notes: Confirm deprecations (e.g., removed APIs, changed defaults) and required actions.
  2. Audit add-ons:
    • Check compatibility of VPC CNI, CoreDNS, and Karpenter with EKS 1.36.
    • Validate third-party tools (e.g., Fluentd, Istio) against the new Kubernetes version.
  3. Node group readiness:
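    • Ensure node IAM roles carry the managed policies that nodes need (AmazonEKSWorkerNodePolicy and an ECR read policy such as AmazonEC2ContainerRegistryReadOnly, plus AmazonEKS_CNI_Policy unless the VPC CNI uses its own role).
    • Test the EKS-optimized AMI release built for 1.36 in staging. Amazon Linux 2 EKS AMIs are not published for Kubernetes 1.33 and later, so node groups still on AL2 need to move to AL2023 or Bottlerocket.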
  4. Back up critical data:
    • EKS does not expose etcd, so use Velero or a similar tool to back up cluster resources and persistent volumes.
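
Part of checklist item 1 can be automated by scanning manifests for API versions that no longer exist. The following is a minimal Python sketch, assuming the manifests are available as YAML text. The function name is illustrative, and the REMOVED_API_VERSIONS set holds examples removed in earlier Kubernetes releases, not the 1.36 list, which should be taken from the release notes.

```python
import re

# Examples of API versions removed in earlier Kubernetes releases.
# This is NOT the 1.36 removal list; replace it with the entries
# from the release notes for your target version.
REMOVED_API_VERSIONS = {
    "extensions/v1beta1",
    "policy/v1beta1",
    "autoscaling/v2beta2",
}

# Matches "apiVersion: <value>" at the start of a line, with optional quotes.
_API_VERSION_RE = re.compile(
    r"^[ \t]*apiVersion:\s*['\"]?([^\s'\"#]+)", re.MULTILINE
)


def find_removed_api_versions(manifest_text, removed=REMOVED_API_VERSIONS):
    """Return the removed apiVersion values used in a YAML manifest, sorted."""
    used = set(_API_VERSION_RE.findall(manifest_text))
    return sorted(used & set(removed))


if __name__ == "__main__":
    sample = (
        "apiVersion: apps/v1\n"
        "kind: Deployment\n"
        "---\n"
        "apiVersion: policy/v1beta1\n"
        "kind: PodDisruptionBudget\n"
    )
    print(find_removed_api_versions(sample))
```

A text scan like this only covers manifests on disk; objects created by Helm releases or operators still need to be checked against the live cluster.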

Upgrade Workflow

  1. Staging cluster first:
    eksctl upgrade cluster --name <staging-cluster> --version 1.36 --approve

    Without --approve, eksctl only prints the planned changes. Validate workloads, ingress, and storage classes post-upgrade.

  2. Control plane upgrade:
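    aws eks update-kubeconfig --region <region> --name <prod-cluster>
    aws eks update-cluster-version --region <region> --name <prod-cluster> --kubernetes-version 1.36

    EKS upgrades one minor version at a time, so the cluster must already be on 1.35. Track progress with aws eks describe-update using the update ID returned by the command above, and monitor CloudTrail and CloudWatch for API server errors.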

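  3. Node group upgrades:
    • Managed node groups: let EKS cordon, drain, and replace nodes in a rolling update:
      eksctl upgrade nodegroup --name <nodegroup-name> --cluster <cluster-name> --kubernetes-version 1.36
    • Self-managed node groups: create a replacement node group under a new name, drain the old nodes, then delete the old group:
      eksctl create nodegroup --name <new-nodegroup-name> --cluster <cluster-name> --version 1.36
      kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
      eksctl delete nodegroup --name <nodegroup-name> --cluster <cluster-name>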
  4. Post-upgrade validation:
    • Check node status: kubectl get nodes -o wide
    • Test pod scheduling and network policies.
    • Verify add-on functionality (e.g., CoreDNS resolution, CNI plugin logs).
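
The node status check in step 4 can be scripted so that it fails loudly instead of relying on someone reading a table. The following is a rough Python sketch, assuming the input is the JSON printed by kubectl get nodes -o json. The function name and the target_minor parameter are illustrative and not part of any tool.

```python
import json


def nodes_needing_attention(nodes_json, target_minor="1.36"):
    """Return (node name, reason) pairs for nodes that are not Ready
    or whose kubelet is not on the target minor version."""
    problems = []
    for node in json.loads(nodes_json).get("items", []):
        name = node["metadata"]["name"]
        status = node.get("status", {})

        # A node is healthy only if its "Ready" condition has status "True".
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if not ready:
            problems.append((name, "not Ready"))

        # kubeletVersion looks like "v1.36.0-eks-..."; compare major.minor only.
        kubelet = status.get("nodeInfo", {}).get("kubeletVersion", "")
        minor = ".".join(kubelet.lstrip("v").split(".")[:2])
        if minor != target_minor:
            problems.append((name, f"kubelet {kubelet or 'unknown'}"))
    return problems
```

An empty result means every node is Ready and on the target minor version. Any other result lists the nodes to inspect before moving on to the pod scheduling and add-on checks.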

Policy Example: Version Enforcement

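Enforce a 30-day hold on new EKS versions with an IAM policy or service control policy that denies upgrades to any version not on an approved list. The policy cannot track release dates itself, so 1.36 has to be added to the list once the hold has passed. Confirm in the Service Authorization Reference that the eks:kubernetesVersion condition key applies to this action before relying on it. The following denies upgrades to anything other than 1.34 or 1.35: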

{  
  "Version": "2012-10-17",  
  "Statement": [  
    {  
      "Effect": "Deny",  
      "Action": "eks:UpdateClusterVersion",
      "Resource": "*",  
      "Condition": {  
        "StringNotLike": {  
          "eks:kubernetesVersion": ["1.35", "1.34"]  
        }  
      }  
    }  
  ]  
}  

Tradeoff: Delays access to new features but reduces risk of untested breaking changes.
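
The policy only holds a static list of versions, so the 30-day rule has to live in whatever process regenerates that list. The following is a minimal Python sketch of that logic, assuming you maintain your own mapping of versions to release dates. The function names, the release_dates mapping, and the hold_days parameter are all hypothetical.

```python
from datetime import date, timedelta


def approved_versions(release_dates, today, hold_days=30):
    """Return versions whose hold period has elapsed, ordered by version.

    release_dates maps a version string such as "1.35" to the date it
    became available; the caller supplies these dates.
    """
    cutoff = today - timedelta(days=hold_days)
    ok = [v for v, released in release_dates.items() if released <= cutoff]
    return sorted(ok, key=lambda v: tuple(int(p) for p in v.split(".")))


def is_denied(target_version, approved):
    """Mirror the policy's Deny statement: block any target version
    that is not on the approved list."""
    return target_version not in approved
```

The output of approved_versions is what would be written into the condition block of the policy. is_denied simply restates the Deny logic so the list can be sanity-checked before it is deployed.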

Tooling

  • eksctl: Scripted cluster, node group, and add-on upgrades.
  • AWSSupport-TroubleshootEKSWorkerNode (Systems Manager Automation runbook): Diagnose node registration issues post-upgrade.
  • AWS CloudFormation: Template-based node group and add-on updates.

Troubleshooting Common Failures

  • Node group stuck in “creating”:
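    • Check that the node IAM role trusts ec2.amazonaws.com and has the required managed policies; the cluster role is the one that trusts eks.amazonaws.com.
    • Verify that the subnets have free IP addresses and that security groups allow traffic between nodes and the control plane.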
  • Pods not scheduling:
    • Inspect node taints and tolerations: kubectl describe node <node-name>
    • Check Karpenter or cluster autoscaler logs.
  • API server latency:
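    • Monitor API server request latency, for example the apiserver_request_duration_seconds metric exposed by the control plane metrics endpoint.
    • EKS does not support downgrading the control plane. If issues persist, restore workloads from your Velero backups into a cluster running 1.35 and open an AWS Support case.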
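
For the "Pods not scheduling" case, comparing taints and tolerations by eye is error-prone. The following is a small Python sketch of the matching rules as the Kubernetes documentation describes them, assuming taints and tolerations are passed as dictionaries shaped like the corresponding fields in kubectl get ... -o json output. The function names are illustrative.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    key = toleration.get("key")
    operator = toleration.get("operator") or "Equal"  # Equal is the default
    if not key:
        # An empty key is only valid with Exists and matches every taint key.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return (toleration.get("value") or "") == (taint.get("value") or "")


def blocking_taints(taints, tolerations):
    """Return the taints that would keep a pod off the node.

    PreferNoSchedule is a soft preference, so only NoSchedule and
    NoExecute taints are treated as blocking.
    """
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return [t for t in hard if not any(tolerates(tol, t) for tol in tolerations)]
```

Passing a node's spec.taints and a pending pod's spec.tolerations to blocking_taints returns the taints that still need a matching toleration. An empty list points the investigation toward resources, affinity, or the autoscaler instead.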
      

Caveats

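  • Deprecation risks: Each Kubernetes minor release can remove deprecated API versions. Check the Kubernetes deprecated API migration guide and the 1.36 release notes, and scan manifests and Helm charts for affected apiVersion values.
  • Add-on drift: EKS does not update add-ons when the control plane is upgraded. Update EKS-managed add-ons (e.g., VPC CNI) with aws eks update-addon, and update self-managed add-ons through their manifests or Helm charts.
  • Rollback complexity: The control plane cannot be downgraded, and node groups using custom AMIs may not have a tested path back to the previous version.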

Always test in non-production environments first. When in doubt, engage AWS Support with cluster IDs and error logs.

Source thread: Anyone already testing Amazon EKS 1.36? Here’s my upgrade experience so far.