Diagnosing and Resolving Memory Spikes in Nginx on EKS AL2023
If you’ve migrated from Amazon Linux 2 (AL2) to AL2023 on EKS and noticed memory spikes in Nginx pods, you’re not alone. This post walks through a pragmatic diagnosis and repair workflow, with actionable steps to prevent recurrence.
Diagnosis: What’s Changed?
AL2023 introduces updates to containerd, kernel versions, and cgroup management. Memory spikes in Nginx often stem from:
- cgroup v1 vs v2 reporting differences: AL2023 defaults to cgroup v2, which accounts for memory differently than v1 (working set and page cache are counted differently), so kubelet and monitoring tools can report higher usage for an Nginx pod even though the process (and its underlying glibc allocations) has not grown, and Kubernetes may OOMKill pods despite apparently available memory. A quick check of a node's cgroup version follows this list.
- Resource limits misconfiguration: if memory limits aren't aligned with Nginx's actual usage patterns (e.g., during SSL termination or high request volume), Kubernetes may OOMKill or evict pods prematurely.
- Kernel or containerd bugs: while AL2023 includes fixes (e.g., containerd 1.5+), older versions of Nginx or misconfigured node agents (e.g., kubelet) can exacerbate issues.
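To run that check, look at the filesystem type mounted at /sys/fs/cgroup on the node itself (e.g., over SSM Session Manager or SSH); a quick sketch, not specific to Nginx:

# cgroup2fs means cgroup v2 (the AL2023 default); tmpfs means cgroup v1
stat -fc %T /sys/fs/cgroup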
Verify the Issue
Run these commands to triage:
# Check node/pod memory usage
kubectl top nodes
kubectl top pods -l app=nginx
# Inspect pod events for OOMKilled
kubectl describe pod <nginx-pod-name>
# Check the OS and containerd version on the node (containerd should be ≥1.5)
cat /etc/os-release && containerd --version
If pods are OOMKilled despite low actual memory usage, suspect cgroup v2 reporting or misconfigured limits.
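To confirm the kill reason programmatically (handy when checking many pods), pull the last termination state; a small sketch assuming the Nginx container is the first container in the pod:

# Expect "OOMKilled" if the container was killed by the memory cgroup
kubectl get pod <nginx-pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'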
Repair Steps
1. Adjust Memory Requests/Limits
Temporarily increase memory limits to mitigate OOMKills while diagnosing:
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
Monitor usage with kubectl top pods and adjust based on observed peaks.
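For context, here is how that block fits into a Deployment spec; a minimal sketch assuming a Deployment named nginx with the app=nginx label used in the commands above (name, image tag, and replica count are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          resources:
            requests:
              memory: "512Mi"   # baseline used for scheduling
            limits:
              memory: "1Gi"     # OOMKill threshold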
2. Force cgroup v1 (If Necessary)
If cgroup v2 is suspected, force v1 on nodes:
- Add the kernel parameter that switches systemd back to the cgroup v1 (legacy) hierarchy. On AL2023, use grubby rather than editing grub.cfg by hand:
  grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
  For managed node groups, bake this into launch template user data so replacement nodes keep the setting.
- Reboot nodes and verify:
  mount | grep cgroup   # Should show "cgroup on /sys/fs/cgroup/memory type cgroup"
Tradeoff: cgroup v1 is deprecated; use this only as a temporary workaround.
3. Update Nginx and Dependencies
Ensure Nginx is updated to a version ≥1.21.0 (better cgroup v2 compatibility). Example:
# In Dockerfile
FROM nginx:1.21
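If you bump the image tag on an existing Deployment instead of rebuilding, you can roll it out and confirm what the pods are running; a sketch assuming a Deployment and container both named nginx:

# Update the image and wait for the rollout to complete
kubectl set image deployment/nginx nginx=nginx:1.21
kubectl rollout status deployment/nginx
# Confirm the image version the pods are actually running
kubectl get pods -l app=nginx -o jsonpath='{.items[*].spec.containers[0].image}'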
4. Validate Kernel and containerd
Ensure nodes use kernel ≥5.10 and containerd ≥1.5. Update EKS node groups to the latest AL2023 AMI.
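Kernel and runtime versions are visible straight from the Kubernetes API, and a managed node group can be rolled to the latest AMI release with the AWS CLI; the cluster and node group names below are placeholders:

# KERNEL-VERSION and CONTAINER-RUNTIME columns should show ≥5.10 and containerd ≥1.5
kubectl get nodes -o wide
# Roll a managed node group to the latest AL2023 AMI release
aws eks update-nodegroup-version --cluster-name <cluster-name> --nodegroup-name <nodegroup-name>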
Prevention
Policy Example: Resource Quotas
Enforce memory limits at the namespace level to prevent runaway usage:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: nginx-memory-quota
spec:
  hard:
    requests.memory: "10Gi"
    limits.memory: "10Gi"
    pods: "10"
Apply to production namespaces to cap total memory consumption.
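Applying and checking the quota is straightforward; this assumes the namespace is called production and the manifest is saved as nginx-memory-quota.yaml:

kubectl apply -f nginx-memory-quota.yaml -n production
# Show current usage against the quota
kubectl describe resourcequota nginx-memory-quota -n production

Note that once a quota covers limits.memory, every new pod in the namespace must declare a memory limit or it will be rejected, so pair the quota with a LimitRange or explicit limits in your manifests.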
Monitoring Workflow
- Alert on memory usage: use Prometheus + Alertmanager to trigger alerts when Nginx memory exceeds 80% of its limit (example rule after this list).
- Log node OOM events:
  journalctl -k | grep -i "out of memory"   # kernel OOM-killer messages
- Regularly review metrics:
  kubectl get hpa,quota,limits -A
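For the alerting bullet above, a sketch of a Prometheus alerting rule; it assumes cAdvisor metrics (container_memory_working_set_bytes) and kube-state-metrics (kube_pod_container_resource_limits) are being scraped, and the group and alert names are arbitrary:

groups:
  - name: nginx-memory
    rules:
      - alert: NginxMemoryNearLimit
        expr: |
          sum(container_memory_working_set_bytes{container="nginx"}) by (namespace, pod)
            /
          sum(kube_pod_container_resource_limits{container="nginx", resource="memory"}) by (namespace, pod)
            > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Nginx pod {{ $labels.pod }} is above 80% of its memory limit"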
Tooling
- kubectl: real-time pod/node metrics (kubectl top).
- Prometheus/Grafana: long-term monitoring of memory trends.
- Node Problem Detector: Logs node-level issues (e.g., OOM events).
- AWS CloudWatch: Track node memory usage outside Kubernetes.
Conclusion
Memory spikes in Nginx on AL2023 are often due to cgroup v2 reporting quirks or misconfigured limits—not true leaks. Adjust resource limits, validate dependencies, and enforce quotas to stabilize workloads. Prioritize monitoring and updates to prevent recurrence. If spikes persist, test cgroup v1 as a fallback while working with AWS/Kubernetes upstream teams for long-term fixes.
Source thread: EKS AL2 to AL2023 memory usage spikes in nginx, anyone else?
