Setting Resource Requests and Limits in Production Kubernetes
Set resource requests based on observed usage, apply limits cautiously, and validate with real metrics.
Practical Workflow
- Baseline metrics: Use `kubectl top` or Prometheus to observe historical CPU/memory usage.
- Set requests: Allocate 20-30% above observed averages to buffer for spikes.
- Apply limits: Use limits only if you need strict resource isolation (e.g., batch jobs). For long-running apps, omit limits unless throttling is acceptable.
- Monitor post-deploy: Check for OOM kills, throttling, or performance degradation.
- Iterate: Adjust based on metrics, not guesses.
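The sizing arithmetic from the workflow above can be sketched in shell; the observed figures here are hypothetical stand-ins for what `kubectl top` or Prometheus would report:

```shell
#!/bin/sh
# Hypothetical averages observed via `kubectl top` or Prometheus
observed_cpu_m=120      # millicores
observed_mem_mi=400     # MiB

# Buffer 25% above the observed average (midpoint of the 20-30% guidance)
request_cpu_m=$(( observed_cpu_m * 125 / 100 ))
request_mem_mi=$(( observed_mem_mi * 125 / 100 ))

echo "cpu request: ${request_cpu_m}m"
echo "memory request: ${request_mem_mi}Mi"
```

This yields `150m` CPU and `500Mi` memory as starting requests, which you would then round and validate against post-deploy metrics.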
Concrete Policy Example
CPU:
- Request: `150m` (guarantees 15% of a core)
- Limit: Omit unless strict capping is required (e.g., sidecars).
Memory:
- Request: Match observed usage (e.g., `512Mi` if the app typically uses 400Mi)
- Limit: Set equal to the request unless the app explicitly releases memory (e.g., JVM GC).
```yaml
resources:
  requests:
    cpu: "150m"
    memory: "512Mi"
  limits:
    memory: "512Mi"
```
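For context, that snippet sits under the container spec; a minimal Deployment sketch (the app name and image are placeholders) looks like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example-app:latest   # placeholder image
        resources:
          requests:
            cpu: "150m"
            memory: "512Mi"
          limits:
            memory: "512Mi"   # no CPU limit, per the policy above
```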
Tooling
- `kubectl top`: Real-time resource usage per pod.
- Prometheus + Grafana: Historical trends and alerts for anomalies.
- Vertical Pod Autoscaler (VPA): Test request/limit impact in non-prod clusters.
- cgroup checker: Verify cgroups v2 support with `cat /proc/self/cgroup`.
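For the VPA route, a recommendation-only manifest (`updateMode: "Off"`) surfaces suggested requests without evicting or resizing pods; the target names here are placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa      # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app        # placeholder
  updatePolicy:
    updateMode: "Off"        # recommend only; do not mutate running pods
```

Read the recommendations back with `kubectl describe vpa example-app-vpa` before committing them to the Deployment.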
Tradeoffs and Caveats
- No limits: Risk resource starvation (noisy neighbors) but allow burst capacity.
- Strict limits: Prevent overcommit but may throttle legitimate traffic (e.g., JVM GC pauses under memory limits).
- JVM quirks: JVMs before Java 15, or misconfigured cgroups v2, can misreport available CPU/memory. Always validate with `kubectl describe pod` and application logs.
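A quick way to check which cgroup version a container actually sees (complementing `cat /proc/self/cgroup`) is to inspect the filesystem type of the cgroup mount:

```shell
#!/bin/sh
# cgroup2fs => unified cgroups v2; anything else => legacy v1 or hybrid hierarchy
fstype=$(stat -fc %T /sys/fs/cgroup)
if [ "$fstype" = "cgroup2fs" ]; then
  echo "cgroups v2"
else
  echo "cgroups v1 (or hybrid): $fstype"
fi
```

Run this inside the container image you actually deploy; the host and the container can disagree if the runtime is misconfigured.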
Troubleshooting Common Issues
- OOM kills: Check `kubectl describe pod` for memory limits and `OOMKilled` termination reasons. Use `dmesg | grep -i kill` on the node to confirm OOM events.
- CPU throttling: Run `kubectl top pod` and compare requested CPU to actual usage. High throttling indicates undersized requests.
- cgroups v2 issues: If the JVM reports incorrect CPU/memory, verify the kernel version (≥5.8 recommended) and Java version (≥15 for native cgroups v2 awareness; the legacy `-XX:+UseCGroupMemoryLimitForHeap` flag was removed in JDK 11 in favor of default container support).
- Unexpected behavior: Test with `stress-ng` to simulate load and observe resource enforcement.
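If cAdvisor metrics are scraped into Prometheus, the throttling check above can be quantified. This PromQL sketch computes the fraction of CFS enforcement periods in which each container was throttled:

```promql
# Fraction of CPU periods throttled per container (0 = never, 1 = always)
rate(container_cpu_cfs_throttled_periods_total[5m])
  /
rate(container_cpu_cfs_periods_total[5m])
```

Sustained values well above zero on a latency-sensitive service are a signal to raise the CPU request or drop the CPU limit.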
In production, I’ve seen teams burned by assuming JVM auto-tuning works flawlessly. Always pair resource settings with monitoring and test upgrades in staging first.
Source thread: How would you setup the resource requests and limits on this workload? (this is mostly about how different people approach it)