Practical Rightsizing and Autoscaling for Kubernetes Workloads
Rightsizing and autoscaling Kubernetes workloads require iterative measurement, policy enforcement, and monitoring to balance performance and cost.
Workflow for Production Rightsizing
- Measure baseline usage:
  - Use kubectl top pods --namespace <ns> to observe current CPU/memory usage.
  - Check historical metrics via Prometheus, and use kubectl describe pod <pod> to look for OOM kills (a last state of OOMKilled). CPU throttling is not shown by kubectl describe; it has to be read from container metrics.
  - Example: kubectl get hpa -A to review existing autoscalers.
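- Set conservative initial limits:
  - Start with requests at 25-50% above observed usage so the scheduler reserves headroom beyond what the workload normally consumes.
  - Use limits to cap resource-hungry workloads (e.g., resources.limits.memory: "512Mi"). A container that exceeds its memory limit is OOM-killed, so keep the limit above observed peaks.
  - Validate with kubectl describe pod <pod> that the pod was scheduled (no FailedScheduling events).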
- Monitor and adjust:
  - Track CPU throttling through the cAdvisor metrics container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total in Prometheus; kubectl describe pod <pod> does not report throttling.
  - Use Prometheus alerts for sustained usage >80% of limits.
  - Gradually tighten limits and requests as confidence grows.
- Enforce policies:
  - Apply resource quotas per namespace: kubectl apply -f quota.yaml --namespace <ns>.
  - Use admission controllers (e.g., OPA Gatekeeper) to reject pods without limits.
- Automate with HPA/VPA:
  - Deploy Horizontal Pod Autoscaler (HPA) for stable workloads:

        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: my-app-hpa
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: my-app
          minReplicas: 2
          maxReplicas: 10
          metrics:
            - type: Resource
              resource:
                name: cpu
                target:
                  type: Utilization
                  averageUtilization: 70

  - For dynamic workloads, use Vertical Pod Autoscaler (VPA) in recommend mode first.
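
The arithmetic behind steps 2 and 5 can be sketched in a few lines of Python. This is a rough illustration, not a sizing tool: suggest_request and hpa_desired_replicas are hypothetical helper names, the defaults mirror the manifest above (target 70%, 2 to 10 replicas), and the replica calculation follows the formula documented for the HPA, ceil(currentReplicas * currentMetric / targetMetric) with its default 10% tolerance. The real controller also applies stabilization windows and pod-readiness handling that are left out here.

```python
import math


def suggest_request(observed_peak: float, headroom: float = 0.25) -> float:
    """Return a request `headroom` above the observed peak (0.25-0.5 per step 2).

    Units are whatever the caller uses (e.g. millicores or MiB).
    """
    return observed_peak * (1 + headroom)


def hpa_desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70,
    min_replicas: int = 2,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a utilization target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds declared in the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


# A pod peaking at 400 MiB gets a 500 MiB request with 25% headroom.
print(suggest_request(400))
# 4 replicas averaging 90% CPU against a 70% target scale out.
print(hpa_desired_replicas(4, 90))
```

Running the numbers this way before deploying makes it easier to see whether maxReplicas leaves room for the load spikes you actually observe.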
Policy Example: Resource Quota
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: prod-quota
    spec:
      hard:
        requests.memory: "16Gi"
        limits.memory: "32Gi"
        requests.cpu: "8"
        limits.cpu: "16"
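Apply with kubectl apply -f quota.yaml --namespace <ns>. A ResourceQuota is namespaced, so it constrains only the namespace it is created in.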
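
To get a feel for what a quota like this allows, the sketch below estimates how many replicas of a given pod fit inside it. The per-pod values are hypothetical, parse_quantity handles only a subset of Kubernetes quantity suffixes (plain numbers, m, and the common binary and decimal suffixes), and max_replicas_within_quota is an illustrative helper, not part of any Kubernetes client library.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; exponent forms such as "1e3" are not handled.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as "250m", "8" or "16Gi" to a base-unit Decimal."""
    quantity = quantity.strip()
    if quantity.endswith("m"):  # milli-units, e.g. 250m CPU = 0.25 cores
        return Decimal(quantity[:-1]) / 1000
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def max_replicas_within_quota(quota: dict, per_pod: dict) -> tuple:
    """Return (replica count, binding resource) for identical pods under a quota."""
    fits = {
        key: int(parse_quantity(quota[key]) // parse_quantity(amount))
        for key, amount in per_pod.items()
    }
    binding = min(fits, key=fits.get)
    return fits[binding], binding


# Quota values from prod-quota above.
quota = {
    "requests.memory": "16Gi", "limits.memory": "32Gi",
    "requests.cpu": "8", "limits.cpu": "16",
}
# Hypothetical per-pod resources, for illustration only.
per_pod = {
    "requests.memory": "512Mi", "limits.memory": "1Gi",
    "requests.cpu": "250m", "limits.cpu": "1",
}
print(max_replicas_within_quota(quota, per_pod))
```

With these example numbers the CPU limit is the binding constraint, which is a useful reminder that an HPA's maxReplicas cannot be reached if the namespace quota runs out first.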
Tooling
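- Goldilocks: Builds on VPA recommendations and presents suggested requests and limits per workload. It runs as a controller and dashboard in the cluster rather than as a sidecar, and needs VPA installed. Check the project's installation docs for the current manifest location before applying: kubectl apply -f https://raw.githubusercontent.com/FairwindsOps/goldilocks/main/deploy/goldilocks.yaml
- VPA: Recommender/autoscaler add-on from the kubernetes/autoscaler project; it is not installed by default. Start in recommend mode (updateMode: "Off"). The upstream project documents installation through its hack/vpa-up.sh script in the vertical-pod-autoscaler directory, so confirm the manifest path before relying on: kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster/vpa/deploy/1.21/
- Prometheus + Grafana: Visualize usage trends and set alerts.
- HPA: Built-in horizontal scaling for stable metrics.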
Tradeoffs and Caveats
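- VPA in auto mode: Applies new requests by evicting and recreating pods, so it can cause excessive rescheduling if limits are too tight. Start with recommend mode.
- HPA limitations: Doesn’t handle vertical scaling; combine with VPA for full coverage, but avoid letting both act on the same CPU or memory metric for one workload, since they will work against each other.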
- Overhead: Monitoring and policy enforcement add complexity; balance with workload criticality.
Troubleshooting Common Issues
- OOM kills despite limits:
  - Check for memory leaks via kubectl logs --previous <pod>.
  - Ensure requests.memory is set so the pod is less likely to be evicted by the kubelet when the node runs short of memory.
- HPA not scaling:
  - Verify metrics API is enabled (kubectl api-resources | grep metrics).
  - Check HPA events: kubectl describe hpa <hpa-name>.
- VPA recommendations too low:
  - Review the recommendation recorded in the VPA custom resource (under status.recommendation): kubectl get vpa <vpa-name> -o yaml.
  - Adjust how much fluctuation the VPA tolerates, for example by setting a minAllowed floor in its resourcePolicy so recommendations cannot drop below a known-safe value.
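
When reviewing VPA output for many workloads, it can help to pull out just the recommendation bounds. The sketch below assumes the JSON form of the object (kubectl get vpa <vpa-name> -o json) and reads status.recommendation.containerRecommendations. The sample document and its values are made up for illustration, and vpa_recommendations is a hypothetical helper name.

```python
import json

# Trimmed, made-up example of `kubectl get vpa <vpa-name> -o json` output.
SAMPLE_VPA_JSON = """
{
  "kind": "VerticalPodAutoscaler",
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "my-app",
          "lowerBound": {"cpu": "25m", "memory": "128Mi"},
          "target": {"cpu": "100m", "memory": "256Mi"},
          "upperBound": {"cpu": "500m", "memory": "1Gi"}
        }
      ]
    }
  }
}
"""


def vpa_recommendations(vpa: dict) -> dict:
    """Map each container name to its lowerBound/target/upperBound recommendation."""
    status = vpa.get("status") or {}
    recommendation = status.get("recommendation") or {}
    result = {}
    for rec in recommendation.get("containerRecommendations", []):
        result[rec["containerName"]] = {
            bound: rec.get(bound) for bound in ("lowerBound", "target", "upperBound")
        }
    return result


summary = vpa_recommendations(json.loads(SAMPLE_VPA_JSON))
for container, bounds in summary.items():
    print(container, bounds["target"])
```

An empty result usually means the recommender has not yet produced a recommendation for the workload, which is worth ruling out before concluding that the values are too low.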
Start small, validate often, and automate incrementally. Rightsizing is a process, not a one-time fix.
Source thread: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
