Practical Rightsizing and Autoscaling for Kubernetes Workloads

June 19, 2026 JR

Rightsizing and autoscaling Kubernetes workloads requires iterative measurement, policy enforcement, and monitoring to balance performance and cost.

Workflow for Production Rightsizing

Measure baseline usage:
- Use kubectl top pods --namespace <ns> to observe current CPU/memory usage.
- Check historical metrics via Prometheus for usage trends and CPU throttling, and use kubectl describe pod <pod> to spot OOM kills (a container's Last State shows Reason: OOMKilled).
- Review existing autoscalers with kubectl get hpa -A.
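Set conservative initial requests and limits:
- Start with requests at 25-50% above observed usage so pods are scheduled with headroom.
- Use limits to cap resource-hungry workloads (e.g., resources.limits.memory: "512Mi"), and keep the memory limit above observed peak usage to avoid immediate OOM kills.
- Validate with kubectl describe pod <pod> that the pod was scheduled; a pod stuck in Pending with a FailedScheduling event means no node can satisfy its requests.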
Monitor and adjust:
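- Track CPU throttling in Prometheus by comparing container_cpu_cfs_throttled_periods_total against container_cpu_cfs_periods_total; kubectl describe pod does not report throttling.
- Use Prometheus alerts for sustained usage >80% of limits.
- Gradually tighten limits/requests as confidence grows.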
Enforce policies:
- Apply resource quotas per namespace: kubectl apply -f quota.yaml.
- Use admission controllers (e.g., OPA Gatekeeper) to reject pods without limits.
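
As a sketch of the initial sizing step above, assume kubectl top shows a container peaking around 200m CPU and 300Mi memory. Those numbers, the my-app name, and the image are hypothetical. Requests sit roughly 25-50% above observed usage, and the limits leave extra room above the peak:

```yaml
# Hypothetical Deployment; names, image, and numbers are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0.0  # placeholder image
        resources:
          requests:
            cpu: 250m      # observed ~200m, plus 25%
            memory: 400Mi  # observed ~300Mi, plus ~33%
          limits:
            cpu: 500m      # ceiling; usage above this is throttled, not killed
            memory: 512Mi  # hard cap; exceeding it gets the container OOM-killed
```

The scheduler places pods based on requests, so the request values are what determine how many replicas fit on a node; the limits only bound runtime usage.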

Automate with HPA/VPA:

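Deploy a Horizontal Pod Autoscaler (HPA) for workloads that can scale out by adding replicas. The utilization target is a percentage of each container's CPU request, so the target Deployment must set CPU requests: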

apiVersion: autoscaling/v2  
kind: HorizontalPodAutoscaler  
metadata:  
  name: my-app-hpa  
spec:  
  scaleTargetRef:  
    apiVersion: apps/v1  
    kind: Deployment  
    name: my-app  
  minReplicas: 2  
  maxReplicas: 10  
  metrics:  
  - type: Resource  
    resource:  
      name: cpu  
      target:  
        type: Utilization  
        averageUtilization: 70
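
For intuition about how a 70% target turns into a replica count, the Kubernetes documentation describes the HPA scaling rule as a ratio of current to desired metric value. The numbers in the worked line (4 replicas averaging 90% utilization) are illustrative assumptions:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to minReplicas and maxReplicas, and the controller skips scaling when the ratio is within its tolerance of 1.0 (10% by default).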

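For workloads whose per-pod resource needs change over time, use Vertical Pod Autoscaler (VPA), starting in recommendation-only mode (updateMode: "Off") so it reports suggested requests without evicting pods.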
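
A minimal sketch of a VPA object in recommendation-only mode, assuming the VPA components are installed in the cluster and that the target is the same my-app Deployment used in the HPA example. The name my-app-vpa is a placeholder:

```yaml
# Recommendation-only VPA; my-app-vpa is a placeholder name.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # compute recommendations only; never evict pods
```

Once the recommender has observed the workload for a while, kubectl describe vpa my-app-vpa shows the suggested requests in the object's status.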

Policy Example: Resource Quota

apiVersion: v1  
kind: ResourceQuota  
metadata:  
  name: prod-quota  
spec:  
  hard:  
    requests.memory: "16Gi"  
    limits.memory: "32Gi"  
    requests.cpu: "8"  
    limits.cpu: "16"

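Apply with kubectl apply -f quota.yaml --namespace <ns>. The manifest does not set metadata.namespace, so without the flag the quota lands in the current context's namespace. Once a quota covers CPU or memory requests and limits, pods in that namespace that omit those values are rejected.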
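
A quota like this is often paired with a LimitRange, so that containers which omit requests or limits receive namespace defaults instead of failing quota admission. The sketch below is one way to do that; the name prod-defaults and all values are hypothetical:

```yaml
# Hypothetical namespace defaults; tune values to the workloads in the namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: prod-defaults
spec:
  limits:
  - type: Container
    defaultRequest:   # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:          # applied when a container sets no limits
      cpu: 500m
      memory: 256Mi
```

Defaults are a safety net rather than a sizing strategy: workloads that matter should still declare measured values explicitly.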

Tooling

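Goldilocks: Creates VPA objects in recommendation mode for labeled namespaces and shows the resulting request/limit suggestions in a dashboard. It runs as a cluster controller plus dashboard, not as a sidecar, and needs the VPA recommender installed. Install with Helm and opt a namespace in:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
kubectl label namespace <ns> goldilocks.fairwinds.com/enabled=true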

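VPA: Recommender/autoscaler from the kubernetes/autoscaler project. It is an add-on rather than part of core Kubernetes, and kubectl apply cannot install it from a GitHub directory URL. Install it from a checkout of the repository, then start each VPA object in recommend mode (updateMode: "Off"):

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh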

Prometheus + Grafana: Visualize usage trends and set alerts.
HPA: Built-in horizontal scaling driven by resource metrics such as CPU utilization, or by custom and external metrics.
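
As a sketch of the kind of alerts Prometheus can provide here, the rule file below flags containers whose memory stays above 80% of their limit and containers that are heavily CPU-throttled. It assumes cAdvisor metrics (container_*) and kube-state-metrics (kube_pod_container_resource_limits) are being scraped with their default label names. The group name, alert names, 25% throttling threshold, and durations are illustrative choices:

```yaml
# Prometheus rule file sketch; thresholds and names are assumptions.
groups:
- name: rightsizing
  rules:
  - alert: ContainerMemoryNearLimit
    # Working-set memory as a fraction of the container's memory limit.
    expr: |
      max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
        /
      max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
        > 0.8
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is above 80% of its memory limit"
  - alert: ContainerCPUThrottled
    # Share of CFS scheduling periods in which the container was throttled.
    expr: |
      sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
        /
      sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total{container!=""}[5m]))
        > 0.25
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is throttled in over 25% of CPU periods"
```

Containers without a memory limit produce no series on the right-hand side of the first expression, so they never fire that alert; that gap is one reason to enforce limits through policy.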

Tradeoffs and Caveats

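VPA in auto mode: Applies new requests by evicting and recreating pods, which can cause frequent restarts while recommendations are still shifting. Start with recommend mode.
HPA limitations: Doesn’t handle vertical scaling; combine with VPA for full coverage, but do not let both act on the same resource metric for one workload. If HPA scales on CPU utilization, restrict VPA to memory or keep it in recommend mode.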
Overhead: Monitoring and policy enforcement add complexity; balance with workload criticality.
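
A sketch of combining HPA and VPA on one workload so that they react to different resources: assuming an HPA already scales my-app on CPU utilization, VPA is limited to memory through controlledResources. The object name is a placeholder:

```yaml
# VPA restricted to memory so it does not compete with a CPU-based HPA.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-memory
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # switch to "Auto" only after reviewing recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["memory"]  # leave CPU to the HPA
```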

Troubleshooting Common Issues

OOM kills despite limits:
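- Confirm the kill with kubectl describe pod <pod> (Last State: Terminated, Reason: OOMKilled), then check the previous container's logs for signs of a leak via kubectl logs --previous <pod>.
- A container that exceeds its own memory limit is OOM-killed, so raise limits.memory if the working set legitimately needs more.
- Ensure requests.memory is set close to real usage; pods using more memory than they request are among the first the kubelet evicts under node memory pressure.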
HPA not scaling:
- Verify the metrics API is available (kubectl api-resources | grep metrics) and that the target containers set CPU requests, which Utilization targets require.
- Check HPA events: kubectl describe hpa <hpa-name>.
VPA recommendations too low:
- Review the recommendation in the VPA object's status: kubectl get vpa <vpa-name> -o yaml.
- Set a floor with resourcePolicy.containerPolicies[].minAllowed so recommendations cannot drop below a known-safe value.
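
If recommendations keep landing below what the workload needs at peak, one option is a floor (and optionally a ceiling) in the VPA's resource policy. The names and values below are hypothetical and should come from measured usage:

```yaml
# VPA with bounded recommendations; all values are illustrative.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:        # recommendations never go below this
        cpu: 100m
        memory: 256Mi
      maxAllowed:        # recommendations never go above this
        cpu: "1"
        memory: 1Gi
```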

Start small, validate often, and automate incrementally. Rightsizing is a process, not a one-time fix.

Source thread: What do you guys recommend for rightsizing and autoscaling workloads in k8s?
