Practical Rightsizing and Autoscaling for Kubernetes Workloads

JR

Rightsizing and autoscaling Kubernetes workloads requires iterative measurement, policy enforcement, and monitoring to balance performance and cost.

Workflow for Production Rightsizing

  1. Measure baseline usage:

    • Use kubectl top pods --namespace <ns> to observe current CPU/memory usage.
    • Check historical metrics in Prometheus for sustained usage and CPU throttling; kubectl describe pod <pod> shows past OOM kills (Last State: Terminated, Reason: OOMKilled) but not throttling.
    • Review existing autoscalers with kubectl get hpa -A.
  2. Set conservative initial limits:

    • Start with requests 25-50% above observed usage so pods are scheduled with headroom, and keep memory limits above observed peaks to avoid immediate OOM kills.
    • Use limits to cap resource-hungry workloads (e.g., resources.limits.memory: "512Mi").
    • Validate with kubectl describe pod <pod>: a pod stuck in Pending with a FailedScheduling event means no node can satisfy its requests.
  3. Monitor and adjust:

    • Track CPU throttling with the cAdvisor metric container_cpu_cfs_throttled_periods_total in Prometheus; kubectl describe pod does not report throttling.
    • Use Prometheus alerts for sustained usage >80% of limits.
    • Gradually tighten limits and requests as confidence grows.
  4. Enforce policies:

    • Apply resource quotas per namespace: kubectl apply -f quota.yaml.
    • Use admission controllers (e.g., OPA Gatekeeper) to reject pods without limits.
  5. Automate with HPA/VPA:

    • Deploy Horizontal Pod Autoscaler (HPA) for stable workloads (CPU utilization is measured against each pod's CPU request, so requests must be set):
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: my-app-hpa
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: my-app
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
      
    • For dynamic workloads, use Vertical Pod Autoscaler (VPA) in recommendation-only mode (updateMode: "Off") first.
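
The recommendation-only VPA from the last step can be sketched as the manifest below. It assumes the VPA add-on is already installed in the cluster and reuses the my-app Deployment from the HPA example; the name my-app-vpa is a placeholder. With updateMode set to "Off", VPA computes recommendations but never evicts or resizes pods.

```yaml
# Sketch: VPA in recommendation-only mode.
# Assumes the VPA CRDs and recommender are installed; names are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # same Deployment the HPA example targets
  updatePolicy:
    updateMode: "Off"       # recommend only; do not evict or resize pods
```

Once the recommender has seen enough usage, its suggestions appear under status.recommendation in the output of kubectl get vpa my-app-vpa -o yaml.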

Policy Example: Resource Quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
spec:
  hard:
    requests.memory: "16Gi"
    limits.memory: "32Gi"
    requests.cpu: "8"
    limits.cpu: "16"

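Apply with kubectl apply -f quota.yaml --namespace <ns>. A ResourceQuota only constrains the namespace it is created in, and once it limits CPU or memory, pods in that namespace must declare the corresponding requests and limits or they are rejected.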
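
To make the interaction with the quota concrete, here is a minimal sketch of a Deployment whose containers declare both requests and limits, which a namespace under this quota requires. The image, labels, and numbers are placeholders rather than recommendations; real values would come from the measured baseline plus headroom.

```yaml
# Sketch: a Deployment that declares requests and limits so it is admitted
# under a CPU/memory quota. All values are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0   # hypothetical image
        resources:
          requests:
            cpu: "250m"       # e.g. 200m observed plus 25% headroom
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"   # same value as the earlier limits example
```

With two replicas this consumes 500m CPU and 512Mi of memory requests from the quota; kubectl describe quota prod-quota shows how much of each hard limit is in use.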

Tooling

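  • Goldilocks: Analyzes historical usage to recommend requests and limits, using VPA recommendations under the hood. It runs as a controller and dashboard rather than a sidecar, and depends on VPA being installed. Install it (confirm the manifest path against the current Goldilocks install docs):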
    kubectl apply -f https://raw.githubusercontent.com/FairwindsOps/goldilocks/main/deploy/goldilocks.yaml  
    
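  • VPA: Recommender/autoscaler add-on from the kubernetes/autoscaler project; it is not installed by default. Install it from a checkout of the repository, then start in recommendation-only mode:
    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler && ./hack/vpa-up.sh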
    
  • Prometheus + Grafana: Visualize usage trends and set alerts.
  • HPA: Built-in horizontal scaling for stable metrics.
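
As a rough sketch of the alerting side, the Prometheus rule file below flags containers that sit above 80% of their memory limit for a sustained period, and containers that are frequently CPU-throttled. It assumes cAdvisor and kube-state-metrics are being scraped and that both expose namespace, pod, and container labels; the group name, alert names, 15m window, and 25% throttling threshold are illustrative choices, not values from the original post.

```yaml
# Sketch: Prometheus alerting rules for rightsizing signals.
# Assumes cAdvisor and kube-state-metrics metrics with matching labels.
groups:
- name: rightsizing
  rules:
  - alert: ContainerMemoryNearLimit
    # Working-set memory as a fraction of the configured memory limit.
    expr: |
      sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
        /
      sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
        > 0.8
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is above 80% of its memory limit"
  - alert: ContainerCPUThrottled
    # Share of CFS periods in which the container was throttled.
    expr: |
      sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
        /
      sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))
        > 0.25
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is throttled in more than 25% of CPU periods"
```

Containers without a memory limit produce no series on the right-hand side of the first expression, so they never fire that alert; that gap is one reason to enforce limits through policy.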

Tradeoffs and Caveats

  • VPA in auto mode: Applies new requests by evicting and recreating pods, so it can cause excessive rescheduling, especially if limits are too tight. Start with recommend mode.
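  • HPA limitations: Doesn’t handle vertical scaling; combine with VPA for full coverage, but do not let both act on the same metric (for example, HPA on CPU utilization while VPA resizes CPU), or they will work against each other.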
  • Overhead: Monitoring and policy enforcement add complexity; balance with workload criticality.
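
One way to run HPA and VPA side by side without them competing is to let HPA scale on CPU while VPA manages memory only. The sketch below assumes the VPA add-on is installed and that an HPA on CPU utilization already targets the my-app Deployment; the object name is a placeholder.

```yaml
# Sketch: restrict VPA to memory so it does not interfere with a CPU-based HPA.
# Assumes VPA is installed; names are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-memory
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"                  # keep recommend-only until the values look sane
  resourcePolicy:
    containerPolicies:
    - containerName: "*"               # apply to every container in the pod
      controlledResources: ["memory"]  # leave CPU to the HPA
```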

Troubleshooting Common Issues

  • OOM kills despite limits:
    • Check for memory leaks via kubectl logs --previous <pod>.
    • Ensure requests.memory is set and realistic: pods using more memory than they requested are among the first the kubelet evicts under node memory pressure.
  • HPA not scaling:
    • Verify metrics API is enabled (kubectl api-resources | grep metrics).
    • Check HPA events: kubectl describe hpa <hpa-name>.
  • VPA recommendations too low:
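    • Review the recommendation (target, lowerBound, upperBound) under status.recommendation: kubectl get vpa <vpa-name> -o yaml.
    • Set a floor with resourcePolicy.containerPolicies[].minAllowed so recommendations cannot drop below a known-safe value.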
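
For the case where VPA recommends too little, a floor and ceiling can be set per container. This is a minimal sketch assuming the VPA add-on is installed; the container name and the minAllowed/maxAllowed values are placeholders to be replaced with values known to be safe for the workload.

```yaml
# Sketch: bound VPA recommendations with a floor and a ceiling.
# Assumes VPA is installed; names and values are illustrative.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-bounded
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:            # recommendations are never set below this
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:            # recommendations are never set above this
        cpu: "1"
        memory: "1Gi"
```

The uncapped value remains visible as uncappedTarget in the VPA status, which helps show how far the bounds are from what the recommender would otherwise suggest.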

Start small, validate often, and automate incrementally. Rightsizing is a process, not a one-time fix.

Source thread: What do you guys recommend for rightsizing and autoscaling workloads in k8s?