K3s in Production: Practical Considerations and Outcomes
k3s is viable for lightweight production workloads with proper planning, though tradeoffs exist in scalability and ecosystem support.
Actionable Workflow for k3s Adoption
- Assess Workload Requirements:
  - k3s excels for small teams, edge deployments, or stateless apps with predictable resource needs.
  - Avoid it if you need advanced networking (e.g., Cilium), complex storage classes, or large node pools (>20 nodes).
- Test in Staging:
  - Deploy a non-critical service (e.g., monitoring stack, CI/CD runners) to validate performance and upgrade paths.
  - Use k3s server --datastore-endpoint to test an external datastore (etcd, MySQL, or Postgres) if needed.
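If you want to exercise the external-datastore path, a disposable server is enough for a smoke test. A sketch, assuming Postgres; the host, credentials, and database name are placeholders:

```shell
# Sketch: start a throwaway k3s server against an external Postgres datastore
# to validate connectivity and schema setup before committing to it.
k3s server \
  --datastore-endpoint="postgres://k3s:example-password@10.0.0.5:5432/k3s" \
  --write-kubeconfig-mode 644
```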
- Deploy with HA in Mind:
  - For production, run at least three server nodes with embedded etcd, or point every server at an external datastore (etcd, MySQL, or Postgres); the default single-node SQLite is not HA.
  - Join workers with k3s agent --server https://<server-ip>:6443 --token <token>, and taint the server nodes (k3s supports --node-taint) so workloads schedule only onto agents.
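k3s also reads its flags from /etc/rancher/k3s/config.yaml, which keeps HA setups reproducible. A sketch for a three-server embedded-etcd cluster; the token, IPs, and taint value are illustrative:

```yaml
# /etc/rancher/k3s/config.yaml on the first server (sketch)
token: "<shared-secret>"
cluster-init: true                       # first server only: bootstrap embedded etcd
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"  # keep workloads on agents
# On servers 2 and 3, replace cluster-init with:
# server: https://<first-server-ip>:6443
```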
- Monitor and Maintain:
  - Enable metrics-server and Prometheus for visibility.
  - Schedule regular snapshot backups with k3s etcd-snapshot save.
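Scheduling is typically just a cron entry on each server node. A sketch; the time and snapshot name are examples:

```shell
# Sketch: crontab entry for a nightly etcd snapshot at 03:00
0 3 * * * /usr/local/bin/k3s etcd-snapshot save --name nightly
```

Copy snapshots off the node (or configure S3-backed snapshots) so a disk failure does not take the backups with it.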
- Plan Upgrades:
  - Test version upgrades in staging first.
  - Drain nodes with kubectl drain before replacing the k3s binary, or automate rolling upgrades with Rancher's system-upgrade-controller.
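For automated rolling upgrades, the system-upgrade-controller applies Plan resources. A sketch for the server pool; the version, name, and concurrency are illustrative and should match a release you have already tested in staging:

```yaml
# Sketch: system-upgrade-controller Plan for k3s servers
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server-plan
  namespace: system-upgrade
spec:
  concurrency: 1            # upgrade one node at a time
  cordon: true              # cordon each node before upgrading it
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.30.4+k3s1     # pin an explicit, staging-tested release
```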
Policy Example: Resource Limits
Enforce resource constraints to prevent noisy neighbors. Label a dedicated namespace for production, and pair it with a local StorageClass that uses WaitForFirstConsumer so volumes bind only where their pods are scheduled:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    namespace-resource-limit: "true"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
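Note that the namespace label alone does not enforce anything; actual limits come from a LimitRange (or ResourceQuota) in the namespace. A sketch with illustrative default values:

```yaml
# Sketch: default container limits/requests for the production namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:            # applied when a container sets no limits
        cpu: 500m
        memory: 256Mi
      defaultRequest:     # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
```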
Tooling
- k3s: Lightweight single binary with bundled containerd (Docker is optional via --docker).
- Helm: For deploying apps (use Helm 3, which dropped Tiller).
- Prometheus/Grafana: Metrics collection, installed via Helm.
- Terraform: For provisioning nodes (e.g., AWS EC2 or on-prem VMs).
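For reference, a typical Helm-based Prometheus install looks like the following; the repo and chart names come from the prometheus-community project, and the monitoring namespace is a choice, not a requirement:

```shell
# Sketch: install the Prometheus stack via Helm 3
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```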
Tradeoffs and Caveats
- Scalability: k3s struggles beyond 20 nodes; consider Rancher or upstream K8s for larger clusters.
- Ecosystem Gaps: Some CRDs or operators (e.g., Istio, ArgoCD) may lack testing on k3s.
- HA Complexity: The default embedded SQLite is single-server only and not recommended for production; HA requires embedded etcd or an external datastore, which adds operational overhead.
Troubleshooting Common Issues
- Etcd Performance:
  - Symptoms: High latency, API server timeouts.
  - Fix: Use SSD-backed storage and isolate etcd nodes from workload traffic.
- Node Registration Failures:
  - Check journalctl -u k3s-agent for token mismatches or network issues.
  - Validate that firewall rules allow 6443/tcp between servers and agents.
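A quick way to separate network problems from token problems; the server address is a placeholder:

```shell
# Sketch: verify the agent can reach the server's API port, then look for auth errors
nc -zv <server-ip> 6443
journalctl -u k3s-agent --no-pager | grep -i -E "token|tls"
```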
- Pod Evictions:
  - Cause: Resource starvation (common with default settings).
  - Fix: Set --kubelet-arg="eviction-hard=memory.available<5%,nodefs.available<10%" on each node.
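The same setting can live in the k3s config file instead of the service unit. A sketch; the thresholds mirror the flag above:

```yaml
# /etc/rancher/k3s/config.yaml (sketch): eviction thresholds via kubelet args
kubelet-arg:
  - "eviction-hard=memory.available<5%,nodefs.available<10%"
```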
Conclusion
k3s works for teams needing a lean, fast setup but requires deliberate planning around its limitations. Prioritize monitoring, backups, and upgrade testing to avoid outages. If your needs grow beyond 20 nodes or require advanced features, migrate to upstream Kubernetes early.
Source thread: Is anyone else using k3s in production and happy about it?
