Zero Downtime Upgrades with Namespace Isolation and Canary Rollouts

Achieve zero downtime upgrades by combining namespace isolation, canary rollouts, and traffic shifting, with practical steps and tradeoffs.

Why This Matters

Downtime during updates hurts user experience and puts revenue at risk. Namespace-based rollouts and canary testing reduce that risk by decoupling deployment from traffic routing, allowing new versions to be validated safely before full cutover.

Actionable Workflow

  1. Deploy to a New Namespace

    • Create a dedicated namespace for the new version (e.g., app-v2).
    • Deploy the updated application here alongside sidecars (e.g., service mesh proxies).
    • Validate readiness via health checks and synthetic requests (a manifest sketch follows this list).
  2. Shift Traffic Gradually

    • Use a service mesh (e.g., Istio) or ingress controller (e.g., OpenShift Route) to split traffic between namespaces (see the policy example and VirtualService sketch below).
    • Start with 5-10% of live traffic to the new version.
    • Monitor metrics (latency, error rates, resource usage).
  3. Automate Canary Analysis

    • Tools like Argo Rollouts or Flagger automate traffic shifting based on metrics or manual approval gates (a metrics-based analysis sketch follows this list).
    • Example:
      apiVersion: argoproj.io/v1alpha1
      kind: Rollout
      metadata:
        name: my-app
      spec:
        strategy:
          canary:
            steps:
              - setWeight: 10
              - pause: { duration: 5m }
              - setWeight: 50
              - pause: { duration: 10m }
      
  4. Full Cutover and Cleanup

    • Once the new version is stable, shift 100% of traffic to the new namespace.
    • Delete the old namespace after a grace period (ensure no lingering dependencies).
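
For step 1, a minimal manifest sketch: a dedicated namespace plus a Deployment with a readiness probe. The names (app-v2, my-app), the image, and the /healthz path are illustrative assumptions, not fixed conventions:

# Namespace-per-version: the new release gets its own namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: app-v2                               # hypothetical namespace for the new version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: app-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v2   # hypothetical image tag
        ports:
        - containerPort: 8080
        readinessProbe:                         # gate traffic on application health
          httpGet:
            path: /healthz                      # hypothetical health endpoint
            port: 8080
          periodSeconds: 5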
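
For step 3, the Rollout above (shown without its selector and template for brevity) pauses on fixed timers. To promote based on metrics instead, Argo Rollouts can run an AnalysisTemplate at each step. A sketch, assuming a Prometheus instance at the address shown and a standard http_requests_total counter (both assumptions):

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    failureLimit: 3                              # abort after 3 failed measurements
    successCondition: result[0] >= 0.95          # require a 95%+ success rate
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090   # hypothetical Prometheus address
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[2m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))

You then reference the template from an analysis step between the setWeight steps, passing service-name as an argument.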

Policy Example: Ingress Configuration

For OpenShift, use an HTTPRoute (Gateway API) to split traffic:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: app-route
spec:
  parentRefs:
  - name: app-ingress
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: app-v1
      namespace: app-v1   # cross-namespace backends need a ReferenceGrant (below)
      port: 8080
      weight: 90
    - name: app-v2
      namespace: app-v2
      port: 8080
      weight: 10
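
With one namespace per version, the backendRefs above cross namespace boundaries, and Gateway API blocks such references unless the target namespace permits them via a ReferenceGrant. A sketch for the app-v2 side (repeat in app-v1's namespace); the route is assumed to live in a shared ingress namespace, so adjust to your layout:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-app-route
  namespace: app-v2        # namespace that owns the referenced Service
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: ingress     # assumed namespace where the HTTPRoute lives
  to:
  - group: ""              # core API group (Services)
    kind: Service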

Tooling

  • Argo Rollouts: Declarative canary and blue/green rollouts.
  • Flagger: Integrates with Prometheus for automated progressive delivery (sketch below).
  • OpenShift Routes: Native traffic splitting via the alternateBackends weight field.
  • Service Mesh (Istio): Fine-grained traffic management with mTLS (see the VirtualService sketch below).
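
As a sketch of the Flagger approach: a Canary resource wraps an existing Deployment and walks the traffic weight up while its metric checks pass. The names and thresholds here are illustrative; request-success-rate is one of Flagger's built-in metric checks:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app             # hypothetical; matches the target Deployment
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 8080
  analysis:
    interval: 1m           # how often to evaluate metrics
    threshold: 5           # failed checks tolerated before rollback
    maxWeight: 50          # stop shifting at 50%, then promote
    stepWeight: 10         # increase canary traffic 10% per interval
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99            # require at least 99% non-5xx responses
      interval: 1m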
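
And for the Istio path from step 2, a VirtualService can split traffic across namespaces by fully qualifying each Service name. The host and gateway names are assumptions:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-route
spec:
  hosts:
  - app.example.com                             # hypothetical public host
  gateways:
  - app-gateway                                 # hypothetical Istio Gateway
  http:
  - route:
    - destination:
        host: app-v1.app-v1.svc.cluster.local   # Service in namespace app-v1
        port:
          number: 8080
      weight: 90
    - destination:
        host: app-v2.app-v2.svc.cluster.local   # Service in namespace app-v2
        port:
          number: 8080
      weight: 10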

Tradeoffs

  • Resource Overhead: Running two versions concurrently increases CPU/memory usage (a quota sketch follows this list).
  • Config Complexity: Namespace-per-version requires careful service discovery and dependency management.
  • Testing Burden: Validation in staging must mirror production traffic patterns.
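
One way to keep the resource overhead predictable is a ResourceQuota on the canary namespace, so the second version can never consume more than a fixed budget. The limits below are placeholders:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: canary-quota
  namespace: app-v2        # the canary namespace (assumed)
spec:
  hard:
    requests.cpu: "4"      # total CPU requested across all pods in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi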

Troubleshooting

  • Stuck Deployments: Check kubectl describe rollout <workload> (or kubectl argo rollouts get rollout <workload> with the plugin) for image pull errors or readiness probe failures.
  • Traffic Not Splitting: Verify service DNS names match across namespaces (e.g., app-v1.my-namespace.svc.cluster.local).
  • Health Check Failures: Ensure liveness/readiness probes align with application initialization time, such as waiting on database connections (see the startupProbe sketch below).
  • Session Affinity: If enabled, test that it doesn’t pin users to the old version during cutover.
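
For the health-check case, a startupProbe holds off liveness and readiness checks until slow initialization completes. A container-spec excerpt; the endpoint and timings are assumptions:

# Container excerpt: allow up to 5 minutes (30 probes x 10s) for startup
# work such as opening database connections before other probes begin.
startupProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080
  failureThreshold: 30
  periodSeconds: 10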

Final Note

Namespace isolation works best for stateless services with independent data planes. For stateful workloads or tightly coupled microservices, consider database migration strategies and version compatibility testing. Always test rollback procedures before relying on them in production.

Source thread: Zero Downtime Upgrades?
