Cilium Network Policies: Granularity in Production

JR

Start with default-deny ingress and refine policies only where necessary to enforce least privilege without overcomplicating maintenance.

Default-deny ingress is non-negotiable for production security, but overly granular policies risk operational bloat and human error. Granularity should align with actual risk surfaces: tighten rules for high-value workloads (e.g., databases, auth services) and loosen for ephemeral or low-risk components.

Actionable Workflow

  1. Baseline with cluster-wide default-deny
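    apiVersion: cilium.io/v2
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: default-deny-ingress
    spec:
      description: "Deny all ingress by default"
      # An empty selector matches every endpoint in the cluster
      endpointSelector: {}
      # An empty ingress rule enables ingress enforcement without allowing any source
      ingress:
        - {}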
    
  2. Identify allowed sources
    • Monitoring systems (e.g., Prometheus, Grafana)
    • Internal service dependencies (e.g., API gateways, message brokers)
    • Explicitly permitted namespaces via labels (e.g., app: frontend, team: analytics)
  3. Define policies per namespace
    Use labels to group workloads and reference them in policies. Avoid per-pod rules unless absolutely required.
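
The label-based grouping in step 3 might look like the following sketch. The namespace shop, the label app: api, and port 8080 are assumptions chosen for illustration; only app: frontend appears in the list above.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: shop              # hypothetical namespace
spec:
  description: "Allow frontend pods to reach api pods"
  # Applies to every pod in this namespace labeled app=api
  endpointSelector:
    matchLabels:
      app: api                 # hypothetical workload label
  ingress:
    - fromEndpoints:
        # With no namespace label, this matches pods in the policy's own namespace
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"     # placeholder port
              protocol: TCP

Because the selector is a label rather than a pod name, new api replicas are covered automatically and no per-pod rule is needed.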

Concrete Policy Example

Allow monitoring namespace to scrape metrics from app pods on port 8080:

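apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-monitoring-ingress
spec:
  description: "Allow monitoring namespace to access app metrics"
  # Selects all pods in the namespace where this policy is applied;
  # narrow with matchLabels to target specific app pods
  endpointSelector: {}
  ingress:
    - fromEndpoints:
        # Any pod running in the monitoring namespace
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP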
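
For a high-value workload, the same allowance could be tightened at layer 7. The sketch below assumes the metrics are served over plain HTTP at /metrics, which the example above does not state, and the policy name is hypothetical.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-monitoring-metrics-only   # hypothetical name
spec:
  description: "Allow monitoring namespace to GET /metrics only"
  endpointSelector: {}
  ingress:
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          # HTTP rules restrict the allowed requests on this port
          rules:
            http:
              - method: "GET"
                path: "/metrics"         # assumed metrics path

HTTP rules send matching traffic through Cilium's proxy, so this extra granularity is probably worth it only where the risk justifies the added moving parts.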

Tooling

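  • cilium CLI: cilium connectivity test to validate cluster connectivity end to end, cilium identity list (run inside the agent pod; cilium-dbg identity list on recent releases) for debugging security identities.
  • Monitoring: Use Hubble (Cilium’s observability layer) to track denied traffic patterns, e.g., hubble observe --verdict DROPPED.
  • Policy testing: Validate manifests with kubectl apply --dry-run=server, deploy them in staging, and confirm enforcement status with cilium endpoint list (run inside the agent pod).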

Tradeoffs

  • Granularity vs. maintainability: Overly specific rules (e.g., per-pod policies) increase cognitive load and drift risk.
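  • Default-deny egress: While ideal for security, it breaks workloads that depend on external endpoints (e.g., public package registries) unless those destinations, and usually DNS, are explicitly allowed. Apply an empty egress rule (egress: [{}]) cautiously.
  • Label hygiene: Policies relying on labels require strict label governance; a mislabeled pod falls outside the selectors meant for it, so it is either blocked unexpectedly or left without the restrictions it should have.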
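
One way to soften the egress tradeoff is to allow DNS plus a short list of known destinations instead of opening egress entirely. The sketch below is illustrative: the label app: build-runner and the registry hostname are assumptions, and the kube-dns labels reflect a common cluster default that may differ in your environment.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-build-egress            # hypothetical name
spec:
  description: "Allow DNS and one package registry; other egress is denied"
  endpointSelector:
    matchLabels:
      app: build-runner                  # hypothetical workload label
  egress:
    # Allow DNS lookups via the cluster DNS pods (labels assumed)
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          # DNS visibility is required for toFQDNs rules to resolve names
          rules:
            dns:
              - matchPattern: "*"
    # Allow HTTPS to a single external registry (hostname assumed)
    - toFQDNs:
        - matchName: "registry.npmjs.org"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

Once any egress rule selects a pod, egress that no rule allows is dropped, so the list of destinations becomes the maintenance cost to weigh.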

Troubleshooting

  • Symptom: Workload cannot receive traffic despite policy.
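    • Check: Namespace and pod labels match the policy's endpointSelector and fromEndpoints selectors.
    • Check: Port/protocol in policy matches the port the pod actually listens on (the Service targetPort, not the Service port).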
  • Symptom: Unintended traffic allowed.
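    • Audit: Look for overlapping policies or broader cluster-wide rules. Allow rules are additive, so traffic is admitted if any matching policy allows it.
    • Use hubble observe --verdict FORWARDED to see which flows are actually being admitted.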
  • Symptom: Policy not applied.
    • Verify: Cilium version supports the policy API version.
    • Check: No conflicting CiliumClusterwideNetworkPolicy rules.

Granularity is a means, not an end. Prioritize policies that mitigate real risks (e.g., lateral movement, data exfiltration) over theoretical edge cases. Revisit policies quarterly or after major architecture changes.

Source thread: How granular should Cilium network policies be in production?
