External Secrets Operator: Reconciliation and Auth in Production

JR

The External Secrets Operator simplifies secret management but requires careful handling of reconciliation and authentication tradeoffs to avoid security and stability issues in production.

Why This Matters

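Kubernetes Secrets are a liability for many enterprises: etcd does not encrypt them at rest unless the cluster is configured to do so, and broad RBAC grants can expose them. External Secrets Operator (ESO) addresses part of this by making an external provider (e.g., AWS Secrets Manager, HashiCorp Vault) the source of truth and syncing values into the cluster, so secrets stay out of Git and manifests. The synced copies are still ordinary Kubernetes Secrets stored in etcd, so encryption at rest and tight RBAC remain necessary. The reconciliation logic and authentication mechanisms also introduce operational complexity.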

Actionable Workflow

  1. Choose an External Provider
    Select a secrets store that aligns with your compliance and operational requirements (e.g., AWS Secrets Manager for AWS-native setups, Vault for multi-cloud).

  2. Deploy ESO
    Install via OperatorHub (from the OpenShift web console) or with Helm:

    oc new-project external-secrets-operator --display-name="External Secrets Operator"
    helm repo add external-secrets https://charts.external-secrets.io
    helm install external-secrets external-secrets/external-secrets -n external-secrets-operator
    
  3. Configure the Secret Store
    Define a SecretStore (namespaced) or ClusterSecretStore (cluster-wide) to specify the external store and authentication method:

    apiVersion: external-secrets.io/v1beta1
    kind: SecretStore
    metadata:
      name: aws-sm
    spec:
      provider:
        aws:
          service: SecretsManager
          region: us-east-1  # placeholder; set your own region
          # Role that ESO assumes using the controller's own AWS credentials
          role: arn:aws:iam::123456789012:role/external-secrets-role
    
  4. Define External Secrets
    Create ExternalSecret resources to map external secrets to Kubernetes:

    apiVersion: external-secrets.io/v1beta1  
    kind: ExternalSecret  
    metadata:  
      name: db-credentials  
    spec:  
      refreshInterval: 1h  
      secretStoreRef:  
        name: aws-sm  
      target:  
        name: db-secret  
      data:  
      - secretKey: password  
        remoteRef:  
          key: db-secret  
          property: password  
    
  5. Monitor Reconciliation
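    Check the ESO controller logs and scrape its Prometheus metrics, such as externalsecret_sync_calls_total, externalsecret_sync_calls_error, and externalsecret_status_condition, to track sync health. Metric names can differ between releases, so confirm them against the controller's /metrics endpoint.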

  6. Handle Failures
    If a secret fails to sync, inspect the ESO controller logs:

    kubectl logs -n external-secrets-operator <eso-pod-name> | grep -i "db-credentials"  
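
Sync failures are easier to catch with an alert than by reading logs. The following is a minimal sketch, assuming the Prometheus Operator CRDs are installed and ESO's metrics are already being scraped. The rule name, namespace, time windows, and severity are hypothetical, and the metric name and its labels should be confirmed against the controller's /metrics endpoint for your ESO version.

```yaml
# Sketch only: alert when ExternalSecret sync calls keep failing.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: external-secrets-sync-alerts      # hypothetical name
  namespace: external-secrets-operator
spec:
  groups:
    - name: external-secrets.rules
      rules:
        - alert: ExternalSecretSyncErrors
          # Assumed metric: a counter of failed sync calls per ExternalSecret.
          expr: increase(externalsecret_sync_calls_error[15m]) > 0
          for: 10m                        # hypothetical; tune to your refreshInterval
          labels:
            severity: warning
          annotations:
            summary: "ExternalSecret {{ $labels.namespace }}/{{ $labels.name }} is failing to sync"
```

The `for` window is worth keeping longer than a single reconcile so that one transient provider error does not page anyone.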
    

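Policy Example: Secure SecretStore

The AWS provider has no TLS toggle in the SecretStore spec, so this Gatekeeper template checks a field that does exist: it rejects any store whose Vault server URL does not use https://.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: secretstorerequiretls  # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: SecretStoreRequireTLS
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package secretstorerequiretls

        # Stores without a Vault provider have no server field and are not flagged.
        violation[{"msg": msg}] {
          server := input.review.object.spec.provider.vault.server
          not startswith(server, "https://")
          msg := sprintf("Vault server %v must use https://", [server])
        }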
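
A ConstraintTemplate only defines a policy; Gatekeeper enforces it once a Constraint of the template's kind exists. The sketch below shows what that Constraint might look like. The kind SecretStoreRequireTLS and the resource name are assumptions for illustration: the kind must equal whatever the template declares under spec.crd.spec.names.kind.

```yaml
# Sketch only: binds a Gatekeeper template to ESO store resources.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: SecretStoreRequireTLS            # assumption: must match the template's declared kind
metadata:
  name: secretstores-must-use-tls      # hypothetical name
spec:
  enforcementAction: dryrun            # report violations first; switch to deny once validated
  match:
    kinds:
      - apiGroups: ["external-secrets.io"]
        kinds: ["SecretStore", "ClusterSecretStore"]
```

Starting in dryrun mode lets you review reported violations in staging before the policy blocks any admission request.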

Tooling

  • External Secrets Operator: Core component for syncing secrets.
  • cert-manager: Optionally issue TLS certificates for the ESO webhook; by default ESO's bundled cert-controller manages them.
  • Prometheus/Grafana: Monitor reconciliation metrics and failure rates.
  • Open Policy Agent (OPA) Gatekeeper: Enforce policies on SecretStore and ExternalSecret resources.

Tradeoffs

  • Reconciliation Frequency: Short intervals (e.g., 1m) pick up rotated values quickly but multiply calls to the provider API, which can trigger throttling and raise per-request costs. The default of 1h balances stability and freshness.
  • Authentication Methods:
    • Service Account Tokens: Simple but require careful RBAC scoping.
    • IAM Roles (AWS): More secure but add complexity with role assumption and permissions boundaries.
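
To make the IAM role option concrete, here is a minimal sketch of a SecretStore that authenticates through a Kubernetes ServiceAccount bound to an IAM role. It assumes an EKS cluster with IAM Roles for Service Accounts (IRSA) configured; the ServiceAccount name, namespace, store name, and region are hypothetical placeholders, and other platforms need a different identity mechanism.

```yaml
# Sketch only: per-namespace store using a dedicated, narrowly scoped identity.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eso-aws-reader                 # hypothetical name
  namespace: app-team                  # hypothetical namespace
  annotations:
    # IRSA annotation (EKS-specific): maps this ServiceAccount to an IAM role.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-secrets-role
---
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-sm-irsa                    # hypothetical name
  namespace: app-team
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                # placeholder; set your own region
      auth:
        jwt:
          serviceAccountRef:
            name: eso-aws-reader       # ESO requests a token for this ServiceAccount
```

Using one ServiceAccount and role per namespace keeps the blast radius small: a compromised store can read only the secrets that role's IAM policy allows.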

Troubleshooting Common Issues

  1. Auth Failures

    • Verify IAM roles or service account permissions.
    • Check cloud provider IAM policy simulation tools.
  2. Missing Secrets

    • Confirm the ExternalSecret's secretStoreRef names an existing SecretStore or ClusterSecretStore, with a matching kind.
    • Use kubectl describe externalsecret <name> to check sync status.
  3. Sync Delays

    • Review refreshInterval in ExternalSecret.
    • Check ESO controller logs for rate-limiting or API throttling errors.
  4. OpenShift-Specific Issues

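    • Ensure the ESO controller's ServiceAccount has the permissions it needs in the project.
    • Do not grant the privileged SCC to fix image pull errors; SCCs do not affect image pulls. Check registry access and pull secrets instead. If pods are rejected by SCC admission, review the pod security context (for example, a fixed runAsUser set in Helm values) before loosening any SCC.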
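
For the sync-delay case above, two knobs on the ExternalSecret itself are worth knowing. The sketch below reuses the db-credentials example with a shorter refreshInterval and an annotation; as I understand the ESO documentation, changing an annotation on the resource triggers an immediate reconcile. The annotation key and value are arbitrary placeholders, and the 15m interval is only an illustration.

```yaml
# Sketch only: tighter refresh plus a manual re-sync trigger.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  annotations:
    # Assumption: bumping this value (e.g., to a new timestamp) forces a re-sync.
    force-sync: "1700000000"
spec:
  refreshInterval: 15m                 # hypothetical; shorter means more provider API calls
  secretStoreRef:
    name: aws-sm
    kind: SecretStore
  target:
    name: db-secret
  data:
    - secretKey: password
      remoteRef:
        key: db-secret
        property: password
```

Prefer the one-off annotation bump when you only need a single fresh read; lowering refreshInterval permanently raises load on the provider for every reconcile.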

Final Notes

ESO centralizes secrets in an external store but shifts complexity to reconciliation and auth, and the synced Secrets still live in etcd. Prioritize least-privilege authentication, monitor sync health, and test failure scenarios (e.g., external store downtime) to avoid outages. Always validate policies in a staging environment before enforcing them in production.

Source thread: External Secrets Operator in production — reconciliation + auth tradeoffs?