Model Distribution in Kubernetes: Practical Approaches and Pitfalls

JR

We use a mix of OCI registries, HTTP servers, and shared storage with workflow automation to distribute LLM models efficiently in production Kubernetes clusters.

Why This Matters

LLM models are large, versioned artifacts that every cluster needs to receive reliably and at a known version. Poor handling leads to startup delays, version mismatches, and wasted resources.

Actionable Workflow

  1. Model Storage

    • Store models as OCI artifacts in Harbor (packaged as container images to work around a containerd limitation; see Tooling), or serve them from a plain HTTP server.
    • For lab setups: use Argo Workflows jobs to pull models into PersistentVolumes (PVs) before pod startup.
  2. Model Serving

    • Mount models via NFS or CSI drivers for shared access.
    • Use init containers to validate model checksums and permissions before starting the main workload.
  3. Version Control

    • Tag models with semantic versions (e.g., llm-model:v1.2.3).
    • Enforce version pinning in Kubernetes manifests.
  4. Cleanup Policy

    • Automate old model deletion via Harbor retention policies or cron jobs.
    • Monitor disk usage (df -h for spot checks) and set alerts that fire before storage is exhausted.
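
The checksum and permission check from step 2 could look roughly like the sketch below. It is only an illustration of what an init container might run: the function names are hypothetical, and it assumes a known SHA-256 digest is supplied for each model file (for example through an environment variable or a ConfigMap).

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_model(path: Path, expected_sha256: str) -> list[str]:
    """Return a list of problems; an empty list means the model looks usable.

    Hypothetical helper: the expected digest is assumed to come from
    wherever the model version is recorded.
    """
    if not path.is_file():
        return [f"{path}: file not found"]
    if not os.access(path, os.R_OK):
        return [f"{path}: not readable by the current user"]
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        return [f"{path}: checksum mismatch (got {actual})"]
    return []
```

An init container would call validate_model on the mounted file and exit non-zero if the returned list is not empty, which keeps the main workload from starting against a corrupt or unreadable model.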

Policy Example

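# Harbor model retention policy (illustrative pseudo-config: Harbor's tag
# retention rules are set per project in the UI or through its REST API)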
retention:  
  rule:  
    - prefix: "llm-models/"  
      targets:  
        - mediaType: application/vnd.oci.image.manifest.v1+json  
      actions:  
        - deleteUntagged: true  
        - keepNum: 5  
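
To make the intent of this policy concrete, here is a small sketch of the selection logic it describes: drop untagged artifacts and keep only the newest five tagged ones. It is not Harbor's implementation. The Artifact type is hypothetical, and "newest" is assumed to mean most recently pushed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    # Hypothetical record of one stored model artifact.
    digest: str
    pushed_at: datetime
    tags: tuple[str, ...] = ()


def select_for_deletion(artifacts: list[Artifact], keep_num: int = 5,
                        delete_untagged: bool = True) -> list[Artifact]:
    """Return the artifacts a keep-newest-N policy would remove."""
    if keep_num < 0:
        raise ValueError("keep_num must be non-negative")
    tagged = [a for a in artifacts if a.tags]
    untagged = [a for a in artifacts if not a.tags]
    # Newest first; everything past the first keep_num tagged artifacts goes.
    tagged.sort(key=lambda a: a.pushed_at, reverse=True)
    doomed = tagged[keep_num:]
    if delete_untagged:
        doomed += untagged
    return doomed
```

The same function could back a cron job for models kept outside a registry, with the caller deleting whatever it returns.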

Tooling

  • Harbor + Dragonfly: Efficient for artifact distribution, but models have to be packaged as placeholder ("fake") container images because of containerd’s OCI limitation (issue 11381).
  • Argo Workflows: Automate model preloading into PVs.
  • NFS/CephFS: Simple shared storage for multi-node access.
  • Velero: Backup/restore models stored in PVs.
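
As an illustration of the preloading step, the sketch below shows what a job could run to pull a model from an HTTP server into a mounted PV. It is a minimal example under assumptions: the function name, the ".partial" and ".sha256" sidecar files, and the idea of passing the expected digest alongside the URL are all hypothetical, not something the tools above provide.

```python
import hashlib
import os
import urllib.request
from pathlib import Path


def preload_model(url: str, dest: Path, expected_sha256: str) -> bool:
    """Download url to dest unless a verified copy is already there.

    Returns True if a download happened, False if it was skipped.
    """
    # Hypothetical sidecar file recording the digest of the verified copy,
    # so reruns do not have to re-hash a very large file.
    marker = dest.with_name(dest.name + ".sha256")
    if dest.is_file() and marker.is_file() \
            and marker.read_text().strip() == expected_sha256:
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".partial")
    digest = hashlib.sha256()
    # Stream to a temporary file and hash on the fly.
    with urllib.request.urlopen(url) as resp, tmp.open("wb") as out:
        for chunk in iter(lambda: resp.read(1024 * 1024), b""):
            digest.update(chunk)
            out.write(chunk)

    if digest.hexdigest() != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")

    # Rename only after verification, so readers never see a partial file.
    os.replace(tmp, dest)
    marker.write_text(expected_sha256 + "\n")
    return True
```

Writing to a temporary name and renaming afterwards matters on shared storage: pods that mount the same volume either see the complete file or no file at all.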

Tradeoffs

  • Harbor Workaround: Saving models as images adds complexity (e.g., fake Dockerfile builds) but leverages existing registry infrastructure.
  • NFS Performance: Easy to set up but scales poorly for very large models (>100GB) under heavy concurrent access.
  • HTTP Server Simplicity: Low overhead but lacks built-in versioning and access controls.

Troubleshooting

  • Image Pull Errors:
    • Check that containerd is configured to pull from Harbor (registry settings under [plugins] in containerd's config.toml, typically /etc/containerd/config.toml).
    • Verify that models were pushed as images rather than raw files, as containerd’s OCI limitation requires.
  • Model Not Found:
    • Validate PV/PVC binding with kubectl describe pvc.
    • Check NFS export permissions (showmount -e <nfs-server>).
  • Argo Job Failures:
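    • Inspect the failed workflow and its logs (argo get <workflow>, argo logs <workflow>); if the job is started by an event trigger, confirm that the trigger actually fired.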
    • Ensure init container has correct image path and permissions.
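
Image-path mistakes and unpinned versions can also be caught before anything reaches the cluster. The sketch below is one possible lint for the pinning rule from the workflow: it accepts only references that end in a full semantic-version tag such as llm-model:v1.2.3. The function names are hypothetical, treating the leading "v" as optional is an assumption, and digest references (@sha256:...) are deliberately not handled.

```python
import re

# Full semantic version, e.g. v1.2.3; the leading "v" is treated as optional.
SEMVER_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def is_pinned(image: str) -> bool:
    """True if the image reference ends in a full semantic-version tag."""
    # Look only at the last path segment so a registry port
    # (registry.local:5000/...) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    name, sep, tag = last_segment.partition(":")
    return bool(name) and bool(sep) and SEMVER_TAG.fullmatch(tag) is not None


def find_unpinned(images: list[str]) -> list[str]:
    """Return the references that would fail the pinning rule."""
    return [image for image in images if not is_pinned(image)]
```

Run against the image fields extracted from a set of manifests, find_unpinned gives a list that a CI step can fail on when it is not empty.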

Final Notes

There’s no one-size-fits-all. For production, prioritize versioning and automation over “clever” storage hacks. If you’re on OpenShift, leverage its built-in registry and storage classes for tighter integration.

Source thread: How are you handling LLM model distribution in Kubernetes clusters?
