NixOS as Kubernetes Node OS: Tradeoffs and Workflow

JR

NixOS can work as a Kubernetes node OS for specific use cases but requires careful management of immutability, hardware diversity, and cluster orchestration.

Practical Context

Kubernetes nodes typically demand stability, predictable updates, and hardware-agnostic provisioning. NixOS offers immutable infrastructure via declarative configs but introduces friction in dynamic environments. Use cases like homelabs, edge deployments, or GPU-heavy workloads (e.g., ML clusters) may justify its complexity.

Actionable Workflow

  1. Bootstrap Node
    Use nixos-anywhere or deploy-rs to provision the base OS (deploy-rs takes a flake target rather than a config file):
    nix run github:serokell/deploy-rs -- .#node
    
  2. Configure Kubernetes Integration
    Enable the kubelet (and a cloud provider, if needed) in configuration.nix. Note that NixOS exposes it as services.kubernetes.kubelet, not services.kubelet, and exact option names vary by release:
    services.kubernetes.kubelet = {
      enable = true;
      clusterDns = "1.2.3.4";
      containerRuntimeEndpoint = "unix:///run/containerd/containerd.sock";
    };
    
  3. Rebuild and Test
    Apply changes:
    sudo nixos-rebuild switch --flake ./cluster#node  
    

    Validate node status:

    kubectl get nodes --show-labels  
    
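The validation step can also be scripted across many nodes. A minimal sketch in shell (the `count_not_ready` helper is illustrative, not part of any tool; it assumes `kubectl` access from an admin host):

```shell
# count_not_ready: count nodes whose STATUS column is anything other
# than "Ready" in `kubectl get nodes --no-headers` output (read stdin).
count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n+0 }'
}

# Typical usage (requires cluster access):
#   kubectl get nodes --no-headers | count_not_ready
```

A non-zero count after `nixos-rebuild switch` usually means the kubelet did not come back up cleanly; see Troubleshooting below.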

Policy Example

Enforce node-specific configs via Nix modules:

{ config, pkgs, ... }:
{
  # The node's architecture is fixed by the flake's nixosSystem `system`,
  # so no per-module architecture list is needed here.
  virtualisation.containerd = {
    enable = true;
    # Freeform settings rendered into containerd's config.toml
    settings = {
      plugins."io.containerd.grpc.v1.cri".containerd.snapshotter = "overlayfs";
    };
  };
}
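A module like this can then be shared across nodes by importing it from each node's configuration (the path below is illustrative):

```nix
# configuration.nix for one node; ./modules/containerd-policy.nix is
# a placeholder path for the shared policy module above.
{ ... }:
{
  imports = [ ./modules/containerd-policy.nix ];
}
```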

Tooling

  • deploy-rs: Declarative cluster provisioning across heterogeneous hardware.
  • nixops: Manages cloud/physical nodes but struggles with dynamic scaling.
  • nixos-hardware: Prebuilt modules for common edge devices (e.g., Raspberry Pi).
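With deploy-rs, the cluster is typically described in a flake. A minimal sketch following the deploy-rs README schema (node name, address, and module path are placeholders):

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    # System closure for one node
    nixosConfigurations.node1 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./nodes/node1.nix ];   # placeholder module path
    };

    # deploy-rs target: push the closure to the node over SSH
    deploy.nodes.node1 = {
      hostname = "192.0.2.10";           # placeholder address
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.node1;
      };
    };
  };
}
```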

Tradeoffs

  • Immutability vs. Dynamic Needs: NixOS’s atomic updates simplify rollbacks but complicate live patching (e.g., kernel updates require full node reboot).
  • Hardware Diversity: ARM/x86 mixed clusters work but demand custom kernel modules (e.g., Raspberry Pi + Cilium requires patched kernels).
  • Learning Curve: The Nix language and flake system add overhead compared to familiar tools such as Ansible.

Troubleshooting

  • Kernel Module Issues:
    • Symptom: Cilium/Rook fails due to missing kernel modules.
    • Fix: Switch to a newer kernel in configuration.nix:
      boot.kernelPackages = pkgs.linuxPackages_latest;
      
  • Storage Provisioning:
    • Symptom: PVCs stuck in Pending.
    • Check: Ensure storage class matches provisioner (e.g., rook-ceph-block vs longhorn).
  • Networking:
    • Symptom: Nodes NotReady after reboot.
    • Check: journalctl -u kubelet for IP conflicts or CNI plugin misconfigurations.
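Sifting the kubelet journal by hand is tedious; a small filter helps. A sketch (the `cni_errors` helper is illustrative; it only narrows `journalctl` output and proves nothing on its own):

```shell
# cni_errors: keep kubelet journal lines that mention CNI or a network
# plugin AND look like errors/failures (reads stdin).
cni_errors() {
  grep -Ei 'cni|network.?plugin' | grep -Ei 'err|fail'
}

# Typical usage on a node:
#   journalctl -u kubelet -b | cni_errors
```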

Conclusion

NixOS nodes work best in controlled, heterogeneous environments where reproducibility outweighs dynamic scaling needs. For production, pair it with a lightweight distribution such as k3s, and accept operational complexity as the tax for declarative infrastructure. Avoid it if your team lacks Nix expertise or requires seamless autoscaling.

Source thread: NixOS as OS for Node?
