CUDA for TTS on NixOS Without the Pain

Authors
  • Johanness Nilsson

cudaSupport = true

A short string that'll cost you a day.

Set that flag in nixpkgs and you're rebuilding PyTorch, TensorFlow, and half the ML ecosystem from source. CPU fans screaming, spouse asking what's wrong with the computer, existential dread settling in.

I just wanted a TTS audiobook pipeline. Not a part-time job babysitting gcc.

My Naive Approach

# (don't do this)
pkgs = import nixpkgs {
  config = {
    allowUnfree = true;
    cudaSupport = true;  # congratulations!! you're now building pytorch
  };
};

You'll get a million derivations and a build that runs for hours, maybe days.
Yeah, no thanks!

The Smarter Path

It should come as no surprise that there's a nix-community Hydra that builds CUDA packages and caches them. The trick is pinning your flake to the same nixpkgs branch they use.

As of mid 2025, they build cuda-stable against nixos-25.05-small. Not the full nixos-25.05, and wisely not nixos-unstable—specifically the -small variant. This matters because the cache only has hits if your nixpkgs input matches what Hydra built against. Pin to the wrong branch and you're back to compiling PyTorch while questioning your life choices.

# flake.nix — the right way
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05-small";
  };

  nixConfig = {
    extra-substituters = [ "https://nix-community.cachix.org" ];
    extra-trusted-public-keys = [
      "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
    ];
  };

  outputs = { nixpkgs, ... }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaSupport = true;
        };
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = with pkgs; [
          yt-dlp
          ffmpeg-full
          (whisper-cpp.override { cudaSupport = true; })
          tts
          sox
          jq
        ];

        shellHook = ''
          export CUDA_PATH=${pkgs.cudaPackages.cudatoolkit}
          export LD_LIBRARY_PATH=${pkgs.cudaPackages.cudatoolkit}/lib:${pkgs.cudaPackages.cudnn}/lib:$LD_LIBRARY_PATH
          export WHISPER_MODELS="$HOME/.cache/whisper-models"
          mkdir -p "$WHISPER_MODELS"
        '';
      };
    };
}

Why This Works

The nixConfig block tells Nix to check the nix-community cache before trying to build anything. When you enable cudaSupport = true, Nix looks for the derivation hash in the cache. If it finds a match—and it will, because you're pinned to the same branch Hydra uses—it downloads the binary instead of compiling.
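Two read-only commands make it easy to confirm the cache is actually in play before anything starts compiling (nothing below is specific to this flake):

# is the cache reachable at all?
nix store ping --store https://nix-community.cachix.org

# what substituters and keys are configured system-wide?
# (the flake's nixConfig only applies when you run flake commands in this repo)
nix show-config | grep -E 'substituters|trusted-public-keys'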

The whisper-cpp.override is necessary because the default package doesn't include CUDA support. Without it, you get CPU inference, which works but means your GPU sits idle while your CPU fans scream for mercy.

The shellHook exports are the glue that makes CUDA actually work at runtime. PyTorch and friends need to find the CUDA libraries, and NixOS doesn't put them in standard paths like /usr/lib. These exports tell the ML stack where to look.
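Once you're inside the shell, a quick sanity check that the exports landed and that the driver side is healthy (this assumes the NVIDIA driver is already configured on the host; exact library names may vary):

# inside `nix develop`: the exports should point at real store paths
echo "$CUDA_PATH"
ls "$CUDA_PATH"/lib | grep -i cudart

# the driver is a host concern, not a Nix one
nvidia-smi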

The Result

$ nix build .#devShells.x86_64-linux.default --dry-run

these 3 derivations will be built:
  whisper-cpp-1.7.5.drv
  cuda-merged-12.8.drv
  audiobook-pipeline.drv

these 788 paths will be fetched (6912.45 MiB download, 19319.41 MiB unpacked)

3 builds instead of 80+. Around 7GB download instead of hours of compilation. I can live with that.

Adding the Cache to Your System Config

Add the cache to your NixOS configuration so you don't get prompted every time you enter the dev shell:

# configuration.nix or wherever you keep your nix settings
nix.settings = {
  substituters = [
    "https://cache.nixos.org"
    "https://nix-community.cachix.org"
  ];
  trusted-public-keys = [
    "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
    "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
  ];
};

This way the cache is trusted system-wide and you won't see the "do you want to trust this substituter?" prompt every damn time.
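If you're not on NixOS, or you manage Nix per-user, the same settings go into nix.conf instead (per-user settings only take full effect if your user is a trusted user):

# /etc/nix/nix.conf or ~/.config/nix/nix.conf
extra-substituters = https://nix-community.cachix.org
extra-trusted-public-keys = nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=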

Other Methods For Skinning This Cat

Docker: Yeah it works, but then you're doing Docker things instead of Nix things. Defeats the purpose. You wanted reproducibility, not "works on my container."

Pip venv hybrid: You could be a chump and use Nix for tooling, pip for the ML stuff with pre-built wheels. Fast, but now you've got a bloaty .venv directory and non-reproducible builds. Disgusting.

Flox: Commercial thing that legally redistributes CUDA binaries. Probably fine; haven't tried it (and won't). Wish them the best with that tho.

Global cudaSupport with prayers: Doesn't work, God is dead, and all your prayers remain unanswered.

Caveats

The nix-community cache has regular garbage collection, so don't pin to some ancient commit and expect cache hits. Stay reasonably current with your pin. Check the Hydra jobset to see what's actually being built and cached.
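Bumping the pin is one command; the exact spelling depends on your Nix version:

# update only the nixpkgs input so it still matches what Hydra is building
nix flake update nixpkgs
# older Nix: nix flake lock --update-input nixpkgs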

Also, whisper-cpp needs model files downloaded separately. Grab them from HuggingFace:

mkdir -p ~/.cache/whisper-models
curl -L -o ~/.cache/whisper-models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

The base.en model is good enough for most English transcription. If you need better accuracy, grab medium.en or large-v3, but they're bigger downloads and slower inference.
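The bigger models follow the same pattern on the same HuggingFace repo (the filenames below follow whisper.cpp's usual naming; double-check the repo if one 404s):

curl -L -o ~/.cache/whisper-models/ggml-medium.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin
curl -L -o ~/.cache/whisper-models/ggml-large-v3.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin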

The Actual Pipeline

Once you're in the dev shell, the audiobook pipeline is straightforward:

# Download audio from wherever
yt-dlp -x --audio-format wav -o "input.wav" "https://example.com/video"

# Transcribe with whisper
whisper-cpp -m ~/.cache/whisper-models/ggml-base.en.bin -f input.wav -otxt

# Generate TTS (coqui-tts); whisper-cpp's -otxt wrote the transcript to input.wav.txt
tts --text "$(cat input.wav.txt)" --out_path output.wav

The GPU acceleration makes whisper-cpp actually usable. CPU inference on a long-form audio file is painful. With CUDA, it's... less painful.
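For longer material you'll probably want to feed coqui-tts smaller chunks and stitch the results together; here's a rough sketch using only tools already in the dev shell (the chunk size and filenames are arbitrary):

# split the transcript, synthesize each chunk, stitch with sox
split -l 20 input.wav.txt chunk_
for f in chunk_*; do
  tts --text "$(cat "$f")" --out_path "$f.wav"
done
sox chunk_*.wav audiobook.wav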

That's it. CUDA on NixOS without having to answer questions from your spouse about your life's path or wanting to throw your computer out the window.