# Looking for advice on where to run my assist pipelines
Are you trying to run everything locally, i.e. STT/TTS entirely local? Potentially the LLM as well?
Run everything on your monster Mac. what's the point of using anything else?
If you want to offload something (why...), you can offload TTS - it's fast even on cheaper hardware.
I guess it would be cool to move HA too, it's the weakest point here.
Yes, trying to run all local
I'd like to use Docker as much as possible, especially on the Mac. But Docker on the Mac does not support GPU passthrough. So I thought I'd install only Ollama natively on the Mac and use Docker on the NAS/Pi for TTS/STT.
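With a split layout like this, it can help to confirm that each service is actually reachable from the Home Assistant host before wiring up the pipeline. A minimal sketch, assuming the commonly used default ports (10300 for wyoming-faster-whisper, 10200 for wyoming-piper, 11434 for Ollama); the hostnames `nas.local` and `mac.local` are hypothetical placeholders, not anything from this thread:

```python
import socket

# Hypothetical hosts. Ports are the usual defaults and may differ in your setup.
SERVICES = {
    "whisper (STT)": ("nas.local", 10300),
    "piper (TTS)": ("nas.local", 10200),
    "ollama (LLM)": ("mac.local", 11434),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_services(SERVICES)` returns a dict of service name to `True`/`False`. This only proves the port is open, not that the Wyoming protocol or the Ollama API behind it is healthy.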
So if you want to run STT locally with decent performance and speed, you'd need an NVIDIA GPU, and you'd want to run the CUDA-accelerated version of wyoming-faster-whisper. You can run Piper this way as well.
so you could run everything else on the Mac, even Piper since it works just fine on a beefy CPU, but for whisper you'd need a CUDA GPU to get good response times with a reliable model (like large-v3 for instance).
M4 Pro can run LLMs faster than most of the GPUs, won't it run Whisper just fine?
Problem is there's no support for it as far as I am aware
Mac uses Metal, not CUDA.
unless someone has written a wyoming-faster-whisper that can make use of Metal
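For context: as far as I know, wyoming-faster-whisper builds on faster-whisper, which runs on CTranslate2, and that backend targets CPU and CUDA only. A rough sketch of what the device choice boils down to under that assumption; `pick_whisper_settings` is a hypothetical helper for illustration, not part of any of those projects:

```python
def pick_whisper_settings(cuda_device_count: int) -> dict[str, str]:
    """Choose device/compute_type values for a CTranslate2-based Whisper model.

    Assumption: only "cuda" and "cpu" backends exist, so an Apple Silicon
    machine falls back to CPU no matter how capable its GPU is.
    """
    if cuda_device_count > 0:
        # NVIDIA GPU present: half precision is the usual choice on CUDA.
        return {"device": "cuda", "compute_type": "float16"}
    # No CUDA device (including Macs with Metal): quantized CPU inference.
    return {"device": "cpu", "compute_type": "int8"}


# In a real setup the count could come from ctranslate2.get_cuda_device_count(),
# and the result could be passed on as
# faster_whisper.WhisperModel("large-v3", **settings).
settings = pick_whisper_settings(cuda_device_count=0)
```

On a Mac the count is zero, so this lands on the CPU path, which is why a large model like large-v3 would be slow there despite the fast GPU.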
Oh right. Damn.