#Looking for advice on where to run my assist pipelines


dire ocean
#

Hi, I own a Mac Mini M4 Pro with 48GB RAM, an Asustor NAS with an Intel (Quad-core Celeron) N5105 and 16GB (supports docker), and a Raspberry Pi 4 w/4GB running Home Assistant. I'd like to build a voice assistant with the fastest possible response time. What is the optimal configuration, i.e. what do I run where?

split jolt
#

are you trying to run all local? IE STT/TTS entirely local? LLM as well potentially?

upbeat mauve
#

Run everything on your monster Mac. what's the point of using anything else?
If you want to offload something (why...), you can offload TTS - it's fast even on cheaper hardware.
I guess it would be cool to move HA too, it's the weakest point here.

dire ocean
#

Yes, trying to run all local

#

I'd like to use Docker as much as possible, especially on the Mac. But Docker on the Mac does not support GPU passthrough. So I thought of installing only Ollama natively on the Mac, and using Docker on the NAS/Pi for TTS/STT.

split jolt
#

So if you want to run STT locally with decent performance and speed, you'd need an NVIDIA GPU, and you'd want to run the CUDA-accelerated version of wyoming-faster-whisper. You can run Piper this way as well.

#

so you could run everything else on the Mac, even Piper since it works just fine on a beefy CPU, but for whisper you'd need a CUDA GPU to get good response times with a reliable model (like large-v3 for instance).
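
If you want to sanity-check a split before committing to it, a rough sketch like the one below times a TCP connect to each service. The hostnames are made up, and the ports are only what I'd assume as defaults for the Wyoming containers and Ollama, so adjust both to your setup. It measures network round trip only, not transcription or synthesis time.

```python
import socket
import time

# Hypothetical hosts; ports are assumed defaults, change them to match your setup.
SERVICES = {
    "whisper (STT)": ("nas.local", 10300),
    "piper (TTS)": ("nas.local", 10200),
    "ollama (LLM)": ("mac-mini.local", 11434),
}


def connect_time_ms(host, port, timeout=2.0):
    """Return the time in ms to open a TCP connection, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return None


def probe(services, attempts=3):
    """Return the lowest connect time per service over a few attempts."""
    results = {}
    for name, (host, port) in services.items():
        samples = [connect_time_ms(host, port) for _ in range(attempts)]
        reachable = [s for s in samples if s is not None]
        results[name] = min(reachable) if reachable else None
    return results


if __name__ == "__main__":
    for name, ms in probe(SERVICES).items():
        print(f"{name}: " + ("unreachable" if ms is None else f"{ms:.1f} ms"))
```

Run it from the Pi, since that's where Home Assistant makes the calls from. If the connect times to the Mac and the NAS are about the same, the network isn't what decides the placement and it comes down to how fast each box runs the model.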

split jolt
#

Problem is there's no support for it as far as I am aware

#

Mac uses Metal, not CUDA.

#

unless someone has written a wyoming-faster-whisper that can make use of Metal

upbeat mauve
#

Oh right. Damn.