#Intel GPU Accelerated Speech to Text in Docker

steady knot

Like many people, I have a home server with a cheap Intel Arc A380 that handles Jellyfin transcoding and otherwise sits idle, so I put together a Docker Compose setup that runs GPU-accelerated speech-to-text using whisper.cpp.
The initial request takes some time, but after that, on my A380, short English requests like "Turn off kitchen lights" are processed in about 1 second using the large-v2 Whisper model.
speech-to-phrase is typically the better choice (although it depends on audio quality) if you only use the default conversation agent, but this could be useful when paired with LLMs, especially local ones, in "Prefer handling commands locally" mode.
I imagine something like an Arc B580 should be able to run this and a model like llama3.1 or qwen2.5 at the same time (using the ipex image).
https://github.com/tannisroot/wyoming-whisper-cpp-intel-gpu-docker
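
If you want to check the latency on your own hardware, here is a rough Python sketch of a client that sends a WAV file to the service and times the reply. It is based on my understanding of the Wyoming protocol (newline-terminated JSON headers with an optional binary payload, and the `transcribe`, `audio-start`, `audio-chunk`, `audio-stop` and `transcript` events), not on anything taken from the repository. The host, port 10300, the `language` field and the helper names `encode_event`, `read_event` and `transcribe` are all assumptions, so adjust them to match your compose file.

```python
import json
import socket
import time
import wave


def encode_event(event_type, data=None, payload=b""):
    """Serialize one event: a JSON header line, then an optional binary payload."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def read_event(stream):
    """Read one event from a binary file-like object; returns (type, data, payload)."""
    line = stream.readline()
    if not line:
        raise EOFError("connection closed before an event arrived")
    header = json.loads(line)
    data = header.get("data") or {}
    # Some implementations send the data dict as a separate JSON block after the header.
    if header.get("data_length"):
        data.update(json.loads(stream.read(header["data_length"])))
    payload = stream.read(header.get("payload_length") or 0)
    return header["type"], data, payload


def transcribe(wav_path, host="localhost", port=10300, chunk_frames=1024):
    """Send a WAV file to the server (assumed address); return (text, seconds elapsed)."""
    with wave.open(wav_path, "rb") as wav:
        fmt = {
            "rate": wav.getframerate(),
            "width": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }
        started = time.monotonic()
        with socket.create_connection((host, port)) as sock:
            stream = sock.makefile("rwb")
            stream.write(encode_event("transcribe", {"language": "en"}))
            stream.write(encode_event("audio-start", fmt))
            while chunk := wav.readframes(chunk_frames):
                stream.write(encode_event("audio-chunk", fmt, chunk))
            stream.write(encode_event("audio-stop"))
            stream.flush()
            # Skip any other events until the transcript arrives.
            while True:
                event_type, data, _ = read_event(stream)
                if event_type == "transcript":
                    return data.get("text", ""), time.monotonic() - started
```

Calling `transcribe("command.wav")` with a short recording (16 kHz, 16-bit mono is the safest guess for Whisper) returns the recognized text and the round-trip time. Run it twice, since the first request includes the warm-up.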

GitHub

Run an Intel GPU-accelerated Wyoming protocol speech-to-text service for your Home Assistant in Docker - tannisroot/wyoming-whisper-cpp-intel-gpu-docker