linuxserver/faster-whisper on docker Desktop | Home Assistant | Page 1

balmy chasm Apr 27, 2025, 6:28 PM

#

I put HA on Proxmox and now I want to offload Whisper to the desktop. I'm guessing the only way is Docker Desktop atm. I have little to almost zero experience with Docker, so what I'm curious about is: using this container, would I need to install the NVIDIA toolkit etc. to get this to work? Like, can I put this in a container and be fine?

#

Would it be better to run Linux in a VM on Windows instead? Open to thoughts, as I kind of have a grasp on this but not by much lol

balmy chasm Apr 27, 2025, 7:56 PM

#

For `volumes: - /path/to/faster-whisper/data:/config`, would I point that to where I want Whisper to be installed?

balmy chasm Apr 27, 2025, 8:38 PM

#

I'm just lost on what to put there for the path.
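A minimal sketch of how that `volumes` line reads: the part before the colon is a folder on your own machine (any folder you pick; the Windows path below is a hypothetical example), and the part after the colon is the path inside the container, which stays `/config`. As far as I know, the linuxserver image keeps its settings and downloaded Whisper models in `/config`, so they survive container restarts; check the image's README to confirm.

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    volumes:
      # left of the colon: a folder on the Windows host (hypothetical example path)
      # right of the colon: the path inside the container; keep it as /config
      - C:\Users\YourName\faster-whisper\data:/config
```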

balmy chasm Apr 27, 2025, 9:18 PM

#

OK, I tried searching for the linuxserver faster-whisper image and ran it... only to get these errors constantly.

#

balmy chasm Apr 27, 2025, 10:51 PM

#

After pulling the linuxserver faster-whisper image, I've got these options. Any idea what I'd do here?

humble python Apr 27, 2025, 11:31 PM

#

You should use compose. You also need to use the gpu tag and follow this: https://docs.docker.com/desktop/features/gpu/

balmy chasm Apr 28, 2025, 12:49 AM

#

I got Whisper working in Docker and pointed HA to it. It works, but I definitely need to figure out how to pass the GPU in.

#

Do I need the NVIDIA Container Toolkit?

humble python Apr 28, 2025, 1:04 AM

#

I told you all that you need.

balmy chasm Apr 28, 2025, 1:31 AM

#

I have the latest Game Ready driver, yet it fails when installing the compose.

balmy chasm Apr 28, 2025, 2:21 AM

#

Can anyone see here why it's giving that error?

#

📎 message.txt

humble python Apr 28, 2025, 11:57 AM

#

balmy chasm i have the latest gameready driver yet it fails when installing the compose

Please share the compose file you use.

balmy chasm Apr 28, 2025, 12:03 PM

#

```yaml
---
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=tiny-int8
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    ports:
      - 10300:10300
    restart: unless-stopped
```

humble python Apr 28, 2025, 3:25 PM

#

The nvidia/gpu section is missing.
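For reference, a sketch of what the missing section looks like, following the Docker Compose GPU support docs linked in this thread. It goes under the `faster-whisper` service, at the same level as `image` and `ports` in the compose above. `count: 1` reserves one GPU; `count: all` or a `device_ids` list are the documented alternatives.

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu  # the :gpu tag is the CUDA build
    # ...environment, volumes, ports as in the compose above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia       # reserve an NVIDIA GPU
              count: 1             # one GPU ("all" also works)
              capabilities: [gpu]  # request GPU access
```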

steel patrol Apr 28, 2025, 3:33 PM

#

Can add the following to the environment vars:
That should make it use the NVIDIA Container Toolkit.

humble python Apr 28, 2025, 3:42 PM

#

https://docs.docker.com/compose/how-tos/gpu-support/

steel patrol Apr 28, 2025, 3:43 PM

#

Yeah, just bear in mind that if you set the GPU reservation in compose, it takes the entire GPU, blocking other pods from using it 🙂

humble python Apr 28, 2025, 3:43 PM

#

Not my experience.

steel patrol Apr 28, 2025, 3:44 PM

#

Maybe that's a difference with plain Docker, then. I'm using k8s, so it's probably stricter 😄

humble python Apr 28, 2025, 3:47 PM

#

I'm using the same GPU like this for Whisper, Ollama and Piper. Just normal Docker and Compose on Linux.
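A sketch of the kind of shared setup described here, assuming plain Docker Compose: two services each carry the same device reservation and both use the one GPU. In Compose this reservation doesn't lock the GPU to a single container, which matches humble python's experience; under Kubernetes the NVIDIA device plugin by default hands whole GPUs to individual pods, which would explain steel patrol's observation. The `ollama/ollama` image, port 11434, and the `/root/.ollama` model path are that image's usual defaults, not details from this thread.

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    ports:
      - 10300:10300
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all           # same GPU(s) as the service below
              capabilities: [gpu]
  ollama:
    image: ollama/ollama
    ports:
      - 11434:11434
    volumes:
      - ollama:/root/.ollama       # named volume for downloaded models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
volumes:
  ollama:
```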

balmy chasm Apr 28, 2025, 3:57 PM

#

OK, I'll try it when I get home. Info on this is so scattered. I've read countless sites and you only get bits and pieces of the puzzle. For someone not experienced with Docker, it's a pain wrapping your head around all the little details and how to enable them.

#

Is that linux compose still a good one for this or is it out of date?

humble python Apr 28, 2025, 4:11 PM

#

that linux compose
What are you referring to specifically?

balmy chasm Apr 28, 2025, 5:34 PM

#

I mean the image from linuxserver. -dumb Windows guy-

humble python Apr 28, 2025, 6:21 PM

#

Sure.

balmy chasm Apr 28, 2025, 10:20 PM

#

📎 message.txt

#

So I run it now, get this, and it restarts.

#

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: wyoming-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=large-v3
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    # replaces the image's default command: once nvidia-smi exits, the container
    # stops and "restart: unless-stopped" starts it again
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 10300:10300
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    restart: unless-stopped
```

#

That's my compose.

balmy chasm Apr 28, 2025, 10:40 PM

#

WAIT

#

HOLY

#

SHIT I GOT IT WORKING

#

```text
2025-04-28 18:37:36.296 | [ls.io-init] done.
2025-04-28 18:40:08.884 | INFO:faster_whisper:Processing audio with duration 00:04.330
2025-04-28 18:40:09.923 | INFO:wyoming_faster_whisper.handler: Close the blinds.
2025-04-28 18:41:54.479 | INFO:faster_whisper:Processing audio with duration 00:03.620
2025-04-28 18:41:55.024 | INFO:wyoming_faster_whisper.handler: What time is it?
```

#

That's large-v3, so I already know it's on the GPU just from the response time.

#

Wonder if I should switch to turbo or leave it here.

#

Now to put Piper on there.

#

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: wyoming-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=large-v3
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    ports:
      - 10300:10300
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - utility
                - compute
```

#

Would a Piper compose look roughly the same for GPU?

humble python Apr 28, 2025, 10:50 PM

#

Yep.

balmy chasm Apr 28, 2025, 10:53 PM

#

Sorry for nagging you guys, but Docker has been a real pain to figure out. First time using it.

humble python Apr 28, 2025, 10:56 PM

#

Kind of like that with everything in technology. Everything is connected to other things, some more complicated than others. Docker involves storage, networking, containers, in your case a VM and desktop, GPU, drivers, toolkits, etc.
TLDR: I get it.

balmy chasm Apr 28, 2025, 10:58 PM

#

Thinking I'll keep Ollama without device control and just use it for TTS automations so they don't feel boring and repetitive. I'm not sinking a mountain of money into making it run nearly as quick.

#

Yeah, it is. Had to learn Linux, Docker and Proxmox for what I wanted to do. It's been a pain because it's easy to overload yourself with info and get scatterbrained.

humble python Apr 28, 2025, 10:59 PM

#

The HA "LLM" alone is too strict for my liking.

balmy chasm Apr 28, 2025, 11:00 PM

#

For me, I don't have enough knowledge of LLMs to get it working the way I want. It was hit and miss, with hallucinations.

#

A guy at work has one set up, and he literally runs several cards for it in a server just to learn it. He's the IT guy and I suspect he's getting the cards from work for free. Shrug... not making waves.

#

The only thing I'm not sure about: how would I set up a custom voice with Piper in Docker? In HA it's easy.

balmy chasm Apr 29, 2025, 12:34 AM

#

Yeah, HA doesn't see custom voices in the data folder for whatever reason.

#

I take that back, it sees it. It just doesn't play the file.

#

Yeah, I guess it pulls from Hugging Face when a new voice is added, and if it doesn't find it in the Piper repo it errors out. Damn, that sucks.

humble python Apr 29, 2025, 12:47 AM

#

Which image do you use?

balmy chasm Apr 29, 2025, 12:47 AM

#

Using this one: https://github.com/slackr31337/wyoming-piper-gpu/pkgs/container/wyoming-piper-gpu

humble python Apr 29, 2025, 12:47 AM

#

Set it via the PIPER_VOICE variable. It should download the voice during startup.
See languages here: https://rhasspy.github.io/piper-samples/
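A rough sketch of a Piper GPU service in the same shape as the working Whisper compose above, using the `PIPER_VOICE` variable. Several details are assumptions rather than anything confirmed in this thread: the `ghcr.io/slackr31337/wyoming-piper-gpu:latest` image reference (inferred from the GitHub package page linked above), the `/data` container path, and port 10200 (the usual Wyoming Piper port). `en_US-lessac-medium` is just an example voice from the piper-samples list; check the image's README before relying on any of this.

```yaml
services:
  piper:
    image: ghcr.io/slackr31337/wyoming-piper-gpu:latest  # tag assumed; check the package page
    container_name: wyoming-piper
    environment:
      - PIPER_VOICE=en_US-lessac-medium  # example voice; downloaded on startup
    volumes:
      - C:\Users\YourName\piper\data:/data  # hypothetical host folder; container path assumed
    ports:
      - 10200:10200  # usual Wyoming Piper port (assumed for this image)
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```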

balmy chasm Apr 29, 2025, 12:48 AM

#

Well, I'm adding a custom voice I found on Hugging Face. Tried putting it in the data folder with the rest of the voices.

#

I had set the voice in the compose, but I think it looks on Hugging Face for that voice instead of in the data folder.

#

Unless I'm missing something, I guess the only way to do custom voices is via the add-on and the share folder.

#

But that would mean running Piper on HA.

humble python Apr 29, 2025, 12:57 AM

#

https://github.com/slackr31337/wyoming-piper-gpu/issues/24

balmy chasm Apr 29, 2025, 1:31 AM

#

I'll look into it tomorrow, but honestly I'm not sure it matters putting Piper on the GPU. With Whisper on it, it's still pretty damn quick.

humble python Apr 29, 2025, 5:34 PM

#

Realistically not, no. I do it because I can.

viscid kindle Apr 29, 2025, 8:31 PM

#

Yeah, Piper is fast

glossy oracle Apr 29, 2025, 9:02 PM

#

If you have room left in your GPU's VRAM, kokoro TTS sounds a lot better than piper IMHO.

viscid kindle Apr 29, 2025, 9:10 PM

#

glossy oracle If you have room left in your GPU's VRAM, kokoro TTS sounds a lot better than pi...

Then again, Kokoro is pretty funky to install in Docker...

balmy chasm Apr 29, 2025, 11:56 PM

#

I have a bit of an off-topic question. Right now I only want to use an LLM for rephrasing text. For instance, I have it let me know when someone is on the camera, and it changes the wording each time. For this I don't think qwen2.5 is it, as that's a 7B model and a bit big for basically rephrasing things. What's a more suitable model for this? I don't have local control enabled atm, and my prompt basically keeps things simple.

balmy chasm Apr 30, 2025, 12:43 AM

#

glossy oracle If you have room left in your GPU's VRAM, kokoro TTS sounds a lot better than pi...

With the Ollama Windows server and faster-whisper large-v3-turbo, I'm only using 6 GB of 12 GB of VRAM.

#

Which makes me want to ask: since apparently everything Windows sucks, would Ollama in Docker run any quicker than the Windows app?

#

Speaking of which, large-v3-turbo vs large-v3 makes a significant difference in speed, and I have yet to notice any loss of accuracy.

#

Is Kokoro any slower? In theory I'd have 6 GB to throw at it, but if there's a 2-second difference I'd stick with Piper. I tried some of the voices on their site and they are better.

glossy oracle Apr 30, 2025, 1:24 AM

#

For memory-intensive things like TTS and LLMs, I prefer a Linux server (Ubuntu). It doesn't have a GUI and uses substantially fewer resources, leaving more for the fun stuff. Kokoro is instant for me, even with long phrases.

balmy chasm Apr 30, 2025, 2:29 AM

#

What gpu do you use?

glossy oracle Apr 30, 2025, 3:06 PM

#

I use two in my "AI Box". Both are 3060s with 12GB VRAM. One runs the LLMs and the other runs Whisper and Kokoro.

balmy chasm Apr 30, 2025, 9:20 PM

#

Is there a way to speed Ollama up? I'm not giving it control, and the context is 8192. I'm using llama3.2, but for a reply of roughly 12 words it takes 5-6 seconds before the response plays. The GPU only spikes to 60%.

humble python Apr 30, 2025, 9:40 PM

#

Hard to say without seeing logs, tokens/s, full usage statistics, and so on. What does `ollama ps` say?

balmy chasm Apr 30, 2025, 10:19 PM

#

Still learning, but my prompt is stupid simple, and I know I could be doing something wrong.

#

```yaml
actions:
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.llama3_2
      text: >-
        Rephrase the following text and impersonate jarvis from iron man in your
        response: notify the owner that a person has been spotted on the back
        deck security camera.
    response_variable: response
  - action: assist_satellite.announce
    metadata: {}
    data:
      # announce the LLM's rephrased reply
      message: "{{ response.response.speech.plain.speech }}"
      preannounce: true
    target:
      entity_id: assist_satellite.home_assistant_voice_095f33_assist_satellite
mode: single
```

#

Simple automation. Took 6 seconds to play.

#

viscid kindle Apr 30, 2025, 10:32 PM

#

That looks okay.

#

Can you run the model locally with `--verbose` and ask it to write a 100-word poem or something?

balmy chasm Apr 30, 2025, 10:38 PM

#

```text
Softly falls the evening dew,
A gentle hush, a calm anew.
The stars appear, like diamonds bright,
A celestial show, in all its light.

The world is still, in quiet sleep,
Dreams dance upon, the darkness deep.
The moon's pale glow, illuminates the night,
A silver path, where shadows take flight.

In this serene and peaceful place,
I find my heart, a calm and peaceful space.
Where worries fade, and love resides,
And in the stillness, I am free to glide.

total duration: 2.9474056s
load duration: 1.6026694s
prompt eval count: 31 token(s)
prompt eval duration: 93.5146ms
prompt eval rate: 331.50 tokens/s
eval count: 110 token(s)
eval duration: 1.2502285s
eval rate: 87.98 tokens/s
```
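Breaking that `--verbose` output down: more than half of the 2.95 s total is model load time, while generating all 110 tokens took about 1.25 s.

$$
\begin{aligned}
t_\text{load} + t_\text{prompt} + t_\text{eval} &= 1.6027 + 0.0935 + 1.2502 = 2.9464\ \text{s} \approx t_\text{total} = 2.9474\ \text{s} \\
\frac{t_\text{load}}{t_\text{total}} &= \frac{1.6027}{2.9474} \approx 54\% \\
\text{eval rate} &= \frac{110\ \text{tokens}}{1.2502\ \text{s}} \approx 88\ \text{tokens/s}
\end{aligned}
$$

If a ~12-word reply is on the order of 15-20 tokens (a rough assumption), generating it at ~88 tokens/s takes well under a second, and the load time should mostly drop out when the model is already in memory. That suggests generation speed alone doesn't explain a 5-6 s delay, although the prompt HA sends is longer than this 31-token test.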

#

Well, I'm not even sure it's Ollama; I was just guessing. Here's what I have going: via YAML, I have the HA Voice PE play out to an external media player (a Sonos One). Could it be that handshake that's causing the delay?

#

I'm using the TTS URI method from another post in here.

#

I'm also using a custom Jarvis voice ("high" quality) from Hugging Face.

#

And Piper is being run in HA, not on the GPU.

#

My assumption was that the model itself might be too big for what I'm using it for?

#

llama3.2 seems quicker than qwen2.5 for sure.

#

Tried walking in front of a camera... counted to 10 and only ever got the announce chime from the speaker. It seems hit and miss. Not seeing anything in the logs about it. Sounds like it's timing out.

glossy oracle Apr 30, 2025, 11:03 PM

#

If you go into Settings > Voice assistants and choose Debug from the three-dot menu next to the assistant, it will show you exactly how long each step takes (speech to text, natural language processing, and text to speech). Then you can see if something else is slowing you down. Maybe Ollama's snappy at 2-3 seconds and TTS is adding 3 more seconds, or something like that.

balmy chasm Apr 30, 2025, 11:07 PM

#

Thing is, the debug only shows the last time I spoke to the voice assistant. It doesn't show when I run it via `conversation.process` in an automation.

#

And I use Home Assistant as the conversation agent for speaking to Assist; the LLM was messing things up. But in automations, I point them to the Ollama integration and the model I use.

#

But here's one for when I opened the blinds. It took HA longer to act than faster-whisper did.

#

#

But I get it, faster-whisper has nothing to do with my issue, since it's the LLM I'm using there.

#

There just seems to be a bottleneck somewhere in between. Not sure if it's maybe the Sonos API.

viscid kindle Apr 30, 2025, 11:54 PM

#

In your screenshot, the 2.68 s was processed locally, so it's not the LLM, it's Hassil.

balmy chasm May 1, 2025, 12:04 AM

#

That was just referencing local Assist when spoken to. I don't know of a way to see how the LLM is processed when it's used in automations.

#

Just now, when someone went on the deck, it started responding and then cut off after partially playing the audio.

#

My context is still 8192.

viscid kindle May 1, 2025, 12:54 AM

#

You can trace any automation too. Go to trace and check the timeline.

balmy chasm May 1, 2025, 1:00 AM

#

I traced the automation and it showed everything running.

#

Nothing errored out.

#

Restarted Ollama and it's still cutting off the audio.

#

Plays like 2 seconds, then stops.

#

Could it be the PE doing it?

viscid kindle May 1, 2025, 1:11 AM

#

You can click on the TTS URL and hear what your TTS generated. Then it will be clear.

#

If it's the PE, check that it isn't underpowered. Use a good power adapter.

balmy chasm May 1, 2025, 1:29 AM

#

The TTS files play all the way through.

#

So the PE is cutting it off...

#

Error while executing automation automation.test_voice_pe: Error calling SonosMediaPlayerEntity._play_media on media_player.foyer_speaker: UPnP Error 714 received: Illegal MIME-Type from

#

Just got this.

#

Wonder if it's the Sonos integration at the root.

viscid kindle May 1, 2025, 1:50 AM

#

Sorry, I didn't read all the thread. Do you play TTS on PE or Sonos?

balmy chasm May 1, 2025, 1:51 AM

#

Sonos. The PE kinda sucks as a speaker.

#

Almost like it doesn't like the type of file being passed on to the Sonos.

viscid kindle May 1, 2025, 1:56 AM

#

Well then it's definitely not the PE to blame, right? 🙂
Looks like, yes, the Sonos doesn't like the MIME type.
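One thing that may be worth trying instead of the TTS URI method (untested here, and not confirmed to avoid the UPnP Error 714): HA's `tts.speak` action renders the speech and sends it to a media player itself. This step would take the place of the `assist_satellite.announce` step in the automation posted earlier. `tts.piper` is an assumed entity ID for the Piper TTS entity; `media_player.foyer_speaker` is the Sonos named in the error above.

```yaml
# runs after the conversation.process step that sets `response`
- action: tts.speak
  target:
    entity_id: tts.piper  # assumed entity ID of the Piper TTS entity
  data:
    media_player_entity_id: media_player.foyer_speaker  # the Sonos from the error above
    message: "{{ response.response.speech.plain.speech }}"
```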

balmy chasm May 7, 2025, 1:40 AM

#

Got bored, gave it another shot, and finally got the custom voice to work in the Piper container. Had some initial download errors in the log, but the voice works fine, and I verified via resource usage that it runs on the GPU.
