# linuxserver/faster-whisper on Docker Desktop


balmy chasm
I put HA on Proxmox and now I want to offload Whisper to the desktop. I'm guessing the only way is Docker Desktop at the moment. I have little to almost zero experience with Docker, so what I'm curious about is this: if I use this container, would I need to set up the NVIDIA toolkit etc. to get it to work? Or can I just put this in a container and be fine?

Would it be better to run Linux in a VM on Windows instead? Open to thoughts, as I kind of have a grasp but not by much lol

balmy chasm
In `volumes: - /path/to/faster-whisper/data:/config`, would I point that to where I want Whisper to be installed?

balmy chasm
I'm just lost on what to put there for the path.
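The left side of a `volumes:` entry is a folder on the host (your Windows PC) and the right side is the path inside the container. Whisper itself ships inside the image, so the host folder is only where the container keeps its data; `/config` stays as-is for linuxserver images. A minimal sketch, assuming Docker Desktop on Windows and a hypothetical folder `C:\Users\you\faster-whisper\data` that you create beforehand:

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:latest
    volumes:
      # <folder on the Windows host>:<path inside the container>
      # the host path is a hypothetical example; /config is the image's data path
      - C:\Users\you\faster-whisper\data:/config
```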

balmy chasm
OK, I tried searching for the linuxserver faster-whisper image and ran it, only to get these errors constantly.

balmy chasm
After pulling the linuxserver image for faster-whisper, I've got these options. Any idea what I'd do here?

humble python
balmy chasm
I got Whisper working in Docker and have pointed HA to it. It works, but I definitely need to figure out how to pass the GPU in.

Do I need the NVIDIA Container Toolkit?

humble python
I told you all that you need.

balmy chasm
I have the latest Game Ready driver, yet it fails when I bring up the compose.

balmy chasm
Can anyone see here why it's giving that error?

humble python
balmy chasm
```yaml
---
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=tiny-int8
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    ports:
      - 10300:10300
    restart: unless-stopped
```

humble python
The nvidia/gpu section is missing.
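For reference, the GPU request that Docker's Compose documentation describes is a `deploy.resources.reservations.devices` block under the service. A sketch of the compose above with only that block added (the other keys are elided). On Docker Desktop for Windows, GPU access goes through the WSL 2 backend and the Windows NVIDIA driver, so this assumes that setup:

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    # ...environment, volumes, ports and restart as in the compose above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia       # request an NVIDIA GPU
              count: 1             # how many GPUs to reserve
              capabilities: [gpu]  # required; the request is rejected without it
```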

steel patrol
You can add the following to the environment vars:
That should make it use the NVIDIA Container Toolkit.

humble python
steel patrol
Yeah, just bear in mind that if you set the GPU reservation in compose, it takes the entire GPU, blocking other pods from using it 🙂

humble python
Not my experience.

steel patrol
Might be a quirk of plain Docker, then. I'm using k8s, so it's probably stricter 😄

humble python
I'm using the same GPU like this for Whisper, Ollama and Piper. Just normal Docker and Compose on Linux.
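A sketch of what that sharing can look like in one compose file, assuming plain Docker Compose and a single NVIDIA GPU with index `0`. The `x-gpu` extension field and YAML anchor only avoid repeating the reservation, and `device_ids` is used instead of `count` because Compose treats the two as mutually exclusive. The Ollama service uses the official `ollama/ollama` image on its default port 11434:

```yaml
# Reusable GPU reservation (Compose ignores top-level keys starting with x-)
x-gpu: &gpu
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ["0"]   # the same physical GPU for every service
          capabilities: [gpu]

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    environment:
      - WHISPER_MODEL=large-v3
    ports:
      - 10300:10300
    deploy: *gpu

  ollama:
    image: ollama/ollama
    ports:
      - 11434:11434
    volumes:
      - ollama:/root/.ollama   # named volume for downloaded models
    deploy: *gpu

volumes:
  ollama:
```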

balmy chasm
OK, I'll try it when I get home. Info on this is so scattered. I've read countless sites and you get bits and pieces of the puzzle. For someone not experienced in Docker, it's a pain wrapping your head around all the little details and how to enable them.

Is that linux compose still a good one for this, or is it out of date?

humble python
> that linux compose

What are you referring to specifically?

balmy chasm
I mean the image from linuxserver. -dumb Windows guy-

humble python
Sure.

balmy chasm
So I run it now and I get this, and it restarts:

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: wyoming-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=large-v3
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    # note: `command:` replaces the image's default command, so the container runs
    # nvidia-smi, prints GPU info and exits; restart: unless-stopped then restarts it
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 10300:10300
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    restart: unless-stopped
```

That's my compose.

balmy chasm
WAIT

HOLY

SHIT I GOT IT WORKING

```text
2025-04-28 18:37:36.296 | [ls.io-init] done.
2025-04-28 18:40:08.884 | INFO:faster_whisper:Processing audio with duration 00:04.330
2025-04-28 18:40:09.923 | INFO:wyoming_faster_whisper.handler: Close the blinds.
2025-04-28 18:41:54.479 | INFO:faster_whisper:Processing audio with duration 00:03.620
2025-04-28 18:41:55.024 | INFO:wyoming_faster_whisper.handler: What time is it?
```

That's large-v3, so I already know it's on the GPU just from the response time.

Wonder if I should switch to turbo or leave it here.

Now to put Piper on there.

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: wyoming-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=large-v3
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=en #optional
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    ports:
      - 10300:10300
    volumes:
      - C:\Users\Viper\faster-whisper\data:/config
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - utility
                - compute
```

Would a Piper compose look roughly the same for GPU?

humble python
Yep.
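A sketch of a Piper service following the same pattern. Assumptions not confirmed in this thread: the image is `lscr.io/linuxserver/piper` with a `gpu` tag mirroring faster-whisper (check the image README for the tags it actually publishes), it listens on the usual Wyoming Piper port 10200, and `PIPER_VOICE` selects the voice. The host path is a hypothetical example:

```yaml
services:
  piper:
    image: lscr.io/linuxserver/piper:gpu   # assumption: a gpu tag exists for this image
    container_name: piper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - PIPER_VOICE=en_US-lessac-medium    # example stock voice name
    volumes:
      - C:\Users\Viper\piper\data:/config  # hypothetical host folder
    ports:
      - 10200:10200
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

In HA it is then added the same way as Whisper, through the Wyoming Protocol integration pointed at the desktop's IP and port 10200.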

balmy chasm
Sorry for nagging you guys, but Docker has been a real pain to get. First time using it.

humble python
Kind of like that with everything in technology. Everything is connected to other things, some more complicated than others. Docker involves storage, networking, containers, in your case a VM and desktop, GPU, drivers, toolkits, etc.
TLDR: I get it.

balmy chasm
Thinking I'll keep Ollama without control and just use it for TTS in automations, so they don't feel boring and repetitive. Not sinking a mountain of money into making it run nearly as quick.

Yeah, it is. Had to learn Linux, Docker and Proxmox for what I wanted to do. It's been a pain because it's easy to overload yourself with info and get scatterbrained.

humble python
The HA "LLM" alone is too strict for my liking.

balmy chasm
For me, I don't have enough knowledge of LLMs to get it working the way I want. It was hit and miss, with hallucinations.

A guy at work has one set up, and he literally runs multiple cards in a server for it just to learn it. He's the IT guy and I suspect he's getting the cards from work for free. Shrug... not making waves.

The only thing I'm not sure about is how I would set up a custom voice with Piper in Docker. In HA it's easy.

balmy chasm
Yeah, HA doesn't see custom voices in the data folder for whatever reason.

I take that back, it sees it. It just doesn't play the file.

Yeah, I guess it pulls from Hugging Face when a new voice is added, and if it doesn't find it in the Piper repo it errors out. Damn, that sucks.

humble python
Which image do you use?

balmy chasm
humble python
balmy chasm
Well, I'm adding a custom voice I found on Hugging Face. Tried putting it in the data folder with the rest of the voices.

I had set the voice in the compose, but I think it looks on Hugging Face for that voice instead of in the data folder.

Unless I'm missing something, I guess the only way to do custom voices is via the add-on and the share folder.

But that would mean running Piper on HA.
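Piper voices are a pair of files, `<name>.onnx` and `<name>.onnx.json`. A hedged sketch of the layout that the end of this thread suggests can work in a container, assuming the server looks for voices in its `/config` folder and that `PIPER_VOICE` is set to the file name without the extension. `en_US-jarvis-high` is a hypothetical name, and a download error for a non-stock voice may still show up in the log:

```yaml
# Host folder (hypothetical path) holding the custom voice files:
#   C:\Users\Viper\piper\data\en_US-jarvis-high.onnx
#   C:\Users\Viper\piper\data\en_US-jarvis-high.onnx.json
services:
  piper:
    image: lscr.io/linuxserver/piper:latest
    environment:
      - PIPER_VOICE=en_US-jarvis-high   # hypothetical; must match the .onnx file name
    volumes:
      - C:\Users\Viper\piper\data:/config
    ports:
      - 10200:10200
```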

humble python
balmy chasm
I'll look into it tomorrow, but honestly I'm not sure it matters putting Piper on the GPU. With Whisper on it, it's still pretty damn quick.

humble python
Realistically not, no. I do it because I can.

viscid kindle
Yeah, Piper is fast

glossy oracle
If you have room left in your GPU's VRAM, Kokoro TTS sounds a lot better than Piper IMHO.

viscid kindle
balmy chasm
I have a bit of an off-topic question. Right now I only want to use an LLM for rephrasing text. For instance, I have it let me know when someone is on the camera, but it changes up the wording each time. For this I don't think qwen2.5 is it, as that's a 7B model and a bit big for basically rephrasing things. What's a more suitable model for this? I don't have local control enabled at the moment, and my prompt basically keeps things simple.

balmy chasm
Which makes me want to ask: since apparently everything Windows sucks, would Ollama in Docker run any quicker than the Windows app?

Speaking of which, large turbo vs large-v3 makes a significant difference in speed, and I have yet to notice any loss of accuracy.
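Switching models is just a change to `WHISPER_MODEL`. A sketch, assuming the image hands the value straight to faster-whisper; recent faster-whisper releases list `large-v3-turbo` (alias `turbo`) among their built-in model names, but check the image README for the values it actually supports. The new model is downloaded on first start, so that first run is slower:

```yaml
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    environment:
      - WHISPER_MODEL=large-v3-turbo   # was large-v3; assumption: accepted as-is
      - WHISPER_BEAM=1
      - WHISPER_LANG=en
```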

Is Kokoro any slower? In theory I'd have 6 GB to throw at it, but if there's a 2-second difference I'd stick with Piper. I tried some of the voices on their site and they are better.

glossy oracle
For memory-intensive things like TTS and LLMs, I prefer Linux server (Ubuntu). It doesn't have a GUI and uses substantially fewer resources, leaving more for the fun stuff. Kokoro is instant for me, even with long phrases.

balmy chasm
What GPU do you use?

glossy oracle
I use two in my "AI Box". Both are 3060s with 12 GB VRAM. One runs the LLMs and the other runs Whisper and Kokoro.

balmy chasm
Is there a way to speed Ollama up? I'm not giving it control, and the context is 8192. I'm using llama3.2, but for roughly 12 words to be said it takes 5-6 seconds before the response. GPU usage only spikes to 60%.

humble python
Hard to say without seeing logs, tokens/s, full usage statistics, and so on. What does `ollama ps` say?

balmy chasm
Still learning, but my prompt is stupid simple and I know I could be doing something wrong.

```yaml
actions:
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.llama3_2
      text: >-
        Rephrase the following text and impersonate jarvis from iron man in your
        response: notify the owner that a person has been spotted on the back
        deck security camera.
    response_variable: response
  - action: assist_satellite.announce
    metadata: {}
    data:
      message: "'{{ response.response.speech.plain.speech }}'"
      preannounce: true
    target:
      entity_id: assist_satellite.home_assistant_voice_095f33_assist_satellite
mode: single
```

Simple automation. Took 6 seconds to play.

viscid kindle
That looks okay.

Can you run the model locally with `--verbose` and ask it to write a 100-word poem or something?

balmy chasm
```text
Softly falls the evening dew,
A gentle hush, a calm anew.
The stars appear, like diamonds bright,
A celestial show, in all its light.

The world is still, in quiet sleep,
Dreams dance upon, the darkness deep.
The moon's pale glow, illuminates the night,
A silver path, where shadows take flight.

In this serene and peaceful place,
I find my heart, a calm and peaceful space.
Where worries fade, and love resides,
And in the stillness, I am free to glide.

total duration: 2.9474056s
load duration: 1.6026694s
prompt eval count: 31 token(s)
prompt eval duration: 93.5146ms
prompt eval rate: 331.50 tokens/s
eval count: 110 token(s)
eval duration: 1.2502285s
eval rate: 87.98 tokens/s
```
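Ollama's `--verbose` rates are simply count divided by duration, and the same numbers show where the roughly 2.95 s of that run went. Worked through with the figures above:

```latex
\begin{aligned}
\text{eval rate} &= \frac{110\ \text{tokens}}{1.2502285\ \text{s}} \approx 87.98\ \text{tokens/s} \\[4pt]
\text{prompt eval rate} &= \frac{31\ \text{tokens}}{0.0935146\ \text{s}} \approx 331.50\ \text{tokens/s} \\[4pt]
\frac{\text{load duration}}{\text{total duration}} &= \frac{1.6026694\ \text{s}}{2.9474056\ \text{s}} \approx 0.54
\end{aligned}
```

So over half of that run was loading the model rather than generating text. Ollama's keep-alive setting (the `keep_alive` request parameter or the `OLLAMA_KEEP_ALIVE` environment variable, 5 minutes by default per Ollama's FAQ) controls how long a model stays loaded; whether a reload is behind the 6-second automation delay is an assumption the thread does not confirm.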

Well, I'm not even sure it's Ollama; I was just guessing. So here is what I have going: via YAML, I have the HA Voice PE play out to an external media player (a Sonos One). Could it be that handshake causing the delay?

I'm using the TTS URI method from another post in here.

I'm also using a custom Jarvis voice ("high" quality) from Hugging Face.

And Piper is being run in HA, not on the GPU.

My assumption was that the model itself might be too big for what I'm using it for?

llama3.2 seems quicker than qwen2.5 for sure.

Tried walking in front of a camera, counted to 10, and only ever got the announce sound from the speaker. It seems hit and miss. Not seeing anything in the logs about it. Sounds like it's timing out.

glossy oracle
If you go into Settings > Voice assistants and choose Debug from the three-dot menu next to the assistant, it will show you exactly how long each step takes (speech-to-text, natural language processing, and text-to-speech). Then you can see whether something else is slowing you down. Maybe Ollama is snappy at 2-3 seconds and TTS is adding 3 more seconds, or something like that.

balmy chasm
Thing is, the debug view shows the last time I spoke to the voice assistant. It doesn't show runs from a conversation.process action in an automation.

And I use Home Assistant as the conversation agent for speaking to Assist, since the LLM was messing things up. But when I use it in automations, I point them to the Ollama integration and the voice model I use.

But here is one from when I opened the blinds. It took HA longer to act than faster-whisper took to transcribe.

But I get it, faster-whisper has nothing to do with my issue, since it's the LLM I'm using there.

There just seems to be a bottleneck somewhere in between. Not sure if it's maybe the Sonos API.

viscid kindle
In your screenshot, 2.68 s was processed locally, so it's not the LLM, it's Hassil.

balmy chasm
That was just referencing local Assist when spoken to. I don't know of a way to see how the LLM is processed when used in automations.

Right now, when someone went on the deck, it started responding and then cut off after partially playing the audio.

My context is still 8192.

viscid kindle
You can trace any automation too. Go to Traces and check the timeline.

balmy chasm
I traced the automation and it showed everything running.

Nothing errored out.

Restarted Ollama and it's still cutting off the audio.

It plays like 2 seconds, then stops.

Could it be the PE doing it?

viscid kindle
You can click on the TTS URL and hear what your TTS generated. Then it will be clear.

If it's the PE, check that it's not underpowered. Use a good power adapter.

balmy chasm
The TTS files play all the way through.

So the PE is cutting it off...

```text
Error while executing automation automation.test_voice_pe: Error calling SonosMediaPlayerEntity._play_media on media_player.foyer_speaker: UPnP Error 714 received: Illegal MIME-Type from
```

Just got this.

Wonder if it's the Sonos integration at the root.

viscid kindle
Sorry, I didn't read the whole thread. Do you play TTS on the PE or on the Sonos?

balmy chasm
Sonos. The PE kinda sucks as a speaker.

Almost like Sonos doesn't like the type of file being passed to it.

viscid kindle
Well then it's definitely not the PE to blame, right? 🙂
Looks like it, yes: Sonos doesn't like the MIME type.

balmy chasm
Got bored and gave it another shot, and finally got the custom voice to work in the Piper container. Had an initial download error in the log, but the voice works fine, and I verified via resource usage that it runs on the GPU.