#Whisper best practice…

wary plaza
#

Hi folks, I had been using the Whisper integration for STT running on HAOS, but decided to move to whisper.cpp with ML on Mac to see if it would improve response accuracy and reduce processing time.

I would say that it has, by quite a bit.

However, it transcribes everything. I can’t have any other sounds in the room. TV? Must be off. Radio? Forget it! Children? Silenced!!

Needless to say, it has not been practical. Has anyone else had any experience or advice?

fleet zinc
#

Which model exactly do you use for whisper?

I used it for a long while (Whisper-large-v3) and it seemed to only pick up the loudest voice properly. But if the TV voices were as loud as mine, it would pick me up and then drift to the TV content until they stopped talking.

Nowadays I run https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2, and it's way speedier as far as answering goes, but in my experience not as good as Whisper-large-v3 at mixing languages. For instance, asking "Can you play the movie 'La guerre des etoiles'".

And it's no more problematic than Whisper at picking up voices. As bad, if not better, I'd say. But I did not test it as extensively.

wary plaza
#

Thanks for the reply. It took me ages to figure out how I’d built this… anyway, I had been running Medium and have moved to large v3. I’ve also downgraded to qwen3.5:4b and things are a bit better.

Will need to check my system to see if there’s been swap happening. HAOS VM, Claude, and docker just seem to have ever expanding RAM usage. A wee restart every few weeks brings things back down.

I’ll look into parakeet.

#

Mine is also bilingual, and medium was causing German to fail. So that solved an issue for me. Thanks!

I am excited for Ollama to unveil more ML models. More speed!!

fleet zinc
#

If you use more than one assistant, you might want to dump Ollama for llama.cpp.

#

It allows you to have multiple prompts cached, leading to faster response times.

#

Since it's not reloading the cache of the prompt every time you change assistant.
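
A rough sketch of how that could look, assuming llama.cpp's `llama-server` is started with more than one slot (for example with `--parallel 2`) and that each assistant is pinned to its own slot through the `id_slot` and `cache_prompt` fields of the `/completion` endpoint. The assistant names, slot numbers, prompt layout, and base URL are made up for illustration and are not from this thread.

```python
import json
import urllib.request

# Hypothetical mapping of assistant name -> server slot (illustrative only).
ASSISTANT_SLOTS = {"lights": 0, "media": 1}


def build_payload(system_prompt: str, user_text: str, slot: int) -> dict:
    """Build a /completion request body pinned to one server slot."""
    return {
        # The system prompt comes first so it forms a stable, cacheable prefix.
        "prompt": f"{system_prompt}\n\nUser: {user_text}\nAssistant:",
        "n_predict": 128,
        "cache_prompt": True,  # ask the server to reuse the slot's cached prefix
        "id_slot": slot,       # keep each assistant on its own slot
    }


def complete(base_url: str, payload: dict) -> str:
    """POST the payload to a running llama-server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["content"]


if __name__ == "__main__":
    # Assumes a server is listening locally; adjust the URL to your setup.
    payload = build_payload(
        "You control the lights.", "Turn on the lamp", ASSISTANT_SLOTS["lights"]
    )
    print(complete("http://localhost:8080", payload))
```

Because each assistant always lands on the same slot, its long system prompt should stay cached there while the other assistant uses a different slot.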

wary plaza
#

Yeah, I found that rather annoying, so I actually used the OpenAI conversation agent integration instead. I have 4 separate conversation agents with different functions and prompts cached. Works well!