#Voice Performance boost.

loud bramble
#

Looking at how slow my voice pipeline is on a Dell OptiPlex and an old laptop running Ollama in Docker, I'm thinking of either getting a VPS to run it or paying for the OpenAI API. What are people's thoughts on this?

digital sedge
loud bramble
#

Oooh I do have cloud but I’ve not tried it yet

digital sedge
#

Another possibility, if you are running locally (and have the extra RAM available), is ONNX-ASR.

#

If you're running STT in English then it's a bit quicker.

#

However, if you're looking at speeding up the LLM "brain" more than the recognition, then perhaps try a smaller Ollama model or a system with a more powerful GPU.

#

Speeding up various parts of the pipeline can help overall speed. You don't necessarily have to focus on one specific part.
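One way to see which part of the pipeline is worth speeding up is to time each stage separately. The following is only a rough sketch: the `fake_stt`, `fake_llm`, and `fake_tts` functions are hypothetical stand-ins for whatever actually handles speech-to-text, the LLM call, and text-to-speech in a given setup, and the sleep durations are made up.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Add the time spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


def slowest_stage(results):
    """Return the name of the stage with the largest recorded time."""
    return max(results, key=results.get)


# Hypothetical stand-ins for the real stages; replace with actual calls.
def fake_stt():
    time.sleep(0.01)


def fake_llm():
    time.sleep(0.05)


def fake_tts():
    time.sleep(0.01)


with timed("stt"):
    fake_stt()
with timed("llm"):
    fake_llm()
with timed("tts"):
    fake_tts()

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
print("slowest stage:", slowest_stage(timings))
```

Swapping the stand-ins for real calls should show whether recognition or the LLM dominates the delay before spending money on hardware or an API.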

#

Which model are you running on Ollama?

loud bramble
#

I was running llama3.2 lol, which I used from my laptop server due to its GPU, but I'm looking at getting a dedicated machine for it soon.

digital sedge
loud bramble
#

Roger. How resource intensive is it? I'm debating running it on a separate machine within Proxmox.

digital sedge
#

I don't have a huge amount of experience with the smaller model tbh.

#

I use the 14b model, but the VM that runs it has a 16GB 5060 Ti attached to it...

#

I have heard good things from others using it for Assist though, and Qwen3 is pretty quick in general.

autumn sequoia
#

You want this one.
hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q4_K_M

You paste it into the Ollama integration and it will download it for you from Hugging Face. It's the new version of the model without thinking mode. I could never get qwen3:4b to actually stop thinking. It uses about 4.5GB of VRAM and would be lighter than llama3.2.
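To check how quick the model actually is on a given machine, something like the sketch below could query Ollama's HTTP API directly and work out tokens per second from the returned stats. It assumes Ollama is listening on its default local address and that the model has already been downloaded. The helper names (`build_payload`, `tokens_per_second`, `ask`) are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q4_K_M"


def build_payload(prompt, model=MODEL):
    """Build a non-streaming request body for /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def tokens_per_second(stats):
    """Generation speed from Ollama's stats (eval_duration is in nanoseconds)."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)


def ask(prompt, url=OLLAMA_URL):
    """Send one prompt and return (reply text, tokens per second)."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        stats = json.loads(resp.read())
    return stats["response"], tokens_per_second(stats)
```

With a server running, `ask("Turn on the kitchen lights")` would return the reply and a speed figure, which makes it easier to compare this model against llama3.2 on the same hardware.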