#What is the best speech-to-text model for German these days?
1 messages · Page 1 of 1 (latest)
What is the best speech-to-text model for German these days?
So I just found out that you can use ollama as conversation agent and let the LLM correct misspelled words. This works quite realiably.
yeah llm "correction" based on text works pretty well depending on the model etc...
in my experience it smoothens things out nicely. but my experience is basically only in english.
Whisper in English is fine. In German it‘s pretty bad, but since there is nothing else available, I will have to live with it. The LLM is able to correct even the worst garbage input it seems. Would be nice if something better came along. The main wyoming dev is German, but I guess training a speech model is out of his capacities as well.
its about available training data too.
Well it has already been done for gpt4. OpenAI just doesn’t release their cash cow.
Sadly Germany is not great at AI stuff. We don’t even have one original model. France at least has Mistral. Their situation for piper is even worse than the German one though
But apart from all the lamenting, ollama as conversation agent is pretty amazing. You can even tell it stuff like „a little brighter“ and „go back to the previous color“ etc
yeah, I use qwen3 with ollama myself.
What hardware are you running your ollama on, and which exact model are you guys using? @wide swan & @cold granite
same it’s peak
I used to use ollama, but llama-server is so much faster
ooh i didnt know that
wait what makes it “faster”? shouldnt it be fully dependant on the model and hardware
-# mostly
Its running on a 5060ti (16gb) on one of the servers in my rack 🙂
No idea, but I get 10-25 TPS higher on llama servee
llama-server --host 0.0.0.0 --port 10600 -fa 1 -ngl 99 --ctx-size 16384 --jinja --cache-ram -1 --threads -1 --temp 0.7 --top-p 0.95 --min-p 0.01 --repeat-penalty 1.0 -b 4096 -ub 4096 -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q8_K_XL
I use it with https://github.com/mike-nott/mcp-assist but that's not related to this issue
I am actually using gpt-oss:120b in the Ollama cloud. Not a hundred percent private then, but it‘s for free, very responsive and understands complex commands. You can even tell it to correct words that whisper regularly misunderstands.
Although I think even a 1B model would be sufficient for English, which you could run on an Intel A310. They struggle at other languages though.
I have my own home server in a microATX case