Hello. If you have some time and basic Python knowledge (not mandatory; you can ask an LLM for help), I suggest testing the new streaming models from Kroko. They claim impressive metrics, but these need to be verified in practice. I can’t evaluate all the languages myself because of the language barrier.
The following languages are supported: English, Dutch, French, Portuguese, Spanish, German, Italian, Swedish, Hebrew, Turkish.
Server: https://github.com/mitrokun/wyoming_streaming_asr/
The model creators have made obtaining the models somewhat complicated, so I had to write an encoder. Download a model from Hugging Face (HF), place it in the models/kroko directory, and run the script; detailed instructions are in that directory. The numbers 64 and 128 refer to the window size in ms; you can try any of the models.
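To get a feel for what a 64 ms or 128 ms window means in terms of raw audio, here is a small conversion sketch. It assumes 16 kHz, 16-bit mono PCM input, which is common for ASR but is not stated for the Kroko models; the helper names are hypothetical.

```python
# Hypothetical helpers: convert a streaming window length in milliseconds
# into audio samples and bytes.
# ASSUMPTION: 16 kHz, 16-bit mono PCM (typical for ASR, not confirmed for Kroko).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def samples_per_window(window_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples covered by one window of window_ms milliseconds."""
    return sample_rate_hz * window_ms // 1000


def bytes_per_window(window_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Size in bytes of one window of mono 16-bit PCM audio."""
    return samples_per_window(window_ms, sample_rate_hz) * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for ms in (64, 128):
        print(f"{ms} ms -> {samples_per_window(ms)} samples, {bytes_per_window(ms)} bytes")
    # prints:
    # 64 ms -> 1024 samples, 2048 bytes
    # 128 ms -> 2048 samples, 4096 bytes
```

Under that assumption, the 128 ms models process twice as much audio per step as the 64 ms ones.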
The server has minimal hardware requirements; almost any CPU will do.
To start the server, specify the path to the model directory and the language, and add the --debug flag, which makes it easier to evaluate ASR performance. The default port is 10303.
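Once the server is running, a quick way to confirm it is reachable is to send it a Wyoming "describe" event and read back the "info" reply. This is a minimal sketch that assumes the server uses the standard Wyoming event framing (one JSON header line per event, optionally followed by `data_length` bytes of JSON data and `payload_length` bytes of binary payload); it is not taken from this server's code, and `read_event`/`describe` are hypothetical helper names.

```python
import json
import socket
from typing import BinaryIO, Optional

DEFAULT_PORT = 10303  # default port from the post


def read_event(stream: BinaryIO) -> Optional[dict]:
    """Read one Wyoming-style event; returns None at end of stream.

    ASSUMPTION: header is a JSON line; data may be inline ("data") and/or
    follow as data_length bytes of JSON; a binary payload may follow.
    """
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = header.get("data") or {}
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else None
    return {"type": header["type"], "data": data, "payload": payload}


def describe(host: str = "localhost", port: int = DEFAULT_PORT, timeout: float = 5.0) -> dict:
    """Send a 'describe' event and return the data of the 'info' reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"type": "describe"}).encode() + b"\n")
        with sock.makefile("rb") as stream:
            while True:
                event = read_event(stream)
                if event is None:
                    raise ConnectionError("server closed the connection")
                if event["type"] == "info":
                    return event["data"]


if __name__ == "__main__":
    try:
        print(json.dumps(describe(), indent=2))
    except OSError as exc:  # includes refused connections and timeouts
        print(f"Could not query the server: {exc}")
```

If the reply lists the ASR service and the selected language, the server is up and ready for real audio tests.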
The "command" option is a separate mechanism for improving the user experience; you can leave it for later.
I would appreciate your feedback. If the models are really as good as claimed, we can make an add-on.