STP a less capable "drop in replacement" for Whisper? | Home Assistant | Page 1

plucky cape May 6, 2025, 5:50 AM

#

Morning everyone, this might be an obvious one, but I wasn't able to fully confirm it from the documentation I found: does the speech-to-phrase (STP) service work exactly the same way as e.g. Whisper when it comes to input/output, just with a limited detection range? Audio in, transcribed text back?
And related to that: when both STP and Whisper are installed in HA directly (not as a standalone container on a separate machine), do they then have different endpoint paths (?) or ports so that they can be addressed separately?

noble pond May 7, 2025, 8:21 PM

#

First off, they can indeed both exist on the same machine and be added as separate Wyoming integrations.

As to the differences between how they work:

Whisper will transcribe any and all speech into text (even hallucinating speech from silence). You will get a full copy of the text sent to your conversation agent, even if it doesn't match an intent phrase.

STP will try to match speech to a given set of phrases. It will not provide arbitrary text that does not match one of your phrases.

In practice, the main difference is that STP won't be useful with an LLM: you cannot make unstructured queries, because it only returns text for speech that matches a known phrase.
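To make the contrast above concrete, here is a toy sketch in Python. It is not how Whisper or STP work internally. It only mimics the behaviour described: one function passes any text through, the other snaps input to the closest known phrase or gives nothing back. The phrase list, the function names, the similarity threshold, and the choice of an empty string for "no match" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical phrase set; a real STP install gets its phrases from Home Assistant.
KNOWN_PHRASES = [
    "turn on the kitchen light",
    "turn off the kitchen light",
    "what is the temperature in the living room",
]


def open_transcribe(heard: str) -> str:
    """Whisper-like behaviour: hand back whatever was heard, unfiltered."""
    return heard


def phrase_transcribe(heard: str, threshold: float = 0.8) -> str:
    """STP-like behaviour: return the closest known phrase, or "" if none is close.

    The 0.8 threshold and the empty-string result are assumptions of this sketch.
    """
    best_phrase, best_score = "", 0.0
    for phrase in KNOWN_PHRASES:
        score = SequenceMatcher(None, heard.lower(), phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase if best_score >= threshold else ""


if __name__ == "__main__":
    for heard in ("turn on the kitchen lights", "tell me a joke about cats"):
        print(repr(open_transcribe(heard)), "vs", repr(phrase_transcribe(heard)))
```

The second input is the LLM case: the open transcriber forwards the free-form question, while the phrase matcher has nothing to return.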

plucky cape May 14, 2025, 6:19 PM

#

@noble pond, sorry for the late reaction, somehow I had missed the notification for your answer. Thanks a lot so far! 🙂 But what I'm still unsure about: is speech-to-phrase a service category/type of its own from a Wyoming perspective, similar to how e.g. TTS is a separate service type? Or does Wyoming internally "see" and treat STP as "just another" STT option?

#

Context: I'm looking into locally hosting the service that would allow my Pebble watch to record voice commands. It recently gained support for Wyoming Whisper: https://github.com/jplexer/rebble-asr/blob/1675f845c377ce6c3e078cf80e277c775ef1f31c/asr/__init__.py#L179

#

but the configuration for the Whisper host is basically just the IP and port of the Wyoming server (which in my case would be my HA server).

#

so I was kind of hoping that if I only run STP on my HA server as part of Wyoming, I'd be able to use STP as a less powerful STT service.

noble pond May 15, 2025, 2:43 AM

#

Ah. Yes. It’s the same thing as far as HASS is concerned. Accepts audio, returns text. It’s just a difference in how it transcribes.
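For anyone wiring up a non-HA client, the "accepts audio, returns text" exchange could look roughly like the sketch below. It assumes the Wyoming framing as I understand it (one JSON header line per event, optionally followed by a binary payload) and the usual event names `transcribe`, `audio-start`, `audio-chunk`, `audio-stop` and `transcript`. The helper names are made up for this example, and the exact header fields should be checked against the Wyoming protocol description before relying on them.

```python
import json


def encode_event(event_type, data=None, payload=b""):
    """Frame one event: a JSON header line, then the raw payload bytes (if any)."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def decode_event(raw):
    """Split a framed event back into (type, data, payload)."""
    line, _, rest = raw.partition(b"\n")
    header = json.loads(line)
    payload = rest[: header.get("payload_length", 0)]
    return header["type"], header.get("data", {}), payload


def build_request(pcm, rate=16000, width=2, channels=1):
    """Events a client would send for one utterance of raw PCM audio."""
    fmt = {"rate": rate, "width": width, "channels": channels}
    return [
        encode_event("transcribe"),
        encode_event("audio-start", fmt),
        encode_event("audio-chunk", fmt, pcm),
        encode_event("audio-stop"),
    ]


if __name__ == "__main__":
    for frame in build_request(b"\x00\x01" * 4):
        print(decode_event(frame))
    # Hypothetical reply; the request is the same whether the server behind
    # the host and port is Whisper or STP.
    reply = encode_event("transcript", {"text": "turn on the kitchen light"})
    print(decode_event(reply))
```

The point of the sketch is that nothing in the request says which engine is on the other end. The client only knows a host and a port, so pointing it at the STP port instead of the Whisper port changes the text that comes back, not the conversation.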

#

That said, I’m not sure the Rhasspy STP will be useful for anything other than Home Assistant as the sentences it’s trained on are for Home Assistant intents.

Unless you provide a set of phrases to train against that match what your Pebble is expecting, it won’t work out.
