Text-to-Speech API for Piper | Home Assistant | Page 1

teal epoch Jan 11, 2024, 10:59 PM

#

Is there a text-to-speech API for wyoming-piper similar to the one in Rhasspy's Hermes?

#

Since I am running HA Supervised (docker addons), I am not sure of the setup for this.
Does this get installed on the host and run as a separate instance of Piper? Or do I try to install it in the existing addon_core_piper container?

#

I built a hacky system with netcat:

Inside the addon_core_piper docker container, run a script that keeps a persistent netcat listener going and executes piper for each connection:
while true; do netcat -lvp PORT -c '/usr/share/piper/piper --model /data/en_GB-alan-medium.onnx --output-file -'; done

On Home Assistant, I use the "shell_command" integration to run a script. That script accepts the text string as an argument and runs echo "$1" | nc ADDON_IP PORT > FILE

The script also has some additional checks for input validation. Instead of creating a file, I can also pipe the output through another netcat to a remote speaker.
I don't need HA satellites or media players that have to run anything big. My use case is speakers driven by Raspberry Pi Kodi boxes.

bold dock Jan 11, 2024, 11:09 PM

#

The Wyoming protocol itself is that API: https://github.com/rhasspy/wyoming
The add-on runs on a TCP port and will follow the TTS message flow: https://github.com/rhasspy/wyoming#text-to-speech-1
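As a rough illustration of what travels over that TCP port, here is a minimal sketch of how a Wyoming event might be framed in Python. It assumes the event layout described in the Wyoming README (one newline-terminated JSON header, optionally followed by a binary payload whose size is given by `payload_length`); `encode_event` is a hypothetical helper name, not part of the wyoming package.

```python
import json
from typing import Optional


def encode_event(event_type: str, data: Optional[dict] = None, payload: bytes = b"") -> bytes:
    """Frame one event: a JSON header line, then an optional binary payload.

    Assumption: the header layout follows the Wyoming README
    (type, data, payload_length). This helper is illustrative only.
    """
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    # The header must be a single line terminated by a newline.
    return json.dumps(header).encode("utf-8") + b"\n" + payload


# A TTS request is a single "synthesize" event carrying the text.
message = encode_event("synthesize", {"text": "Hello World"})
```

The resulting bytes could be written to a socket connected to the add-on's port; the server is then expected to answer with audio-start, audio-chunk, and audio-stop events.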

teal epoch Jan 11, 2024, 11:11 PM

#

I was curious about that service listening on 10200?
python3 -m wyoming_piper --piper /usr/share/piper/piper --uri tcp://0.0.0.0:10200

#

I have seen it accept a type:describe message, but I'm not sure whether it also accepts TTS data.
https://community.home-assistant.io/t/run-whisper-on-external-server/567449

#

I think I see how it works.
echo '{"type": "synthesize", "data": {"text": "Hello World"} }' | nc ADDON_IP 10200
This does work. I just need to figure out how to pipe the output and close the connection.

#

I think I need netcat to recognize the audio-stop response and close the connection at that point. I'm guessing that is how the other components do it.

#

Adding -w1 (a one-second timeout) to nc seems to work.

#

Now to figure out how to remove the JSON data and keep only the WAV file in a usable format.

#

I guess I would need another tool to convert the raw PCM to WAV format.
I don't see any option in https://github.com/rhasspy/wyoming#text-to-speech-1 to choose format.
Is the Piper HTTP API you linked earlier going to be included in an updated version?

bold dock Jan 14, 2024, 2:08 PM

#

Sox can convert to WAV. Wyoming doesn't support changing the audio format. That's a client's responsibility.

#

I hadn't planned on including the HTTP API in the Piper add-on. HA already has a TTS HTTP API: https://www.home-assistant.io/integrations/tts/#rest-api

teal epoch Jan 14, 2024, 4:52 PM

#

bold dock I hadn't planned on including the HTTP API in the Piper add-on. HA already had a...

Okay, thanks. I'll stick with my hacky netcat workaround. The TTS REST API was suggested to me before, but it produces MP3 files, which won't work for me. Sox is a good idea, but I didn't want to add another dependency, especially since piper on the command line (not the API) can already create the WAV files I was used to with Rhasspy.

bold dock Jan 14, 2024, 10:12 PM

#

It's pretty easy to turn the raw PCM into WAV too. You just need to know the number of samples and the sample rate/width/channels to add the header.
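For example, Python's standard `wave` module can write that header without any extra dependency. This is a minimal sketch; `pcm_to_wav` is a hypothetical helper name, and the values 22050 Hz, 2 bytes per sample, and 1 channel are assumptions for illustration. The real values should be taken from the audio-start or audio-chunk events the server sends.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int, width: int, channels: int) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)  # 1 = mono, 2 = stereo
        wav_file.setsampwidth(width)     # bytes per sample (2 = 16-bit)
        wav_file.setframerate(rate)      # samples per second
        # The wave module derives the sample count from the data length.
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Assumed format: 16-bit mono at 22050 Hz; two silent samples as dummy input.
wav_bytes = pcm_to_wav(b"\x00\x00\x00\x00", rate=22050, width=2, channels=1)
```

The returned bytes can be written straight to a `.wav` file or piped on to a player.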

teal epoch Jan 15, 2024, 12:10 AM

#

The PCM data has the JSON messages mixed in too, so I would also need to strip those out before converting the PCM to WAV.
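One way to separate the two is to walk the stream event by event instead of treating it as one blob. The sketch below assumes the framing described in the Wyoming README: each event is a JSON header line, optionally followed by `data_length` bytes of extra JSON and `payload_length` bytes of binary payload, with the PCM carried in the payload of audio-chunk events. `extract_pcm` is a hypothetical helper name, not part of the wyoming package.

```python
import io
import json
from typing import BinaryIO, Tuple


def extract_pcm(stream: BinaryIO) -> Tuple[dict, bytes]:
    """Read events until audio-stop; return (audio format, raw PCM).

    Assumption: header fields follow the Wyoming README
    (type, data, data_length, payload_length).
    """
    audio_format: dict = {}
    pcm = bytearray()
    while True:
        line = stream.readline()
        if not line:
            break  # stream ended before an audio-stop event arrived
        header = json.loads(line)
        data = dict(header.get("data") or {})
        # Extra JSON data may follow the header line as a separate block.
        data_length = header.get("data_length") or 0
        if data_length:
            data.update(json.loads(stream.read(data_length)))
        # The binary payload (raw PCM for audio-chunk) comes last.
        payload_length = header.get("payload_length") or 0
        payload = stream.read(payload_length) if payload_length else b""
        if header["type"] == "audio-chunk":
            audio_format = {
                key: data[key] for key in ("rate", "width", "channels") if key in data
            }
            pcm.extend(payload)
        elif header["type"] == "audio-stop":
            break  # synthesis finished; the connection can be closed here
    return audio_format, bytes(pcm)
```

The same function should work on a file captured with nc (opened in binary mode) or on a socket wrapped with `makefile("rb")`. Because it returns as soon as audio-stop arrives, it also removes the need for a timeout to end the connection.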
