# Use Wake Word / STT to Record Audio and Submit Audio to a Service/Device
Looking into it, I wonder if this could be accomplished via triggers and automations:
- Wake word spoken is a trigger
- Automation starts recording
- Stop word spoken is a trigger
- Automation stops recording and uploads the file
But that would imply I could have the thing "record until another trigger fires", and that doesn't seem feasible: looking at cameras and similar devices that do this, the recording duration seems to be fixed.
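The "record until another trigger fires" part does not actually require a fixed duration: a recorder can keep appending audio chunks from the mic until a stop call arrives. A minimal sketch, assuming raw 16 kHz, 16-bit mono PCM chunks from some capture loop; `OpenEndedRecorder` and its methods are hypothetical names, and the WAV packaging uses Python's standard `wave` module:

```python
import io
import threading
import wave


class OpenEndedRecorder:
    """Buffers raw PCM chunks from start() until stop(); no fixed duration."""

    def __init__(self, sample_rate=16000, sample_width=2, channels=1):
        # Assumed input format: 16 kHz, 16-bit (2-byte), mono PCM
        self.sample_rate = sample_rate
        self.sample_width = sample_width
        self.channels = channels
        self._chunks = []
        self._recording = False
        # feed() and start()/stop() may be called from different threads
        self._lock = threading.Lock()

    @property
    def recording(self) -> bool:
        return self._recording

    def start(self) -> None:
        with self._lock:
            self._chunks = []
            self._recording = True

    def feed(self, chunk: bytes) -> None:
        # Called by the capture loop for every chunk; dropped while not recording
        with self._lock:
            if self._recording:
                self._chunks.append(chunk)

    def stop(self) -> bytes:
        # Ends the recording and returns everything captured as an in-memory WAV file
        with self._lock:
            self._recording = False
            pcm = b"".join(self._chunks)
            self._chunks = []
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(self.channels)
            wav.setsampwidth(self.sample_width)
            wav.setframerate(self.sample_rate)
            wav.writeframes(pcm)
        return buf.getvalue()
```

The capture loop calls `feed()` unconditionally; the start and stop triggers only flip the recorder's state, so the recording lasts exactly as long as the gap between them.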
what hardware are you trying to do this with?
There are probably ways of doing this kind of thing, but you would have to build a lot of it yourself.
It's not something basic that you can do with just a couple of automations.
I've got a RasPi I can probably use as the voice satellite and recording device, and I've got a mic and speaker for it. I've also got a beefy home server with a GPU that I can use to run HA as well as the TTS and STT services. I do realize I'd probably have to "build the thing" I want to upload the voice data to, but I had planned on that.
Mostly, I want to use HA for the "wake word triggers start of recording, another trigger stops recording" part. Triggering the start of a recording seems clearly doable, but for the cameras and similar devices I've seen, the length of the recording usually has to be defined up front.
If you're saying I'll need to go a step further by making a device integration for the recorder, I'm willing to do that. I've been slinging Python code professionally for 15 years and I'm a fairly decent Linux sysadmin, so I know my way around a console and Python.
I've actually already started playing with both openwhisper and piper-tts, just to figure out how hard wake detection and then streaming audio to a backend system would be. There is some nuance around silence in the recording, as well as using "sliding windows" over audio sections for detection. Someone told me about you guys, and I figured I might as well see if I can bootstrap something with HA rather than writing the whole app from scratch.
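The silence and sliding-window concerns can be sketched with two small helpers: overlapping windows of fixed-size PCM frames for a detector to score, and an RMS level check that counts trailing quiet frames (for example, to trim silence or decide a recording has ended). This assumes little-endian 16-bit PCM; the function names, window, hop, and threshold values are hypothetical placeholders to tune, and this is not a description of how any particular wake word or STT engine works internally:

```python
import math
import struct
from collections import deque
from typing import Iterable, Iterator, List


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a little-endian 16-bit PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def sliding_windows(frames: Iterable[bytes], window: int, hop: int) -> Iterator[bytes]:
    """Yield overlapping windows of `window` frames, advancing `hop` frames each time."""
    buf = deque(maxlen=window)  # oldest frame falls off automatically
    since_last = 0
    for frame in frames:
        buf.append(frame)
        since_last += 1
        if len(buf) == window and since_last >= hop:
            since_last = 0
            yield b"".join(buf)


def trailing_silence_frames(frames: List[bytes], threshold: float) -> int:
    """Count how many frames at the end of the list fall below the RMS threshold."""
    count = 0
    for frame in reversed(frames):
        if frame_rms(frame) >= threshold:
            break
        count += 1
    return count
```

Overlapping windows (hop smaller than window) mean a word that straddles a window boundary still lands fully inside some later window, at the cost of scoring each frame more than once.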
Home Assistant voice satellites are not designed to do this sort of thing at all. You are better off making something yourself, but for the wake word part you can probably look at "openWakeWord", which supports detecting a wake word in an audio stream.
So pipe the mic audio stream into both that and something that can record. When OWW triggers, you can start/stop recording.
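Wake word detectors typically emit a score per audio frame and often stay above threshold for several consecutive frames, so turning "OWW triggers" into clean start/stop events benefits from a threshold plus a cooldown. A minimal sketch, assuming scores arrive as a dict of model name to score (openWakeWord's `Model.predict()` returns a dict of this shape); the `WakeWordTrigger` class, the model names `start_word`/`stop_word`, the 0.5 threshold, and the 2-second cooldown are hypothetical:

```python
import time
from typing import Callable, Dict, Optional


class WakeWordTrigger:
    """Turns per-frame detection scores into debounced start/stop events."""

    def __init__(self, on_start: Callable[[], None], on_stop: Callable[[], None],
                 start_model: str = "start_word", stop_model: str = "stop_word",
                 threshold: float = 0.5, cooldown_s: float = 2.0) -> None:
        self.on_start = on_start
        self.on_stop = on_stop
        self.start_model = start_model  # hypothetical model names
        self.stop_model = stop_model
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.recording = False
        self._last_fire = float("-inf")

    def process(self, scores: Dict[str, float], now: Optional[float] = None) -> Optional[str]:
        """Feed one frame's scores; returns "start", "stop", or None."""
        now = time.monotonic() if now is None else now
        # Ignore everything within the cooldown after the last event, so one
        # utterance spanning several frames fires only once.
        if now - self._last_fire < self.cooldown_s:
            return None
        if not self.recording and scores.get(self.start_model, 0.0) >= self.threshold:
            self.recording = True
            self._last_fire = now
            self.on_start()
            return "start"
        if self.recording and scores.get(self.stop_model, 0.0) >= self.threshold:
            self.recording = False
            self._last_fire = now
            self.on_stop()
            return "stop"
        return None
```

With a single wake word instead of separate start and stop words, passing the same model name for both turns this into a toggle.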
Alright, well, I'll doodle around.
I mean, I know the satellites aren't built for that, and I wouldn't use one for that. I'd just have the satellite software running on the same hardware as another "device" I had defined, which is essentially that Raspberry Pi running another daemon.
The satellite, as I understand it, is just a hot mic streaming to the voice assistant add-ons in HA. I assume that can run as software on a Raspberry Pi with a microphone, probably as a daemon service. The same Raspberry Pi can run another RESTful service that starts and stops recording on that device and then uploads the audio somewhere (I would write both of these things). That RESTful API can be called from triggers hooked into HA via voice assistant activations. The only hitch is that I don't think the satellite service and this other service can use the same microphone, so maybe I have two and they use different ones. To be very clear: I am talking about writing custom software and building the hardware for what I guess you would call a "custom device", not trying to leverage a canned integration with IoT devices.
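A recording daemon like this could expose two endpoints for HA to call. A minimal sketch using only the standard library's `http.server`; the `/record/start` and `/record/stop` paths, port 8099, and the `RecorderControl` class are hypothetical, and actual audio capture and upload are left as comments:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

Response = Tuple[int, dict]


class RecorderControl:
    """Recording state for the daemon; real capture/upload would hook into start()/stop()."""

    def __init__(self) -> None:
        self.recording = False
        # ThreadingHTTPServer handles each request on its own thread
        self._lock = threading.Lock()

    def start(self) -> Response:
        with self._lock:
            if self.recording:
                return 409, {"error": "already recording"}
            self.recording = True
            # Hypothetical: begin buffering microphone audio here
            return 200, {"status": "recording"}

    def stop(self) -> Response:
        with self._lock:
            if not self.recording:
                return 409, {"error": "not recording"}
            self.recording = False
            # Hypothetical: finalize the audio file and queue it for upload here
            return 200, {"status": "stopped"}


def route(control: RecorderControl, method: str, path: str) -> Response:
    """Map POST /record/start and POST /record/stop onto the controller."""
    if method != "POST":
        return 405, {"error": "use POST"}
    if path == "/record/start":
        return control.start()
    if path == "/record/stop":
        return control.stop()
    return 404, {"error": "unknown endpoint"}


def make_handler(control: RecorderControl):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            status, body = route(control, "POST", self.path)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return Handler


def serve(host: str = "0.0.0.0", port: int = 8099) -> None:
    """Blocking entry point, e.g. for a systemd service; the port is an arbitrary choice."""
    ThreadingHTTPServer((host, port), make_handler(RecorderControl())).serve_forever()
```

From Home Assistant, the built-in `rest_command` integration is one way to call endpoints like these from an automation. Returning 409 on a repeated start or a stray stop keeps a double-fired trigger from corrupting the recording state.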
Most commonly used satellites now are ESP32-based and actually run microWakeWord on the microcontroller itself.
Yeah, the ones that do stream and do wake word detection on the Home Assistant side use openWakeWord, which you could use in your own project if you wanted to.
The RasPi-based satellites generally do use openWakeWord, but it can be run on the Pi too.
Yeah, so haha, I think these are my steps:
- HA on my home server with the TTS / STT voice assistant add-ons
- Build a voice satellite Pi
- Test basic command triggering with this setup: just make sure I can fire start and stop triggers that do nothing but write a log entry in HA
- Build another bit of software to run on the same hardware as the satellite that lets me record audio until told to stop (Recording Daemon)
- Set up automations in HA to start and stop recording audio via this new service
- Test that start/stop works and I get complete audio files
- Build my own RESTful service that these audio files can be sent to
- Update the Recording Daemon to upload after receiving "stop"
- Have fun?
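For the "upload after receiving stop" step, the daemon could POST each finished WAV file to the backend. A minimal sketch with `urllib.request`; the URL, the `X-Device-Id`/`X-Session-Id` headers, and the function names are hypothetical conventions for your own backend:

```python
import urllib.request


def build_upload_request(url: str, wav_bytes: bytes, device_id: str,
                         session_id: str) -> urllib.request.Request:
    """Build (without sending) a POST that carries one finished recording."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={
            "Content-Type": "audio/wav",
            # Hypothetical headers the backend could use to file the recording
            "X-Device-Id": device_id,
            "X-Session-Id": session_id,
        },
    )


def upload(url: str, wav_bytes: bytes, device_id: str, session_id: str,
           timeout: float = 30.0) -> int:
    """Send the recording and return the HTTP status.

    Raises urllib.error.URLError (or its HTTPError subclass) on failure.
    """
    request = build_upload_request(url, wav_bytes, device_id, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

If the upload fails, keeping the file on the Pi and retrying later avoids losing an entry when the backend is down.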
I realize now my focus on the "duration" of the recording was silly: I can just expose a REST API with one endpoint to start the recording and another to stop and upload it, each called from a trigger. If I can tie the two together with some kind of unique ID for the device, which feels very doable, this seems relatively straightforward based on examples of triggers + REST integrations in HA.
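Tying start and stop together per device could look like this: start issues a fresh session ID for the device, and stop returns (and clears) whichever session is open. A minimal sketch; `SessionRegistry` is a hypothetical name, and the in-memory dict would need persisting if the daemon can restart mid-recording:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionRegistry:
    """Tracks at most one open recording session per device."""

    open_sessions: Dict[str, str] = field(default_factory=dict)

    def start(self, device_id: str) -> str:
        # Refuse a second start for the same device instead of silently replacing it
        if device_id in self.open_sessions:
            raise ValueError(f"{device_id} is already recording")
        session_id = uuid.uuid4().hex  # random 32-character hex ID
        self.open_sessions[device_id] = session_id
        return session_id

    def stop(self, device_id: str) -> Optional[str]:
        # Returns the session that was open, or None if stop arrived without a start
        return self.open_sessions.pop(device_id, None)
```

The session ID returned by `stop()` is what the upload would carry, so the backend can match each file to one start/stop pair.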
I'll let you know how this goes, actually. Thanks for letting me rubber duck you, @floral kernel; I may have further questions that are actually about technical details.
The ultimate goal here is to create a kind of audio journal and note-taking app: after the audio is uploaded to the backend service, it would be transcribed to text. Later, I would add more command integrations for searching this database for relevant entries.
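For the later search step, the transcripts could live in a small SQLite table. A minimal sketch using Python's built-in `sqlite3`; the schema and function names are hypothetical, and substring `LIKE` search is a stand-in for something smarter (SQLite's FTS5 extension, if your build includes it, adds proper full-text search):

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def open_journal(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the journal database with a single entries table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               id INTEGER PRIMARY KEY,
               device_id TEXT NOT NULL,
               recorded_at TEXT NOT NULL,  -- ISO 8601; sorts chronologically if all UTC
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def add_entry(conn: sqlite3.Connection, device_id: str, transcript: str,
              recorded_at: Optional[datetime] = None) -> int:
    """Store one transcribed recording and return its row id."""
    timestamp = (recorded_at or datetime.now(timezone.utc)).isoformat()
    cursor = conn.execute(
        "INSERT INTO entries (device_id, recorded_at, transcript) VALUES (?, ?, ?)",
        (device_id, timestamp, transcript),
    )
    conn.commit()
    return cursor.lastrowid


def search(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str]]:
    """Substring search over transcripts, oldest first (LIKE ignores ASCII case)."""
    # Escape LIKE wildcards so a literal % or _ in the term is matched as-is
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT recorded_at, transcript FROM entries "
        "WHERE transcript LIKE ? ESCAPE '\\' ORDER BY recorded_at",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```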
Transcribing the audio is obviously simple, but I very much want this to be like the "Captain's Log" from Star Trek :D hence the wake word / HA integration.
Bonus: I also get a generic voice satellite + HA integration I can use for other stuff.
I do realize I may need two mics, if two "bits of code" can't read data from the same mic at the same time.
Not sure about that, but it's a minor detail.
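Depending on the audio stack, two programs reading one mic may be a non-issue: PulseAudio and PipeWire let multiple clients record from the same source, and ALSA's `dsnoop` plugin can share a capture device. Alternatively, a single capture loop can fan each chunk out to several consumers in-process (e.g. one queue for wake word detection, one for the recorder). A minimal sketch of that fan-out, assuming a single producer thread; `AudioFanOut` is a hypothetical name:

```python
import queue
import threading
from typing import List


class AudioFanOut:
    """Copies every captured chunk to each subscriber's bounded queue."""

    def __init__(self, max_chunks: int = 100) -> None:
        self._subscribers: List[queue.Queue] = []
        self._lock = threading.Lock()
        self._max_chunks = max_chunks

    def subscribe(self) -> "queue.Queue[bytes]":
        q: "queue.Queue[bytes]" = queue.Queue(maxsize=self._max_chunks)
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, chunk: bytes) -> None:
        # Intended to be called only from the single capture thread
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            try:
                q.put_nowait(chunk)
            except queue.Full:
                # A slow consumer loses its oldest chunk instead of stalling capture
                try:
                    q.get_nowait()
                except queue.Empty:
                    pass
                try:
                    q.put_nowait(chunk)
                except queue.Full:
                    pass
```

Dropping the oldest chunk suits wake word detection, which only cares about recent audio; a recorder that must not lose audio would want a larger `max_chunks` or an unbounded queue.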