Most of my personal issues with voice assistants (and those I see in online reviews) come down to the required pause between the wake word and the user command.
Why couldn’t the voice satellite hardware always keep the past X seconds of audio in a local buffer (never leaving the device, of course), so that when the wake word is detected, all the audio after the wake word is sent to HA, with no pause needed?
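
To make the idea concrete, here is a rough sketch of what I mean, not how any existing satellite firmware actually works. Everything in it is made up for illustration: `RollingAudioBuffer`, `stream_after_wake_word`, the `is_wake_word` callback and the `send` callback are all hypothetical names, and the frame sizes are just assumed defaults. The satellite keeps only the last few frames in memory, and once the wake word fires it forwards what is in the buffer and then keeps streaming live audio.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the most recent audio frames in memory (hypothetical helper)."""

    def __init__(self, seconds, sample_rate=16000, frame_samples=320):
        # Number of frames needed to cover `seconds` of audio (at least one).
        max_frames = max(1, int(seconds * sample_rate / frame_samples))
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        self._frames.append(frame)

    def drain(self):
        """Return the buffered frames, oldest first, and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


def stream_after_wake_word(frames, is_wake_word, buffer, send):
    """Buffer audio locally until the wake word fires, then stream everything.

    `is_wake_word` and `send` are placeholder callbacks: the first stands in
    for the on-device detector, the second for the connection to the server.
    """
    streaming = False
    for frame in frames:
        if streaming:
            # Wake word already detected: forward live audio directly.
            send(frame)
            continue
        buffer.push(frame)
        if is_wake_word(frame):
            # Flush the locally buffered audio so nothing spoken around
            # the moment of detection is lost, then switch to live mode.
            for old_frame in buffer.drain():
                send(old_frame)
            streaming = True
```

Nothing leaves the device until the wake word is detected, since the buffer just overwrites itself. Whether to forward the whole buffer or trim it to the audio after the wake word is a design choice I have left open here.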