#can you use/reflash VPE as a better (than M5stack AtomS3R) openai voice assistant?

sick halo
#

hello,

I landed here trying to build an open ai voice assistant for interactive storytelling for my daughter (there doesn't seem to be one ready to go...).

While home automation is interesting, it's not my main focus right now, but the home assistant voice PE seems to be a really nice box with the right hardware compared to many of the other options out there and saving me some work from a full build-up.

Can somebody help me understand how hackable it is to be turned into a standalone openai voicebot vs that just being a feature among many others?

Espressif has a nice port of the openai embedded sdk and provisions for webrtc connections, which would be ideal to leverage gpt-realtime models https://github.com/espressif/esp-webrtc-solution/tree/main/solutions/openai_demo . This is also what's used on the M5Stack AtomS3R Ai chatbot https://shop.m5stack.com/blogs/news/diy-a-chatgpt-voice-assistant-using-esp32-based-hardware-atoms3r-and-echo-base

However, I assume I can't just flash that to the voice assistant, which seems to be based on the esphome project (it's not clear to me how esphome relates to home-assistant.io and how home assistant relates to VPE, btw)

Can anybody help me understand if I can reuse this hardware or not and what would that entail?

otherwise something like this seems more aligned to the project, https://www.seeedstudio.com/ReSpeaker-Lite-Voice-Assistant-Kit-Full-Kit-of-2-Mic-Array-pre-soldered-XIAO-ESP32S3-Mono-Enclosed-Speaker-and-Enclosure.html , but the XMOS choice in the voice assistant seems superior audio wise, plus it has a jack ready to go for some speakers I have and more options for interactivity.

thank you,

lapis meteor
#

Both Respeaker Lite and VPE use the same XMOS chip, the XU316.
There's an ESPHome integration for Respeaker Lite with feature parity to VPE.

However, without the HA server itself, I doubt you can easily make that work with OpenAI. There's a lot happening under the hood, like STT/TTS. So you need some server anyway.

sick halo
# lapis meteor Both Respeaker Lite and VPE use same XMOS chip XU316. There's ESPHome integratio...

thanks a lot @lapis meteor , I plan on using the new speech-to-speech model, I implement this stuff for a living in "normal software", I just have no idea about embedded hw, I only did some pi stuff a few years back and that's almost a normal computer. The link I shared from GH and espressif should actually implement the whole thing as I expect it to be done (in principle I mean), what I don't understand is the management of the XMOS chip and what's needed on the ESP side for that to all work together

#

in their repo they say that code is designed for the esp-korvo, which seems to have different audio chipset, so my guess is that if I tried to flash that code on the VPE or the respeaker, things wouldn't work because that code is not designed to work on the XMOS, which as far as I read is a very powerful but also low level audio chip and requires quite a bit of legwork to implement stuff on it.

#

I also found this project, https://github.com/akdeb/ElatoAI, which implements the same thing again on hw with a diff chip and uses a remote server for the openAI connection as you seem to allude to with the HA server. I don't believe that's necessary judging from the openai embedded sdk, but again I don't know what I don't know and there may be limitations on the ESP side I'm not understanding atm, hence asking around

lapis meteor
# sick halo in their repo they say that code is designed for the esp-korvo, which seems to h...

The XMOS XU316 on the Respeaker has two versions of DFU firmware. They're freely accessible at https://github.com/respeaker/ReSpeaker_Lite/tree/master , and there are also examples of usage. One is for usage over USB (as a USB mic/speaker pair), the other is for I2S communication with the ESP32, with open source docs. I believe you would need the latter.

Then, the ESP32 can use ESPHome or pure ESP-IDF. With the former it's pretty hard to implement a custom external API, since the API it has is built to work with HA. So you will need to use ESP-IDF or Arduino (less likely). Prepare for a long journey there: not only is C++ pretty hard for newcomers, but ESP-IDF is a pretty extensive framework as well. And heavy audio tasks require too much orchestration to be tackled by a new dev. Firmware that was built for a different chip definitely won't work on PE/Respeaker.

sick halo
#

thanks @lapis meteor , that helps and it's aligned with what I imagined, which is a bummer as I'm not sure what's the best way to proceed at this point. I've reached out to folks from Seeed and on the espressif openai repo to see if they can confirm compatibility with other boards besides the korvo-2. If they do, then I guess that's my best course of action: hopefully I'd have all the core stuff dealt with and could just focus on the openai code.

lapis meteor
random musk
sick halo
#

hey @lapis meteor , picking up on this old thread again now that I have a little more experience with the respeaker lite, any chance you have insights on flashing the XMOS as opposed to the ESP32? it's not clear to me with the single USB-C port how I'd flash both, any hints?

#

I managed to get the respeaker lite working, but I'm annoyed/put off by the lack of source code. XMOS has proper libraries and code available, but I don't know what else is going on on that board to write my own firmware for it. I'd assume with the HA VPE someone wrote the firmware for the XMOS/audio part of the board and this should be doable, but I can't find docs about it

#

thanks

lapis meteor
#

There are 2 USB ports on the Respeaker Lite Voice Kit: one on the ESP chip itself, another on the opposite side.

sick halo
#

yeah I wanna write my own DFU, because I see that XMOS can do a lot more than just passing audio and I wanna offload as much as possible to it so that I have more resources on the ESP32-S3

lapis meteor
#

For DFU flashing you use that one, close to JST connector

sick halo
#

I don't know that I'm capable of it, but I'd like to try/see what it would take

sick halo
#

I only see one port on the VPE

lapis meteor
sick halo
#

the VPE looks like basically the same hw, XU316 + esp32-s3

lapis meteor
sick halo
#

ah

#

that would explain

#

didn't even know that was a thing...

lapis meteor
#

So basically you can replace the firmware with yours in YAML, and it would update. But it's unusable for debugging.

lapis meteor
sick halo
#

that's the other big topic at this point, I've no idea how to do what I normally do with software... having a debugger, step in functions, etc, right now most of my development is 1970-style... write stuff, launch it and hope it works...

#

makes me cringe, need to figure out a better workflow, any recommendations?

lapis meteor
#

I guess there should be JTAG connector somewhere on the board, so you could connect to XMOS directly

#

But don't ask me where or how 🙂

lapis meteor
#

Lately i use LLM for that, and it goes better. 🙂

sick halo
#

heh, doing the same, I managed to get this far just thanks to claude or I'd given up already

#

I got the wakeword to work and record/send streams out to apis
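A minimal sketch of what "record/send streams out to apis" can look like on the host side of such a pipeline, written in Python for readability rather than as ESP-IDF code. It is not the code used in this thread. It assumes 16-bit mono PCM at 24 kHz and 20 ms frames (both assumptions), and wraps each frame in a base64 JSON event named `input_audio_buffer.append`, which is the client event OpenAI's Realtime API uses over WebSocket as far as I know. With the WebRTC route from the Espressif demo, audio would travel on a media track instead, so this framing would not apply there.

```python
import base64
import json

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz capture rate
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM
FRAME_MS = 20            # assumption: 20 ms per frame


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS):
    """Yield fixed-duration chunks of raw PCM; the last chunk may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def append_event(frame: bytes) -> str:
    """Wrap one PCM frame in a JSON event carrying base64-encoded audio."""
    return json.dumps({
        "type": "input_audio_buffer.append",  # event name assumed from the Realtime API
        "audio": base64.b64encode(frame).decode("ascii"),
    })


# Stand-in for microphone data: one second of silence.
one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
events = [append_event(frame) for frame in pcm_frames(one_second_of_silence)]
```

Each string in `events` would then be written to an already-open socket. Small fixed frames keep latency low and memory use predictable, which matters more on an ESP32-S3 than on a desktop.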

#

I need to find out how to train my own wakeword because the default ones aren't working so well for me

#

even if I use a chatgpt american english speaker, "hi esp" fails most of the time