#esp32
OpenAI’s docs include an example project called ElatoAI that shows how to build an ESP32‑based realtime voice agent using the OpenAI Realtime API. It specifically targets the ESP32‑S3 with the Arduino framework and PlatformIO. [ElatoAI overview]
Key points about ESP32 usage in this example:
- Hardware: ESP32‑S3 dev board, I2S mic, I2S amp + speaker, button, RGB LED (no PSRAM required). [ElatoAI overview]
- Firmware stack: Arduino framework via PlatformIO with libraries like ArduinoJson, WebSockets, AsyncWebServer, ESP32_Button, Arduino Audio Tools, and ArduinoLibOpus. [PlatformIO config; Tech stack]
- Architecture:
  - ESP32 client streams audio over secure WebSockets
  - Edge server (Deno / Supabase Edge) connects to OpenAI Realtime API
  - Next.js frontend manages agents, devices, and conversations. [Project architecture]
- Features: Realtime speech‑to‑speech, device management, OTA updates, Wi‑Fi captive portal, button/touch control, and volume control from the web app. [Key features]
- Performance / limits: <2s global round‑trip latency, Opus at ~12 kbps, up to about 10‑minute continuous conversations, with some cold‑start and session‑length limitations. [Important stats; Limitations]
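To put the ~12 kbps Opus figure in perspective, here is a rough back-of-the-envelope sketch in C++. The 20 ms frame length is an assumption (a common Opus setting, not something the ElatoAI docs confirm here), and the helper names `bytesPerFrame` and `bytesPerSession` are made up for illustration. It treats 12 kbps as 12,000 bits per second and ignores WebSocket and TLS overhead.

```cpp
#include <cstdint>

// From the stats above: Opus at roughly 12 kbps (taken as 12,000 bits/s).
constexpr std::uint32_t kOpusBitrateBps = 12'000;
// ASSUMPTION: 20 ms Opus frames; the real firmware may use another frame size.
constexpr std::uint32_t kFrameMs = 20;
// From the stats above: conversations of up to about 10 minutes.
constexpr std::uint32_t kSessionSeconds = 10 * 60;

// Average encoded payload size of one frame, in bytes (hypothetical helper).
constexpr std::uint32_t bytesPerFrame(std::uint32_t bitrateBps, std::uint32_t frameMs) {
    return bitrateBps * frameMs / 1000 / 8;
}

// Total encoded payload for a stream of the given length, in bytes (hypothetical helper).
constexpr std::uint64_t bytesPerSession(std::uint32_t bitrateBps, std::uint32_t seconds) {
    return static_cast<std::uint64_t>(bitrateBps) * seconds / 8;
}

// Checked at compile time: about 30 bytes per 20 ms frame,
// and about 900 kB of audio payload for a full 10-minute conversation.
static_assert(bytesPerFrame(kOpusBitrateBps, kFrameMs) == 30, "30 bytes per frame");
static_assert(bytesPerSession(kOpusBitrateBps, kSessionSeconds) == 900'000, "900 kB per session");
```

Under those assumptions the device sends around 50 small frames per second in each direction, which is part of why a low-bitrate codec is a reasonable fit for a board without PSRAM.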
For step‑by‑step setup (Supabase, Next.js frontend, Deno edge server, and ESP32 firmware), see the full ElatoAI guide and linked GitHub repo. [Quick start]