Hey! I'm building a local voice assistant tying together faster_whisper (STT), OpenClaw (LLM), and piper (TTS) via a Python script.
The Current State (Working but Slow):
I have a 100% functional pipeline. Currently, my Python script sends the user's transcribed audio to OpenClaw using the CLI:
subprocess.run(['openclaw', 'agent', '--agent', 'main', '--message', text, '--json'])
The Problem:
It works, but latency is around 15 seconds per turn. Profiling shows that the CLI triggers an embedded runner (a cold boot that loads all context) rather than using my already-hot Gateway. The LLM generation alone takes ~11s, and the script has to wait for the entire JSON payload before it can pass anything to the TTS engine.
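For anyone who wants to reproduce the per-turn number, here is a minimal sketch of one way to time the CLI call end to end. The helper name timed_run is just illustrative, and this only measures wall-clock time for the whole subprocess; it does not separate the cold boot from the generation time.

```python
import subprocess
import sys
import time


def timed_run(cmd: list[str]) -> tuple[float, str]:
    """Run a command, returning (elapsed wall-clock seconds, captured stdout)."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python time_turn.py openclaw agent --agent main --message "hi" --json
    elapsed, _ = timed_run(sys.argv[1:])
    print(f"{elapsed:.2f}s")
```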
What I tried:
To get real-time streaming (so the TTS can start speaking while the LLM is still generating), I tried connecting directly to the Gateway WebSocket (ws://127.0.0.1:18789), but couldn't complete the connect.challenge / nonce security handshake. I also tried using openclaw proxy to intercept the TUI auth flow, but couldn't easily extract the payload.
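To make the goal concrete, this is roughly the consuming side I have in mind once a token stream is available. It is only a sketch: sentences_from_stream is a hypothetical helper, and it assumes the gateway (however it ends up being reached) can be wrapped as a plain iterable of text deltas. It contains no OpenClaw-specific calls.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a stream of text deltas."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may be partial.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be handed to piper immediately, so speech starts after the first sentence instead of after the full ~11s generation.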
My Questions:
What is the recommended way to interact with a hot OpenClaw gateway via a local Python script with streaming support?
Is there a documented WebSocket auth flow for external scripts, a local REST API, or a specific CLI flag that forces the agent command to use the hot gateway (and stream) instead of the embedded runner?