Hi Krill — planning to test the OpenClaw voice pipeline using a minimal single HTML file in Chrome (no external dependencies, no npm). The idea is to use browser-native APIs only:
- webkitSpeechRecognition for STT
- fetch to hit http://127.0.0.1:18789/v1/chat/completions
- SpeechSynthesis for TTS
Two questions before we build:
- CORS: does the OpenClaw gateway at port 18789 send permissive CORS headers for http://localhost origins by default? Or does it need explicit config to allow cross-origin fetch from a browser?
- chatCompletions endpoint: we need to enable it via:
{ "gateway": { "http": { "endpoints": { "chatCompletions": { "enabled": true } } } } }
Is this the correct field path for v2026.4.26? And does it require a full gateway restart or just a doctor?
#Building voice pipeline for openclaw
I checked current docs/source: your config path is correct, but browser CORS is the catch.
- CORS: I do not see permissive CORS headers for /v1/chat/completions. A browser fetch() from file://, http://localhost:xxxx, etc. will likely fail unless the request is same-origin or you add/proxy CORS yourself. gateway.controlUi.allowedOrigins is an origin/auth allowlist for Gateway browser clients; it is not the same as emitting Access-Control-Allow-Origin.
- Endpoint config: yes, for v2026.4.26 this is the right path:
{
gateway: {
http: {
endpoints: {
chatCompletions: { enabled: true },
},
},
},
}
Docs: https://docs.openclaw.ai/gateway/openai-http-api
- Restart vs doctor: this is under gateway.*, so treat it as gateway restart required. doctor can validate/repair, but it does not by itself make the running gateway serve the new endpoint. With default gateway.reload.mode: "hybrid", a config write may auto-schedule the restart; otherwise do:
openclaw gateway restart
Also: the endpoint uses Gateway auth, so local browser tests will usually need Authorization: Bearer <gateway-token> unless gateway.auth.mode: "none" is explicitly set. Don’t expose that token in anything non-local. 🦐
Quick follow-up — is there a gateway.cors or similar config in v2026.4.26 that would allow whitelisting specific origins for /v1/chat/completions? Or is a proxy the only path for browser-based clients hitting the gateway from a different port?
Yep — I checked the v2026.4.26 schema/source shape: there’s no gateway.cors / per-endpoint CORS config for /v1/chat/completions.
Important distinction:
- gateway.controlUi.allowedOrigins = browser-origin allowlist used by Gateway auth/origin checks.
- It does not emit Access-Control-Allow-Origin / handle CORS preflight for your HTML page.
So for a normal Chrome page on a different port, e.g. localhost:5173 → 127.0.0.1:18789, a same-origin proxy/shim is the clean path today.
Best setup for the demo:
Browser HTML -> http://127.0.0.1:<your-proxy-port>/v1/chat/completions
Proxy -> http://127.0.0.1:18789/v1/chat/completions
Bonus: the proxy can inject the Gateway bearer token server-side, so you don’t expose the operator token in frontend JS.
Docs for the endpoint: https://docs.openclaw.ai/gateway/openai-http-api
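For reference, a rough sketch of the same-origin proxy described above, using only the Python standard library. The upstream URL is the one from this thread; the proxy port (8080), the OPENCLAW_GATEWAY_TOKEN environment variable name, and the index.html filename are assumptions made for illustration, not OpenClaw conventions. It serves the test page and the API path from one origin, so the browser never makes a cross-origin request.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "http://127.0.0.1:18789/v1/chat/completions"  # gateway URL from this thread
TOKEN_ENV = "OPENCLAW_GATEWAY_TOKEN"  # hypothetical env var name (assumption)
PROXY_PORT = 8080  # hypothetical proxy port (assumption)


def build_upstream_request(body: bytes, token: str) -> urllib.request.Request:
    """Build the forwarded POST, adding the bearer token server-side."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(UPSTREAM, data=body, headers=headers, method="POST")


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the test page from the same origin as the proxied API path.
        if self.path not in ("/", "/index.html"):
            self.send_error(404)
            return
        try:
            with open("index.html", "rb") as f:
                page = f.read()
        except FileNotFoundError:
            self.send_error(404, "index.html not found")
            return
        self._reply(200, "text/html; charset=utf-8", page)

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        req = build_upstream_request(body, os.environ.get(TOKEN_ENV, ""))
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                ctype = resp.headers.get("Content-Type", "application/json")
                self._reply(resp.status, ctype, resp.read())
        except urllib.error.HTTPError as err:
            # Pass gateway errors (e.g. 401) back to the page unchanged.
            ctype = err.headers.get("Content-Type", "application/json")
            self._reply(err.code, ctype, err.read())
        except urllib.error.URLError as err:
            self._reply(502, "text/plain; charset=utf-8", str(err.reason).encode())

    def _reply(self, status: int, content_type: str, payload: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run_proxy() -> None:
    # Bind to loopback only so the token-injecting proxy is not reachable from the network.
    ThreadingHTTPServer(("127.0.0.1", PROXY_PORT), ProxyHandler).serve_forever()

# Call run_proxy() to start serving; it is left uncalled here so the file can be imported.
```

With this running, the page would call fetch("/v1/chat/completions", ...) with a relative URL and no Authorization header. Note that this sketch buffers the whole upstream response before replying, so streamed completions would arrive all at once.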
Testing native TTS on v2026.4.26 running in Docker on Windows 11. Here's what we found and what we're planning:
Setup:
OpenClaw v2026.4.26 in Docker container on Windows 11
Microsoft speech plugin enabled in openclaw.json
/status shows: Voice: inbound · provider=microsoft · voice=en-US-RogerNeural · limit=1500 · summary=on
Test:
Sent /tts on via Telegram
Response: TTS enabled. ⚠️ Media failed.
Gateway log:
Failed sending web media to <recipient>: ffmpeg not found in trusted system directories.
Install it via your system package manager (e.g. apt install ffmpeg / dnf install ffmpeg).
Media skipped; sent text-only to <recipient>
Our diagnosis: TTS pipeline works end to end — it generated audio with en-US-RogerNeural. Delivery to the channel failed because ffmpeg is missing from the container. Fix is one line in the Dockerfile:
RUN apt-get update && apt-get install -y ffmpeg --no-install-recommends
Questions before we proceed:
Is our diagnosis correct — is ffmpeg the only missing piece here?
Is apt-get install ffmpeg the right approach for the OpenClaw Docker image, or is there a preferred way to add it (e.g. a bundled dependency, a config flag, or a different base image)?
Any risk to adding ffmpeg to the Dockerfile — will it conflict with anything in the OpenClaw image?
After adding ffmpeg and rebuilding, is there anything else needed to make TTS delivery work on Telegram/WhatsApp, or should it just work?
Yes — your diagnosis looks right.
That log is from media delivery, not speech synthesis. OpenClaw got far enough to try sending generated TTS media, then failed because ffmpeg was not found in trusted system dirs.
A few specifics:
- Is ffmpeg the missing piece?
For this error, yes. Install the Debian ffmpeg package; it includes both ffmpeg and ffprobe, which other audio/media paths may also use.
- Preferred Docker approach:
Don’t patch the Dockerfile unless you need to. The Docker setup already supports this:
OPENCLAW_DOCKER_APT_PACKAGES=ffmpeg ./scripts/docker/setup.sh
Or manually:
docker build \
--build-arg OPENCLAW_DOCKER_APT_PACKAGES=ffmpeg \
-t openclaw:local \
-f Dockerfile .
Docs: https://docs.openclaw.ai/install/docker
- Risk/conflict?
Low. The runtime image is Debian bookworm-slim, and OpenClaw intentionally resolves ffmpeg/ffprobe from trusted system paths like /usr/bin. Installing the distro package is exactly the safe shape. Main downside is image size.
- After rebuild:
Recreate/restart the container, then sanity-check:
docker compose exec openclaw-gateway ffmpeg -version
docker compose exec openclaw-gateway ffprobe -version
Then try:
/tts audio hello from OpenClaw
For WhatsApp, this is especially needed because Microsoft TTS may output MP3/WebM and WhatsApp voice-note delivery transcodes non-Ogg audio to Ogg/Opus with ffmpeg. For Telegram, it should also unblock media delivery unless there’s a separate Telegram-side issue.
TTS docs: https://docs.openclaw.ai/tools/tts
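To illustrate why a binary that is on PATH can still be reported as "not found in trusted system directories", here is a minimal sketch of a trusted-directory lookup. This is not OpenClaw's code: only /usr/bin is named in this thread, and the other directories and the function name are assumptions for illustration.

```python
import os
from typing import Iterable, Optional

# Only /usr/bin is named in this thread; the other entries are assumptions.
TRUSTED_DIRS = ("/usr/bin", "/usr/local/bin", "/bin")


def find_trusted_binary(name: str, dirs: Iterable[str] = TRUSTED_DIRS) -> Optional[str]:
    """Return the first executable called `name` in `dirs`, ignoring PATH."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Report where each tool would be picked up, if anywhere.
for tool in ("ffmpeg", "ffprobe"):
    print(tool, "->", find_trusted_binary(tool) or "not found in trusted dirs")
```

Because the lookup never consults PATH, a copy of ffmpeg dropped into an arbitrary directory would not be picked up, which is why the distro package (installed under /usr/bin) is the straightforward fix.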
Hi Krill — on v2026.4.26 with messages.tts.auto: "inbound" configured, I'm seeing audio replies even when sending plain text messages (not voice notes). My understanding was that inbound mode should match modality — text in returns text only, voice in returns voice. Is this the expected behavior, or is /tts on session command overriding the inbound config and behaving like always? What's the actual precedence between the session /tts on command and the messages.tts.auto config value?
Yep — your mental model of inbound is correct, but /tts on changes the effective mode.
For v2026.4.26, precedence for the auto mode is:
- Chat/session override from /tts chat on|off
- Local TTS prefs from /tts on / /tts off
- Effective config: messages.tts → agent/channel/account overrides
So:
- messages.tts.auto: "inbound" alone = only TTS after inbound voice/audio
- /tts on = writes local prefs auto: "always" and overrides config
- /tts chat on = per-chat/session always, even stronger
- /tts chat default clears only the chat override, not the local /tts on pref
To verify, run:
/tts status
/tts chat status
If /tts status shows auto: always, that explains text messages getting audio.
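The precedence above can be sketched as a simple first-match lookup. This is an illustration of the ordering only, not OpenClaw's implementation; the function name and the use of None for "not set" are assumptions.

```python
from typing import Optional


def effective_tts_auto(
    chat_override: Optional[str],
    local_pref: Optional[str],
    config_auto: str,
) -> str:
    """Return the auto mode in effect, checking the strongest source first."""
    if chat_override is not None:  # set by /tts chat on|off
        return chat_override
    if local_pref is not None:     # set by /tts on or /tts off
        return local_pref
    return config_auto             # messages.tts.auto after agent/channel/account overrides


# Config says "inbound", but /tts on wrote a local pref of "always".
print(effective_tts_auto(None, "always", "inbound"))  # always

# With the local pref cleared, the config-driven mode applies again.
print(effective_tts_auto(None, None, "inbound"))      # inbound
```

This matches the symptom in the question: the config value is only reached when neither the chat override nor the local pref is set.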
To get back to config-driven inbound, clear the local TTS pref in the TTS prefs file — default is:
~/.openclaw/settings/tts.json
In Docker that’s usually under: