#Building voice pipeline for openclaw

1 messages · Page 1 of 1 (latest)

jolly arch
#

Hi Krill — planning to test the OpenClaw voice pipeline using a minimal single HTML file in Chrome (no external dependencies, no npm). The idea is to use browser-native APIs only:

  • webkitSpeechRecognition for STT
  • fetch to hit http://127.0.0.1:18789/v1/chat/completions
  • SpeechSynthesis for TTS

Two questions before we build:

  1. CORS — does the OpenClaw gateway at port 18789 send permissive CORS headers for http://localhost origins by default? Or does it need explicit config to allow cross-origin fetch from a browser?

  2. chatCompletions endpoint — we need to enable it via:

{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  }
}

Is this the correct field path for v2026.4.26? And does it require a full gateway restart or just a doctor?

proven aspenBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

worldly umbraBOT
#

I checked current docs/source: your config path is correct, but browser CORS is the catch.

  1. CORS: I do not see permissive CORS headers for /v1/chat/completions. A browser fetch() from file://, http://localhost:xxxx, etc. will likely fail unless the request is same-origin or you add/proxy CORS yourself. gateway.controlUi.allowedOrigins is an origin/auth allowlist for Gateway browser clients — it is not the same as emitting Access-Control-Allow-Origin.

  2. Endpoint config: yes, for v2026.4.26 this is the right path:

{
  gateway: {
    http: {
      endpoints: {
        chatCompletions: { enabled: true },
      },
    },
  },
}
#

Docs: https://docs.openclaw.ai/gateway/openai-http-api

  1. Restart vs doctor: this is under gateway.*, so treat it as gateway restart required. doctor can validate/repair, but it does not by itself make the running gateway serve the new endpoint. With default gateway.reload.mode: "hybrid", a config write may auto-schedule the restart; otherwise do:
openclaw gateway restart

Also: the endpoint uses Gateway auth, so local browser tests will usually need Authorization: Bearer <gateway-token> unless gateway.auth.mode: "none" is explicitly set. Don’t expose that token in anything non-local. 🦐

jolly arch
#

Quick follow-up — is there a gateway.cors or similar config in v2026.4.26 that would allow whitelisting specific origins for /v1/chat/completions? Or is a proxy the only path for browser-based clients hitting the gateway from a different port?

worldly umbraBOT
#

Yep — I checked the v2026.4.26 schema/source shape: there’s no gateway.cors / per-endpoint CORS config for /v1/chat/completions.

Important distinction:

  • gateway.controlUi.allowedOrigins = browser-origin allowlist used by Gateway auth/origin checks.
  • It does not emit Access-Control-Allow-Origin / handle CORS preflight for your HTML page.

So for a normal Chrome page on a different port, e.g. localhost:5173 → 127.0.0.1:18789, a same-origin proxy/shim is the clean path today.

Best setup for the demo:

Browser HTML -> http://127.0.0.1:<your-proxy-port>/v1/chat/completions
Proxy        -> http://127.0.0.1:18789/v1/chat/completions

Bonus: the proxy can inject the Gateway bearer token server-side, so you don’t expose the operator token in frontend JS.

jolly arch
#

testing native TTS on v2026.4.26 running in Docker on Windows 11. Here's what we found and what we're planning:
Setup:

OpenClaw v2026.4.26 in Docker container on Windows 11
Microsoft speech plugin enabled in openclaw.json
/status shows: Voice: inbound · provider=microsoft · voice=en-US-RogerNeural · limit=1500 · summary=on

Test:

Sent /tts on via Telegram
Response: TTS enabled. ⚠️ Media failed.

Gateway log:
Failed sending web media to <recipient>: ffmpeg not found in trusted system directories.
Install it via your system package manager (e.g. apt install ffmpeg / dnf install ffmpeg).
Media skipped; sent text-only to <recipient>
Our diagnosis: TTS pipeline works end to end — it generated audio with en-US-RogerNeural. Delivery to the channel failed because ffmpeg is missing from the container. Fix is one line in the Dockerfile:
dockerfileRUN apt-get update && apt-get install -y ffmpeg --no-install-recommends
Questions before we proceed:

Is our diagnosis correct — is ffmpeg the only missing piece here?
Is apt-get install ffmpeg the right approach for the OpenClaw Docker image, or is there a preferred way to add it (e.g. a bundled dependency, a config flag, or a different base image)?
Any risk to adding ffmpeg to the Dockerfile — will it conflict with anything in the OpenClaw image?
After adding ffmpeg and rebuilding, is there anything else needed to make TTS delivery work on Telegram/WhatsApp, or should it just work?

worldly umbraBOT
#

Yes — your diagnosis looks right.

That log is from media delivery, not speech synthesis. OpenClaw got far enough to try sending generated TTS media, then failed because ffmpeg was not found in trusted system dirs.

A few specifics:

  1. Is ffmpeg the missing piece?
    For this error, yes. Install the Debian ffmpeg package — it includes both ffmpeg and ffprobe, which other audio/media paths may also use.

  2. Preferred Docker approach:
    Don’t patch the Dockerfile unless you need to. The Docker setup already supports this:

    OPENCLAW_DOCKER_APT_PACKAGES=ffmpeg ./scripts/docker/setup.sh
    

    Or manually:

#
   docker build \
     --build-arg OPENCLAW_DOCKER_APT_PACKAGES=ffmpeg \
     -t openclaw:local \
     -f Dockerfile .

Docs: https://docs.openclaw.ai/install/docker

  1. Risk/conflict?
    Low. The runtime image is Debian bookworm-slim, and OpenClaw intentionally resolves ffmpeg/ffprobe from trusted system paths like /usr/bin. Installing the distro package is exactly the safe shape. Main downside is image size.

  2. After rebuild:
    Recreate/restart the container, then sanity-check:

#
   docker compose exec openclaw-gateway ffmpeg -version
   docker compose exec openclaw-gateway ffprobe -version

Then try:

/tts audio hello from OpenClaw

For WhatsApp, this is especially needed because Microsoft TTS may output MP3/WebM and WhatsApp voice-note delivery transcodes non-Ogg audio to Ogg/Opus with ffmpeg. For Telegram, it should also unblock media delivery unless there’s a separate Telegram-side issue.

TTS docs: https://docs.openclaw.ai/tools/tts

jolly arch
#

Hi Krill — on v2026.4.26 with messages.tts.auto: "inbound" configured, I'm seeing audio replies even when sending plain text messages (not voice notes). My understanding was that inbound mode should match modality — text in returns text only, voice in returns voice. Is this the expected behavior, or is /tts on session command overriding the inbound config and behaving like always? What's the actual precedence between the session /tts on command and the messages.tts.auto config value?

worldly umbraBOT
#

Yep — your mental model of inbound is correct, but /tts on changes the effective mode.

For v2026.4.26, precedence for the auto mode is:

  1. Chat/session override from /tts chat on|off
  2. Local TTS prefs from /tts on / /tts off
  3. Effective config: messages.tts → agent/channel/account overrides

So:

  • messages.tts.auto: "inbound" alone = only TTS after inbound voice/audio
  • /tts on = writes local prefs auto: "always" and overrides config
  • /tts chat on = per-chat/session always, even stronger
  • /tts chat default clears only the chat override, not the local /tts on pref

To verify, run:

#
/tts status
/tts chat status

If /tts status shows auto: always, that explains text messages getting audio.

To get back to config-driven inbound, clear the local TTS pref in the TTS prefs file — default is:

~/.openclaw/settings/tts.json

In Docker that’s usually under: