#/vc with openclaw

strange socket

How do I allowlist a channel for voice commands? Is it possible?

slim wave (BOT)

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

pale stratus (BOT)

Yes — Discord voice commands (/vc join|leave|status) are gated by the same Discord guild/channel allowlist as normal Discord group messages.

If you have channels.discord.groupPolicy: "allowlist" and you configure a per‑guild channels map, then only the channels listed there are allowed — and /vc join specifically checks the target voice channel you pick.

Example (allowlist exactly one voice channel for voice commands):

{
  channels: {
    discord: {
      commands: { native: true }, // needed for /vc
      groupPolicy: "allowlist",
      guilds: {
        "123456789012345678": {
          channels: {
            // key can be the channel ID (recommended) or the slug/name
            "234567890123456789": { allow: true }, // <- VOICE channel id
          },
        },
      },

      voice: { enabled: true },
    },
  },
}

Notes:

  • If the voice channel isn’t in that allowlist, you’ll get an error like “<#...> is not allowlisted for voice commands.”
  • To restrict who can run /vc, add users: [...] or roles: [...] under the guild/channel entry (same place).
  • Docs: https://docs.openclaw.ai/channels/discord (see Access control and routing + Voice channels)
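
A rough way to picture the channel gate described above is the Python sketch below. It is not OpenClaw's code: the function names `voice_channel_allowed` and `vc_join_error` are hypothetical, the config is passed as a plain dict mirroring the JSON5 example, and several behaviours are assumptions (a policy other than "allowlist" does not restrict channels, a guild missing from `guilds` is blocked, a guild with no `channels` map is unrestricted, and a listed channel needs `allow: true`). Per-user and per-role restrictions are left out.

```python
from typing import Any, Optional


def voice_channel_allowed(
    discord_cfg: dict[str, Any],
    guild_id: str,
    channel_id: str,
    channel_slug: Optional[str] = None,
) -> bool:
    """Hypothetical model of the /vc join channel gate (not OpenClaw's code)."""
    # Assumption: only groupPolicy "allowlist" restricts channels.
    if discord_cfg.get("groupPolicy") != "allowlist":
        return True
    guild = discord_cfg.get("guilds", {}).get(guild_id)
    if guild is None:
        # Assumption: an unlisted guild is blocked under "allowlist".
        return False
    channels = guild.get("channels")
    if channels is None:
        # Assumption: no per-guild channels map means no channel-level limit.
        return True
    # Keys can be the channel ID (recommended) or the slug/name; try the ID first.
    entry = channels.get(channel_id)
    if entry is None and channel_slug is not None:
        entry = channels.get(channel_slug)
    # Assumption: a listed channel still needs allow: true.
    return entry is not None and entry.get("allow") is True


def vc_join_error(
    discord_cfg: dict[str, Any], guild_id: str, channel_id: str
) -> Optional[str]:
    """Return an error shaped like the one quoted above, or None when allowed."""
    if voice_channel_allowed(discord_cfg, guild_id, channel_id):
        return None
    return f"<#{channel_id}> is not allowlisted for voice commands."


if __name__ == "__main__":
    cfg = {
        "groupPolicy": "allowlist",
        "guilds": {
            "123456789012345678": {
                "channels": {"234567890123456789": {"allow": True}},
            },
        },
    }
    print(vc_join_error(cfg, "123456789012345678", "234567890123456789"))  # None
    print(vc_join_error(cfg, "123456789012345678", "345678901234567890"))
```

Trying the ID before the slug mirrors the advice that the channel ID is the recommended key, since names can change.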

strange socket

What's the fastest TTS setup for voice channels on a Mac M4? I'm getting very slow responses, minutes between replies, and it's constantly playing catch-up.

pale stratus (BOT)

Voice channels are a pipeline: (1) transcribe audio → (2) run your chat model → (3) TTS → (4) play audio. “Minutes behind / catching up” usually means you’re building a backlog in (1) or (2) (not just TTS).

First: which version are you on (openclaw --version)?

Fastest “known good” setup (Discord voice on a Mac)

1) Force a fast transcription provider (this is the big one).
By default, audio transcription auto-detects and will happily pick a local whisper CLI if you have it installed — that’s often painfully slow for Discord voice because it runs constantly on lots of tiny segments.

Configure tools.media.audio.models explicitly, e.g.:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
          // or Deepgram if you prefer: { provider: "deepgram", model: "nova-3" }
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/nodes/audio
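Why an explicit models list helps can be modelled like this: a configured list is used as-is, and only when it is empty does auto-detection go looking for a local CLI. This is a hypothetical sketch, not OpenClaw's actual selection logic; the function name `pick_transcriber` and the binary names it probes (`whisper`, `whisper-cli`, `whisper-cpp`) are assumptions. The PATH lookup is injectable, so calling it with the default `shutil.which` also tells you whether a local whisper binary is installed on your Mac.

```python
import shutil
from typing import Callable, Optional

# Assumed names of local whisper binaries that auto-detection might find.
LOCAL_WHISPER_BINARIES = ("whisper", "whisper-cli", "whisper-cpp")


def pick_transcriber(
    models: list[dict[str, str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Hypothetical model of STT selection: explicit models win over auto-detect."""
    if models:
        first = models[0]
        return f"{first['provider']}:{first['model']}"
    # No explicit list: auto-detect, preferring a local CLI if one is on PATH.
    for binary in LOCAL_WHISPER_BINARIES:
        path = which(binary)
        if path:
            return f"local-cli:{path}"
    # Other auto-detected providers are not modelled here.
    return None


if __name__ == "__main__":
    print(pick_transcriber([{"provider": "openai", "model": "gpt-4o-mini-transcribe"}]))
    # With an empty list, shows whether a local whisper binary would be picked up.
    print(pick_transcriber([]))
```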

2) Set a low-latency TTS provider specifically for voice channels via channels.discord.voice.tts. It overrides messages.tts for voice playback only.

3) Make the bot answer short in that voice channel (reduces LLM time + TTS time + playback time). You can do this per channel:

{
  channels: {
    discord: {
      guilds: {
        "YOUR_GUILD_ID": {
          channels: {
            "YOUR_VOICE_CHANNEL_ID": {
              allow: true,
              systemPrompt: "Voice mode. Reply with 1 short sentence unless asked for detail.",
            },
          },
        },
      },
    },
  },
}

Quick debug to pinpoint where the minutes are going

Run openclaw logs --follow while reproducing. If you paste ~30 lines around the delay (redact secrets), we can usually tell whether you’re stuck in:

  • transcription (slow/CLI auto-detect),
  • model latency (too-slow chat model for realtime),
  • or playback issues.

If you reply with:

  1. openclaw --version
  2. your channels.discord.voice + tools.media.audio config blocks (redacted)
Then I can point to the exact knob that’ll fix the backlog in your setup.

strange socket

What is the exact correct way to add OpenAI as a provider in openclaw.json for TTS? Specifically how do I reference the OPENAI_API_KEY from .env? Every format I try either fails with 'baseUrl required' or 'invalid input'. Can someone share a working example?

pale stratus (BOT)

For TTS you do not “add OpenAI as a model provider” (that’s models.providers.*). You configure it under messages.tts (and for Discord voice channels specifically: channels.discord.voice.tts). OpenAI TTS does not need a baseUrl.

1) Put the key in the right .env

On macOS (especially when running as a daemon), the reliable place is:

~/.openclaw/.env (not your project folder)

OPENAI_API_KEY=sk-...

Docs: https://docs.openclaw.ai/help/environment

Then restart:

openclaw gateway restart

2) Working TTS config (uses OPENAI_API_KEY automatically)

You can omit apiKey entirely — it falls back to OPENAI_API_KEY from env.
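To check what the gateway should end up with, the Python sketch below reads ~/.openclaw/.env, expands a ${VAR} placeholder like the one in apiKey: "${OPENAI_API_KEY}", and falls back to OPENAI_API_KEY when apiKey is omitted. It is a simplified stand-in, not OpenClaw's loader: the helper names are hypothetical, the parser only handles plain KEY=VALUE lines (no `export`, no multi-line values), and it assumes, for illustration, that a variable already set in the process environment wins over the file and that an unresolved placeholder is left as-is.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching quotes, e.g. OPENAI_API_KEY="sk-..."
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: Path = Path.home() / ".openclaw" / ".env") -> dict[str, str]:
    """Merge the file with os.environ (assumption: process env wins on conflicts)."""
    file_values = parse_env_file(path.read_text()) if path.exists() else {}
    return {**file_values, **os.environ}


def resolve_openai_key(
    openai_cfg: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """apiKey with ${VAR} expanded if set, otherwise the OPENAI_API_KEY fallback."""
    api_key = openai_cfg.get("apiKey")
    if api_key is None:
        return env.get("OPENAI_API_KEY")
    # Unknown variables are left untouched so the bad value stays visible.
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), api_key)


if __name__ == "__main__":
    key = resolve_openai_key({"model": "gpt-4o-mini-tts", "voice": "alloy"}, load_env())
    print("OPENAI_API_KEY found" if key else "OPENAI_API_KEY missing")
```

The script only reports whether a key was found, so it is safe to run without leaking the secret into your terminal history.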

Global auto‑TTS for replies:

{
  messages: {
    tts: {
      auto: "always",
      provider: "openai",
      openai: {
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
    },
  },
}

Discord voice channels (what /vc uses):

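{
  channels: {
    discord: {
      voice: {
        enabled: true,
        tts: {
          provider: "openai",
          openai: {
            model: "gpt-4o-mini-tts",
            voice: "alloy",
          },
        },
      },
    },
  },
}

To reference the key from .env explicitly instead (it resolves to the same OPENAI_API_KEY), use the ${...} form:

{
  messages: {
    tts: {
      provider: "openai",
      auto: "always",
      openai: {
        apiKey: "${OPENAI_API_KEY}",
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
    },
  },
}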

If you paste the exact snippet that’s giving you “baseUrl required”, I can tell you which section you accidentally put it in (that error almost always comes from configuring a custom OpenAI-compatible provider under models.providers, not from TTS).

strange socket

Got it! But why is the wait between my talking and OpenClaw responding so long? I asked what time it was and it took 1 min 4 seconds to answer.

pale stratus (BOT)

That delay is basically the sum of the whole voice pipeline:

  1. Wait for you to stop talking (OpenClaw captures until it sees ~1s of silence)
  2. Decode Opus → WAV
  3. Transcribe (STT)
  4. Run the chat model
  5. TTS
  6. Play it back (and if anything is already “in flight”, it queues)

A “normal” round-trip for “what time is it?” should feel like a few seconds. ~60s usually means one of these is happening:

A) It’s not detecting “silence” quickly (open mic / background noise)

If the mic is hot and there’s constant noise, the “end after silence” trigger can take ages to fire, and transcription doesn’t start until it does.

Fixes:

  • Try push-to-talk (best quick test)
  • Raise Discord’s input sensitivity threshold so background noise doesn’t count as speech, and/or enable noise suppression
  • Speak, then pause cleanly (no trailing words/noise)
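Why an open mic stalls things: capture only ends after about a second of continuous silence, so if background noise never drops below the “silence” level, that second never arrives. The Python sketch below is a simplified energy-threshold model of that rule, not OpenClaw’s actual detector; the function name `capture_end_frame`, the 20 ms frame size, and the threshold value are assumptions for illustration.

```python
from typing import Optional, Sequence


def capture_end_frame(
    levels: Sequence[float],
    threshold: float = 0.02,
    frame_ms: int = 20,
    silence_ms: int = 1000,
) -> Optional[int]:
    """Return the frame index where capture would end, or None if it never does.

    levels holds one loudness value per audio frame; anything below
    threshold counts as silence.
    """
    frames_needed = silence_ms // frame_ms
    quiet_frames = 0
    heard_speech = False
    for i, level in enumerate(levels):
        if level >= threshold:
            heard_speech = True
            quiet_frames = 0  # any sound resets the silence timer
        elif heard_speech:
            quiet_frames += 1
            if quiet_frames >= frames_needed:
                return i
    return None


if __name__ == "__main__":
    speech = [0.5] * 10
    print(capture_end_frame(speech + [0.0] * 50))    # 59: 1 s of clean silence ends capture
    print(capture_end_frame(speech + [0.03] * 500))  # None: steady noise above threshold
```

The second case is the hot-mic problem in miniature: the noise never counts as silence, so the one-second timer keeps resetting and transcription never starts.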

B) STT is using a slow local CLI (most common)

Audio transcription auto-detects and will prefer local whisper/whisper-cpp if present — which can be way too slow for realtime Discord voice.

Force a fast provider explicitly (so it doesn’t auto-pick a CLI):

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
          // or: { provider: "deepgram", model: "nova-3" }
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/nodes/audio

C) Backlog/queuing (you spoke again, or multiple people are talking)

Voice processing + playback are serialized per voice session, so if segments pile up, it “plays catch up”.

Fix: wait for it to finish speaking before talking again (and keep replies short).
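The catch-up effect falls out of simple queueing: with one worker per voice session, each segment waits for everything queued ahead of it. This Python sketch is a plain FIFO model with made-up timings (the function name `reply_lags` is hypothetical, not OpenClaw code); it computes how far behind each reply lands when you speak faster than a turn can be processed.

```python
from typing import Sequence


def reply_lags(arrivals: Sequence[float], service_times: Sequence[float]) -> list[float]:
    """Seconds from each segment's arrival until its reply finishes playing.

    Models a single serialized worker: a segment starts only once it has
    arrived and the previous segment is fully done (STT, model, TTS, playback).
    """
    lags: list[float] = []
    worker_free_at = 0.0
    for arrived, service in zip(arrivals, service_times):
        start = max(arrived, worker_free_at)
        worker_free_at = start + service
        lags.append(worker_free_at - arrived)
    return lags


if __name__ == "__main__":
    # Hypothetical: you speak every 5 s and each turn takes 8 s end to end.
    print(reply_lags([0, 5, 10, 15], [8, 8, 8, 8]))  # [8.0, 11.0, 14.0, 17.0]
```

Each turn lands 3 s later than the previous one (8 s of work arriving every 5 s), so the lag keeps growing until you stop talking long enough for the queue to drain.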

Fast way to pinpoint which step is costing 60 seconds

Run this while reproducing:

openclaw logs --follow

Look for discord voice: lines like:

  • capture ready (...)
  • transcription ok (...)
  • reply ok (...)
  • tts ok (...)
  • playback start / playback done

The timestamps between those tell you exactly where the minute is going.
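Once you have the timestamps, a few lines of Python can turn them into per-step durations. The exact log line format isn’t shown here, so this sketch assumes you have already copied out (ISO-8601 timestamp, event) pairs by hand. The event names come from the list above; the stage labels, the function name `stage_durations`, and the assumption that each event marks the end of the step before it are illustrative. Note that time spent waiting for silence happens before “capture ready” and won’t show up as a gap.

```python
from datetime import datetime
from typing import Sequence

# Assumed meaning of the gap between two consecutive events.
STAGES = {
    ("capture ready", "transcription ok"): "STT",
    ("transcription ok", "reply ok"): "chat model",
    ("reply ok", "tts ok"): "TTS",
    ("tts ok", "playback start"): "playback queue",
    ("playback start", "playback done"): "playback",
}


def stage_durations(events: Sequence[tuple[str, str]]) -> list[tuple[str, float]]:
    """Return (stage, seconds) for each consecutive pair of (timestamp, event)."""
    parsed = [(datetime.fromisoformat(ts), name) for ts, name in events]
    out: list[tuple[str, float]] = []
    for (t0, e0), (t1, e1) in zip(parsed, parsed[1:]):
        label = STAGES.get((e0, e1), f"{e0} -> {e1}")
        out.append((label, (t1 - t0).total_seconds()))
    return out


if __name__ == "__main__":
    # Hypothetical timestamps for one slow turn.
    turn = [
        ("2025-01-01T12:00:00", "capture ready"),
        ("2025-01-01T12:00:52", "transcription ok"),
        ("2025-01-01T12:00:55", "reply ok"),
        ("2025-01-01T12:00:57", "tts ok"),
        ("2025-01-01T12:00:57", "playback start"),
        ("2025-01-01T12:01:01", "playback done"),
    ]
    durations = stage_durations(turn)
    for stage, seconds in durations:
        print(f"{stage:15} {seconds:5.1f}s")
    print("slowest:", max(durations, key=lambda d: d[1])[0])  # slowest: STT
```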

If you paste a small redacted log chunk from one slow turn (just the discord voice: lines), I can tell you whether it’s silence detection, STT, model, or playback queue.