/vc with openclaw | Friends of the Crustacean 🦞🤝 | Page 1

strange socket Feb 25, 2026, 6:59 PM

how to allowlist a channel for voice commands? is it possible?

slim waveBOT Feb 25, 2026, 6:59 PM

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

pale stratusBOT Feb 25, 2026, 7:01 PM

Yes — Discord voice commands (/vc join|leave|status) are gated by the same Discord guild/channel allowlist as normal Discord group messages.

If you have channels.discord.groupPolicy: "allowlist" and you configure a per‑guild channels map, then only the channels listed there are allowed — and /vc join specifically checks the target voice channel you pick.

Example (allowlist exactly one voice channel for voice commands):

{
  channels: {
    discord: {
      commands: { native: true }, // needed for /vc
      groupPolicy: "allowlist",
      guilds: {
        "123456789012345678": {
          channels: {
            // key can be the channel ID (recommended) or the slug/name
            "234567890123456789": { allow: true }, // <- VOICE channel id
          },
        },
      },

      voice: { enabled: true },
    },
  },
}

Notes:

If the voice channel isn’t in that allowlist, you’ll get an error like “<#...> is not allowlisted for voice commands.”
To restrict who can run /vc, add users: [...] or roles: [...] under the guild/channel entry (same place).
Docs: https://docs.openclaw.ai/channels/discord (see Access control and routing + Voice channels)
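
A minimal sketch of the users/roles restriction mentioned in the notes above. It assumes users and roles take arrays of Discord ID strings and sit next to allow on the channel entry; the IDs are placeholders, and the linked docs remain the authority on the exact shape:

```json5
{
  channels: {
    discord: {
      commands: { native: true }, // needed for /vc
      groupPolicy: "allowlist",
      guilds: {
        "123456789012345678": {
          channels: {
            "234567890123456789": {
              allow: true,
              // Assumption: an array of user ID strings (placeholder value).
              users: ["345678901234567890"],
              // Alternative (same assumption): gate by role instead.
              // roles: ["456789012345678901"],
            },
          },
        },
      },
      voice: { enabled: true },
    },
  },
}
```

If the entry is rejected on load, check the Access control and routing section of the docs for the expected value format.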

strange socket Feb 25, 2026, 7:20 PM

What's the fastest TTS setup for voice channels on a Mac M4? I'm getting very slow responses, with minutes between them, and I'm constantly playing catch-up.

pale stratusBOT Feb 25, 2026, 7:23 PM

Voice channels are a pipeline: (1) transcribe audio → (2) run your chat model → (3) TTS → (4) play audio. “Minutes behind / catching up” usually means you’re building a backlog in (1) or (2) (not just TTS).

First: what openclaw --version are you on?

Fastest “known good” setup (Discord voice on a Mac)

1) Force a fast transcription provider (this is the big one).
By default, audio transcription auto-detects and will happily pick a local whisper CLI if you have it installed — that’s often painfully slow for Discord voice because it runs constantly on lots of tiny segments.

Configure tools.media.audio.models explicitly, e.g.:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
          // or Deepgram if you prefer: { provider: "deepgram", model: "nova-3" }
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/nodes/audio

2) Set a low-latency TTS provider specifically for voice channels via channels.discord.voice.tts (this overrides messages.tts for voice playback only):

{
  channels: {
    discord: {
      voice: {
        enabled: true,
        tts: {
          provider: "openai",
          openai: { model: "gpt-4o-mini-tts", voice: "alloy" },
        },
      },
    },
  },
}

Docs: https://docs.openclaw.ai/channels/discord and https://docs.openclaw.ai/tts

3) Keep the bot's replies short in that voice channel (reduces LLM time + TTS time + playback time). You can do this per channel:

{
  channels: {
    discord: {
      guilds: {
        "YOUR_GUILD_ID": {
          channels: {
            "YOUR_VOICE_CHANNEL_ID": {
              allow: true,
              systemPrompt: "Voice mode. Reply with 1 short sentence unless asked for detail.",
            },
          },
        },
      },
    },
  },
}
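
For reference, the three fragments above can be merged into a single config. This is only a sketch that reuses the keys already shown; YOUR_GUILD_ID and YOUR_VOICE_CHANNEL_ID are placeholders, and it assumes the blocks can live side by side in the same file without other settings interfering:

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        // Step 1: pin a hosted transcription model instead of auto-detect.
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
      },
    },
  },
  channels: {
    discord: {
      guilds: {
        "YOUR_GUILD_ID": {
          channels: {
            "YOUR_VOICE_CHANNEL_ID": {
              allow: true,
              // Step 3: short replies cut model, TTS, and playback time.
              systemPrompt: "Voice mode. Reply with 1 short sentence unless asked for detail.",
            },
          },
        },
      },
      voice: {
        enabled: true,
        // Step 2: TTS override used for voice playback only.
        tts: {
          provider: "openai",
          openai: { model: "gpt-4o-mini-tts", voice: "alloy" },
        },
      },
    },
  },
}
```

Merge these keys into your existing config rather than replacing it, so any settings you already have under channels.discord are preserved.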

Quick debug to pinpoint where the minutes are going

Run openclaw logs --follow while reproducing. If you paste ~30 lines around the delay (redact secrets), we can usually tell whether you’re stuck in:

transcription (slow/CLI auto-detect),
model latency (too-slow chat model for realtime),
or playback issues.

If you reply with:

openclaw --version
your channels.discord.voice + tools.media.audio config blocks (redacted)
…I can point to the exact knob that’ll fix the backlog in your setup.

strange socket Feb 25, 2026, 10:28 PM

What is the exact correct way to add OpenAI as a provider in openclaw.json for TTS? Specifically how do I reference the OPENAI_API_KEY from .env? Every format I try either fails with 'baseUrl required' or 'invalid input'. Can someone share a working example?

pale stratusBOT Feb 25, 2026, 10:31 PM

For TTS you do not “add OpenAI as a model provider” (that’s models.providers.*). You configure it under messages.tts (and for Discord voice channels specifically: channels.discord.voice.tts). OpenAI TTS does not need a baseUrl.

1) Put the key in the right `.env`

On macOS (especially when running as a daemon), the reliable place is:

~/.openclaw/.env (not your project folder)

OPENAI_API_KEY=sk-...

Docs: https://docs.openclaw.ai/help/environment

Then restart:

openclaw gateway restart

2) Working TTS config (uses `OPENAI_API_KEY` automatically)

You can omit apiKey entirely — it falls back to OPENAI_API_KEY from env.

Global auto‑TTS for replies:

{
  messages: {
    tts: {
      auto: "always",
      provider: "openai",
      openai: {
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
    },
  },
}

Discord voice channels (what /vc uses):

{
  channels: {
    discord: {
      voice: {
        enabled: true,
        tts: {
          provider: "openai",
          openai: { voice: "alloy", model: "gpt-4o-mini-tts" },
        },
      },
    },
  },
}

Docs: https://docs.openclaw.ai/tts and https://docs.openclaw.ai/channels/discord

3) If you really want to reference the env var inside JSON5

Env-substitution works via ${VAR}:

{
  messages: {
    tts: {
      provider: "openai",
      auto: "always",
      openai: {
        apiKey: "${OPENAI_API_KEY}",
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
    },
  },
}

If you paste the exact snippet that’s giving you “baseUrl required”, I can tell you which section you accidentally put it in (that error almost always comes from configuring a custom OpenAI-compatible provider under models.providers, not from TTS).

strange socket Feb 25, 2026, 10:42 PM

Got it! But why is the wait between my talking and OpenClaw responding so long? I asked what time it was and it took 1 min 4 seconds to answer.

pale stratusBOT Feb 25, 2026, 10:44 PM

That delay is basically the sum of the whole voice pipeline:

Wait for you to stop talking (OpenClaw captures until it sees ~1s of silence)
Decode Opus → WAV
Transcribe (STT)
Run the chat model
TTS
Play it back (and if anything is already “in flight”, it queues)

A “normal” round-trip for “what time is it?” should feel like a few seconds. ~60s usually means one of these is happening:

A) It’s not detecting “silence” quickly (open mic / background noise)

If the mic is hot and there’s constant noise, the “end after silence” trigger can take ages, so transcription doesn’t even start yet.

Fixes:

Try push-to-talk (best quick test)
Raise Discord's input sensitivity threshold so background noise doesn't open the mic, and enable noise suppression
Speak, then pause cleanly (no trailing words/noise)

B) STT is using a slow local CLI (most common)

Audio transcription auto-detects and will prefer local whisper/whisper-cpp if present — which can be way too slow for realtime Discord voice.

Force a fast provider explicitly (so it doesn’t auto-pick a CLI):

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
          // or: { provider: "deepgram", model: "nova-3" }
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/nodes/audio

C) Backlog/queuing (you spoke again, or multiple people are talking)

Voice processing + playback are serialized per voice session, so if segments pile up, it “plays catch up”.

Fix: wait for it to finish speaking before talking again (and keep replies short).

Fast way to pinpoint which step is costing 60 seconds

Run this while reproducing:

openclaw logs --follow

Look for discord voice: lines like:

capture ready (...)
transcription ok (...)
reply ok (...)
tts ok (...)
playback start / playback done

The timestamps between those tell you exactly where the minute is going.

If you paste a small redacted log chunk from one slow turn (just the discord voice: lines), I can tell you whether it’s silence detection, STT, model, or playback queue.
