Discord Voice STT failing on silent audio chunks — Audio transcription response missing transcript | Friends of the Crustacean 🦞🤝 | Page 1

amber temple Apr 15, 2026, 3:02 PM

OpenClaw v2026.4.12

Issue:
Discord voice STT is broken. OpenClaw processes every tiny audio chunk (including silence) individually via Deepgram. Silent chunks return an empty transcript, which triggers requireTranscriptionText to throw "Audio transcription response missing transcript", which kills the whole voice session.
Evidence:

Deepgram nova-3 works correctly when tested manually against the WAV files in /tmp/openclaw/discord-voice-*/
Short silent chunks (0.1–0.6s) return empty transcript → error
Longer speech chunks (2MB+) transcribe correctly when tested manually
OpenClaw never processes the longer chunks because it fails on the silent ones first
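To see why silent and speech chunks diverge, it helps to look at what Deepgram's pre-recorded endpoint returns. This is a minimal sketch that assumes Deepgram's documented response shape (`results.channels[0].alternatives[0].transcript`). The `extractTranscript` helper and both sample bodies are hypothetical illustrations, not OpenClaw code and not captured responses. For silence, Deepgram typically returns a well-formed response whose transcript is an empty string, which is exactly what a strict "missing transcript" check rejects.

```typescript
// Minimal subset of Deepgram's pre-recorded /v1/listen response.
// Assumption: only the fields read below are modeled.
interface DeepgramListenResponse {
  results?: {
    channels?: Array<{
      alternatives?: Array<{ transcript?: string; confidence?: number }>;
    }>;
  };
}

// Hypothetical helper: returns the first alternative's transcript, trimmed,
// or "" when the field is absent or blank (e.g. a silent chunk).
function extractTranscript(body: DeepgramListenResponse): string {
  const transcript = body.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  return (transcript ?? "").trim();
}

// Illustrative bodies (not captured from real requests).
const silentChunk: DeepgramListenResponse = {
  results: { channels: [{ alternatives: [{ transcript: "", confidence: 0 }] }] },
};
const speechChunk: DeepgramListenResponse = {
  results: {
    channels: [{ alternatives: [{ transcript: "hey claw, what's on my calendar", confidence: 0.97 }] }],
  },
};

console.log(JSON.stringify(extractTranscript(silentChunk))); // ""
console.log(extractTranscript(speechChunk)); // hey claw, what's on my calendar
```

An empty string here is a valid "nothing was said" answer rather than a malformed response, which is why treating it as an error is the questionable part.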

Config:
"tools": {
  "media": {
    "audio": {
      "enabled": true,
      "models": [{"provider": "deepgram", "model": "nova-3"}]
    }
  }
}
Request:
Could OpenClaw add silence filtering or a minimum-duration threshold before sending audio chunks to STT? Or a config option to skip empty transcript results instead of throwing an error?
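For reference, the requested silence filtering could be as simple as an energy gate applied before a chunk goes to STT. This is a minimal sketch that assumes 16-bit PCM samples already decoded into an `Int16Array`. The `shouldSendToStt` name, the 0.35 s minimum, and the RMS threshold of 500 are hypothetical illustration values, not OpenClaw settings. A real VAD would be more robust than a single RMS threshold.

```typescript
// Hypothetical pre-STT gate: skip chunks that are too short or too quiet.
// Assumes mono or interleaved 16-bit PCM decoded into an Int16Array.
interface SilenceGateOptions {
  sampleRate: number; // samples per second per channel, e.g. 48000
  channels: number;   // 1 = mono, 2 = interleaved stereo
  minSeconds: number; // minimum duration worth transcribing
  minRms: number;     // RMS energy below this is treated as silence
}

function rms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

function shouldSendToStt(samples: Int16Array, opts: SilenceGateOptions): boolean {
  const seconds = samples.length / (opts.sampleRate * opts.channels);
  if (seconds < opts.minSeconds) return false; // too short
  return rms(samples) >= opts.minRms;          // too quiet means skip
}

// Illustrative options (not OpenClaw defaults).
const opts: SilenceGateOptions = { sampleRate: 48000, channels: 1, minSeconds: 0.35, minRms: 500 };

// 0.5 s of digital silence vs. 0.5 s of a 440 Hz tone with peak amplitude 5000.
const silence = new Int16Array(24000);
const tone = new Int16Array(24000);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.round(5000 * Math.sin((2 * Math.PI * 440 * i) / 48000));
}

console.log(shouldSendToStt(silence, opts)); // false
console.log(shouldSendToStt(tone, opts));    // true
```

Checking duration first keeps the cheap test in front; the RMS pass is what actually catches the "long enough but silent" chunks that a size or duration guard lets through.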
Also: ElevenLabs TTS shows no provider registered despite being listed in plugins.allow and in plugins.entries with enabled: true. It only loads when added via plugins.entries directly. Is this expected?

fickle forge (BOT) Apr 15, 2026, 3:10 PM

I checked the v2026.4.12 tag specifically.

Short version:

Your Deepgram STT report looks real.
There is already a minimum duration / size guard, but no silence filtering.
I could not find any config option for “skip empty transcript” or for tuning the Discord voice minimum duration.
For ElevenLabs, plugins.allow alone is not enough. Explicit enablement is required today, and that is expected.

What I found:

In extensions/discord/src/voice/manager.ts, Discord voice already drops very short captures with:
- MIN_SEGMENT_SECONDS = 0.35
- aborted streams use a lower 0.2s minimum
In the generic audio transcription path, there is also:
- MIN_AUDIO_FILE_BYTES = 1024
- enforced in src/media-understanding/runner.entries.ts

But that still does not do silence/VAD filtering. A short silent WAV can easily be bigger than 1024 bytes, so it still gets sent to STT.
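The byte arithmetic makes the gap concrete. This minimal sketch assumes the segment is written as a canonical PCM WAV (44-byte header) at 48 kHz, 16-bit, stereo, which is Discord's native Opus rate; the format OpenClaw actually writes may differ. `passesExistingGuards` and `MIN_ABORTED_SEGMENT_SECONDS` are hypothetical stand-ins that only mirror the thresholds quoted above, not the code in manager.ts or runner.entries.ts.

```typescript
// Threshold values as quoted for v2026.4.12.
const MIN_SEGMENT_SECONDS = 0.35;
const MIN_ABORTED_SEGMENT_SECONDS = 0.2; // hypothetical name for the 0.2 s aborted-stream value
const MIN_AUDIO_FILE_BYTES = 1024;

// Size of a canonical PCM WAV: 44-byte header + raw sample data.
// Assumption: 48 kHz, 16-bit, stereo.
function wavBytes(seconds: number, sampleRate = 48000, channels = 2, bitsPerSample = 16): number {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8); // 192000 by default
  return 44 + Math.round(seconds * bytesPerSecond);
}

// Hypothetical combination of the two guards. Neither one looks at the audio content.
function passesExistingGuards(seconds: number, aborted: boolean): boolean {
  const minSeconds = aborted ? MIN_ABORTED_SEGMENT_SECONDS : MIN_SEGMENT_SECONDS;
  return seconds >= minSeconds && wavBytes(seconds) >= MIN_AUDIO_FILE_BYTES;
}

console.log(wavBytes(0.5));                     // 96044
console.log(passesExistingGuards(0.5, false));  // true, even if the 0.5 s is pure silence
console.log(passesExistingGuards(0.1, false));  // false, dropped by the duration guard
```

Under this assumed format even 0.1 s of audio is about 19 KB, so the 1024-byte floor only catches near-empty files; the duration check is what drops very short segments, and nothing drops silent ones.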

Then in extensions/deepgram/audio.ts, empty Deepgram output is treated as a hard error:

requireTranscriptionText(..., "Audio transcription response missing transcript")

So in v2026.4.12, I do not see a “blank transcript = skip” path for Deepgram. The Discord voice manager has an if (!transcript) return; branch, but Deepgram throws before that branch can be reached.
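The difference between throwing and skipping fits in a few lines. Everything here is hypothetical: the real `requireTranscriptionText` signature is elided above, so `requireTextStrict` only imitates the described behavior, and `transcribeLenient` is one possible shape for a "blank transcript = skip" path that would let an `if (!transcript) return;` branch actually run.

```typescript
// Imitates the described v2026.4.12 behavior: blank output is a hard error.
function requireTextStrict(text: string | undefined, message: string): string {
  const trimmed = text?.trim();
  if (!trimmed) throw new Error(message);
  return trimmed;
}

// Hypothetical alternative: blank output becomes undefined instead of an error.
function transcribeLenient(text: string | undefined): string | undefined {
  const trimmed = text?.trim();
  return trimmed ? trimmed : undefined;
}

// Hypothetical stand-in for the segment handler in the Discord voice manager.
function handleSegment(rawTranscript: string | undefined, strict: boolean): string {
  const transcript = strict
    ? requireTextStrict(rawTranscript, "Audio transcription response missing transcript")
    : transcribeLenient(rawTranscript);
  if (!transcript) return "skipped"; // the guard that strict mode never reaches on blank input
  return `reply to: ${transcript}`;
}

console.log(handleSegment("", false)); // skipped
try {
  handleSegment("", true);
} catch (err) {
  console.log((err as Error).message); // Audio transcription response missing transcript
}
```

Keeping the strict check for genuinely malformed responses while mapping an empty-but-valid transcript to "skip" would preserve error reporting without the noise.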

One nuance: from the source, this looks like a per-segment processing failure, not an actual voice disconnect. The queue catches the error and logs discord voice: processing failed: ..., so it should keep processing later segments. Still, the empty-transcript behavior is clearly bad and noisy.
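The distinction matters because a queue that catches per-segment errors keeps going. Here is a minimal sketch of that pattern, assuming segments are processed one at a time; `processQueue` and the `transcribe` callback are hypothetical and illustrate the pattern rather than reproduce the Discord voice manager's queue.

```typescript
type Transcribe = (segmentId: string) => Promise<string>;

// Processes segments in order; a failure is logged and the loop moves on,
// so one bad (e.g. silent) segment does not stop later ones.
async function processQueue(segments: string[], transcribe: Transcribe): Promise<string[]> {
  const transcripts: string[] = [];
  for (const id of segments) {
    try {
      transcripts.push(await transcribe(id));
    } catch (err) {
      console.warn(`discord voice: processing failed: ${(err as Error).message}`);
    }
  }
  return transcripts;
}

// Fake transcriber: "silent-*" segments throw, like the strict Deepgram path.
const fakeTranscribe: Transcribe = async (id) => {
  if (id.startsWith("silent")) throw new Error("Audio transcription response missing transcript");
  return `text from ${id}`;
};

processQueue(["silent-1", "speech-1", "silent-2", "speech-2"], fakeTranscribe).then((out) => {
  console.log(out); // [ 'text from speech-1', 'text from speech-2' ]
});
```

With this shape the user-visible symptom is a log line per silent chunk rather than a dead session, which matches the "bad and noisy" reading.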

For ElevenLabs:

plugins.allow is an allowlist, not an enable switch.
extensions/elevenlabs/openclaw.plugin.json does not set enabledByDefault: true.
I also do not see any logic that auto-enables the plugin when messages.tts.providers.elevenlabs or talk.providers.elevenlabs is configured.

So yes, needing plugins.entries.elevenlabs.enabled: true (or openclaw plugins enable elevenlabs) is expected today. plugins.allow: ["elevenlabs"] by itself only makes it eligible to load.
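The allowlist-versus-enable rule above can be written as a small predicate. This is a simplified model, not OpenClaw's plugin loader: `isPluginActive` and the `PluginsConfig` / `PluginManifest` shapes are hypothetical, treating a missing allowlist as unrestricted is an assumption, and the real resolution may consider more inputs.

```typescript
interface PluginsConfig {
  allow?: string[];
  entries?: Record<string, { enabled?: boolean }>;
}

interface PluginManifest {
  id: string;
  enabledByDefault?: boolean; // not set to true for elevenlabs, per the reply above
}

// Simplified model: the allowlist only makes a plugin eligible;
// it loads only if explicitly enabled or enabled by default.
function isPluginActive(manifest: PluginManifest, config: PluginsConfig): boolean {
  // Assumption: no allowlist at all means no restriction.
  const eligible = config.allow === undefined || config.allow.includes(manifest.id);
  const explicitlyEnabled = config.entries?.[manifest.id]?.enabled === true;
  return eligible && (explicitlyEnabled || manifest.enabledByDefault === true);
}

const elevenlabs: PluginManifest = { id: "elevenlabs" };

console.log(isPluginActive(elevenlabs, { allow: ["elevenlabs"] })); // false
console.log(
  isPluginActive(elevenlabs, { allow: ["elevenlabs"], entries: { elevenlabs: { enabled: true } } }),
); // true
```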

Docs:

This feels worth filing as a bug/feature request:

skip empty STT results instead of throwing, and/or
add silence/VAD filtering before Discord voice chunks hit STT

Issue tracker: https://github.com/openclaw/openclaw/issues

If you want, I can help turn this into a tight GitHub issue with the exact file/function references.

amber temple Apr 15, 2026, 5:40 PM

Thanks King @fickle forge 🦐

I'll look through all of this now. What would you suggest as the best setup for a conversational agent on the go? I had no issues setting up Groq + EL on Slack, but chose to move to Discord because of future cost implications.

fickle forge (BOT) Apr 15, 2026, 5:41 PM

Glad that helped 🦐

If I were setting up an on-the-go conversational agent today, I’d choose between these:

1. Best overall, most reliable: Discord text + ElevenLabs replies

Use Discord as the chat surface
Set messages.tts.auto to inbound or tagged, so TTS only fires when useful
Do not make Discord live voice your primary path on v2026.4.12 yet
Explicitly enable ElevenLabs with plugins.entries.elevenlabs.enabled=true

2. Best voice UX, if you already have it: Talk mode + ElevenLabs

This is the cleaner continuous voice loop, with silence-window and interrupt behavior built in
Docs: https://docs.openclaw.ai/nodes/talk

3. Cheapest experimental voice path: Discord voice + Deepgram + ElevenLabs

Works in principle, but I’d treat it as beta until the empty-transcript handling gets fixed
Right now, silent chunks can still make the STT step throw errors

So my honest recommendation is:

Discord for messaging
ElevenLabs for TTS
A cloud model with reliable tool calling for the agent
Avoid Discord live voice as your main mobile loop for now

For current model picks, check:

https://docs.openclaw.ai/concepts/models
and ask in #models, since that changes fast

Useful docs:

Discord: https://docs.openclaw.ai/channels/discord
TTS: https://docs.openclaw.ai/tools/tts
Plugins: https://docs.openclaw.ai/cli/plugins
Models: https://docs.openclaw.ai/concepts/models

If you want, I can sketch a minimal “best today” Discord + ElevenLabs config next.

amber temple Apr 15, 2026, 5:44 PM

Replying to fickle forge: If you want, I can sketch a minimal “best today” Discord + ElevenLabs config...

yes please

fickle forge (BOT) Apr 15, 2026, 5:45 PM

Yep, here’s the minimal “best today” setup I’d use for Discord text + ElevenLabs replies, not Discord live voice:

{
  plugins: {
    allow: ["discord", "elevenlabs"],
    entries: {
      elevenlabs: {
        enabled: true,
      },
    },
  },

  channels: {
    discord: {
      enabled: true,

      voice: {
        enabled: false, // I’d keep live Discord voice off for now on v2026.4.12
      },
    },
  },

  messages: {
    tts: {
      auto: "inbound", // or "tagged" if you want tighter cost control
      provider: "elevenlabs",
      providers: {
        elevenlabs: {
          voiceId: "YOUR_ELEVENLABS_VOICE_ID",
        },
      },

    },
  },
}

A couple notes:

Set ELEVENLABS_API_KEY (or XI_API_KEY) in the gateway environment
plugins.allow makes ElevenLabs eligible, but plugins.entries.elevenlabs.enabled=true is the bit that actually turns it on
auto: "inbound" is a nice middle ground for mobile use
If you want the cheapest behavior, use auto: "tagged"
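Because a missing API key or a missing enable flag is easy to overlook, a small preflight check before starting the gateway can save a debugging round. This is a minimal sketch: `checkTtsPreflight` is a hypothetical helper, and the `TtsSetup` shape only mirrors the fields in the config above. The environment is passed in as a plain object, so it can be called with `process.env` in Node or with a test fixture.

```typescript
type Env = Record<string, string | undefined>;

// Only the fields from the example config that this check reads.
interface TtsSetup {
  plugins?: { entries?: Record<string, { enabled?: boolean }> };
  messages?: {
    tts?: {
      provider?: string;
      providers?: { elevenlabs?: { voiceId?: string } };
    };
  };
}

// Hypothetical preflight: returns human-readable problems (empty list = looks OK).
function checkTtsPreflight(config: TtsSetup, env: Env): string[] {
  const problems: string[] = [];
  if (!env.ELEVENLABS_API_KEY && !env.XI_API_KEY) {
    problems.push("Set ELEVENLABS_API_KEY (or XI_API_KEY) in the gateway environment");
  }
  if (config.plugins?.entries?.elevenlabs?.enabled !== true) {
    problems.push("plugins.entries.elevenlabs.enabled is not true");
  }
  const tts = config.messages?.tts;
  if (tts?.provider === "elevenlabs") {
    const voiceId = tts.providers?.elevenlabs?.voiceId;
    if (!voiceId || voiceId === "YOUR_ELEVENLABS_VOICE_ID") {
      problems.push("messages.tts.providers.elevenlabs.voiceId is missing or still the placeholder");
    }
  }
  return problems;
}

const config: TtsSetup = {
  plugins: { entries: { elevenlabs: { enabled: true } } },
  messages: {
    tts: { provider: "elevenlabs", providers: { elevenlabs: { voiceId: "YOUR_ELEVENLABS_VOICE_ID" } } },
  },
};

// Reports only the placeholder voiceId, since the key and enable flag are present.
console.log(checkTtsPreflight(config, { XI_API_KEY: "sk-test" }).join("\n"));
```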

If you want to experiment with Discord voice later, keep your existing Deepgram block, then flip:

channels.discord.voice.enabled: true

But honestly, I’d treat that as testing-only until the silent-chunk STT path is fixed.

Docs:

Discord: https://docs.openclaw.ai/channels/discord
TTS: https://docs.openclaw.ai/tools/tts
Plugins: https://docs.openclaw.ai/cli/plugins

If you want, next I can sketch the same setup but with Discord voice enabled for testing.

amber temple Apr 15, 2026, 5:48 PM

All good, thank you! I'll implement this clean approach now.
