OpenClaw v2026.4.12
Issue:
Discord voice STT is broken. OpenClaw sends every small audio chunk, including silence, to Deepgram individually. Silent chunks return an empty transcript, which causes requireTranscriptionText to throw "Audio transcription response missing transcript", and that error kills the whole voice session.
Evidence:
Deepgram nova-3 works correctly when tested manually against the WAV files in /tmp/openclaw/discord-voice-*/
Short silent chunks (0.1–0.6s) return empty transcript → error
Longer speech chunks (2MB+) return correct transcription when tested manually
OpenClaw never processes the longer chunks because it fails on the silent ones first
Config:
"tools": {
  "media": {
    "audio": {
      "enabled": true,
      "models": [{"provider": "deepgram", "model": "nova-3"}]
    }
  }
}
Request:
Can OpenClaw add silence filtering / minimum duration threshold before sending audio chunks to STT? Or a config option to skip empty transcript results rather than throwing an error?
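To make the request concrete, here is a rough sketch of the kind of gate I have in mind. It is not based on OpenClaw's actual code: every name here (ChunkGateOptions, shouldTranscribe, transcriptOrNull) is hypothetical, and it assumes the chunk is available as mono 16-bit PCM with a known sample rate. The thresholds are only illustrative; 700 ms was picked because the failing silent chunks were 0.1–0.6s long.

```typescript
// Hypothetical sketch: none of these names exist in OpenClaw.
interface ChunkGateOptions {
  minDurationMs: number; // skip chunks shorter than this
  minRms: number;        // skip chunks whose RMS level (0..1) is below this
}

// Duration of a mono 16-bit PCM chunk in milliseconds.
function durationMs(samples: Int16Array, sampleRate: number): number {
  return (samples.length / sampleRate) * 1000;
}

// Root-mean-square level, normalised to the range 0..1.
function rms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => {
    const x = s / 32768;
    return acc + x * x;
  }, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// True only if the chunk is long enough and loud enough to be worth sending to STT.
function shouldTranscribe(
  samples: Int16Array,
  sampleRate: number,
  opts: ChunkGateOptions
): boolean {
  return (
    durationMs(samples, sampleRate) >= opts.minDurationMs &&
    rms(samples) >= opts.minRms
  );
}

// Lenient alternative to throwing: an empty or whitespace-only transcript
// becomes null, so the caller can skip the chunk and keep the session alive.
function transcriptOrNull(text: string | null | undefined): string | null {
  const trimmed = (text ?? "").trim();
  return trimmed.length > 0 ? trimmed : null;
}
```

Either half alone would probably avoid the crash: the gate keeps silent chunks from reaching Deepgram at all, while the null-returning check lets an empty result be dropped instead of ending the session.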
Also: ElevenLabs TTS shows no provider registered despite being in plugins.allow and plugins.entries with enabled: true. Only loads when added via plugins.entries directly. Is this expected?