Hey folks - seeing what looks like a Discord voice-message transcription regression and wanted to check if anyone else is seeing it.
Version
- OpenClaw 2026.4.2
Channel / surface
- Discord
What used to work
- I could send a native Discord voice message and OpenClaw would transcribe it and respond with normal text.
What happens now
- The voice message arrives as an .ogg / audio/ogg; codecs=opus attachment, but no transcript gets injected into the agent flow.
- Result: the bot sees the raw audio attachment instead of the transcribed text.
Important local findings
- tools.media.audio.enabled is on
- audio provider is configured (groq/whisper-large-v3-turbo)
- the exact captured Discord .ogg file does transcribe successfully when run directly through OpenClaw’s transcription runtime
- the same .ogg also successfully injects transcript text when passed through applyMediaUnderstanding(...) locally
So this does not look like:
- Groq/Whisper failure
- OGG/Opus incompatibility
- audio-understanding being disabled
It looks more like a Discord inbound plumbing issue in the live path:
Discord voice message -> inbound media context -> media understanding -> transcript injection
Best guess
Something about real Discord native voice-message events may not be populating MediaPath / MediaType / MediaPaths / MediaTypes correctly, or the live inbound path is bypassing media understanding for IsVoiceMessage messages.
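To tell those two possibilities apart, here is a minimal sketch of a check that could be run against a captured inbound context. Only the five field names come from what I saw. The interface shape, the diagnoseVoiceContext helper, and the result labels are hypothetical and not OpenClaw's actual types or API.

```typescript
// Hypothetical shape: only these five field names come from this post.
// The real OpenClaw inbound context type may look different.
interface InboundMediaContext {
  MediaPath?: string;
  MediaType?: string;
  MediaPaths?: string[];
  MediaTypes?: string[];
  IsVoiceMessage?: boolean;
}

type Diagnosis =
  | "not-a-voice-message"
  | "media-fields-missing"
  | "media-fields-inconsistent"
  | "media-fields-populated";

// Classify a captured context so the two guesses can be separated:
// missing or inconsistent fields point at the Discord inbound mapping,
// populated fields with no transcript point at media understanding being skipped.
function diagnoseVoiceContext(ctx: InboundMediaContext): Diagnosis {
  if (!ctx.IsVoiceMessage) return "not-a-voice-message";

  // Prefer the plural fields, fall back to the singular ones.
  const paths = ctx.MediaPaths?.length
    ? ctx.MediaPaths
    : ctx.MediaPath
      ? [ctx.MediaPath]
      : [];
  const types = ctx.MediaTypes?.length
    ? ctx.MediaTypes
    : ctx.MediaType
      ? [ctx.MediaType]
      : [];

  if (paths.length === 0) return "media-fields-missing";

  // Each path should carry a MIME type, and at least one should be audio.
  const hasAudio = types.some((t) => t.toLowerCase().startsWith("audio/"));
  if (types.length !== paths.length || !hasAudio) {
    return "media-fields-inconsistent";
  }
  return "media-fields-populated";
}

// Example contexts (made-up values, not real captures).
const emptyVoice = diagnoseVoiceContext({ IsVoiceMessage: true });
const fullVoice = diagnoseVoiceContext({
  IsVoiceMessage: true,
  MediaPath: "/tmp/voice-message.ogg",
  MediaType: "audio/ogg; codecs=opus",
});
console.log(emptyVoice, fullVoice);
```

If a live voice message came back as "media-fields-populated" and still had no transcript, that would point at the second guess (media understanding being skipped) rather than at the field mapping.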
Timing
- last clearly working transcript artifact I found: 2026-03-31
- clearly broken by: 2026-04-01
Question
- Is anyone else seeing Discord native voice messages arrive without transcript injection?
- Has anyone traced whether IsVoiceMessage events are bypassing normal attachment/media handling in recent builds?
Related adjacent issue I found:
openclaw/openclaw#17101