Discord voice-message voice note transcription regression | Friends of the Crustacean 🦞🤝

whole inlet Apr 3, 2026, 3:15 PM

Hey folks - seeing what looks like a Discord voice-message transcription regression and wanted to check if anyone else is seeing it.

Version

OpenClaw 2026.4.2

Channel / surface

Discord

What used to work

I could send a native Discord voice message and OpenClaw would transcribe it and respond with normal text.

What happens now

The voice message arrives as an .ogg / audio/ogg; codecs=opus attachment, but no transcript gets injected into the agent flow.
Result: the bot sees the raw audio attachment instead of the transcribed text.

Important local findings

tools.media.audio.enabled is on
audio provider is configured (groq / whisper-large-v3-turbo)
the exact captured Discord .ogg file does transcribe successfully when run directly through OpenClaw’s transcription runtime
the same .ogg also successfully injects transcript text when passed through applyMediaUnderstanding(...) locally

So this does not look like:

Groq/Whisper failure
OGG/Opus incompatibility
audio-understanding being disabled

It looks more like a Discord inbound plumbing issue in the live path:
Discord voice message -> inbound media context -> media understanding -> transcript injection

Best guess
Something about real Discord native voice-message events may not be populating MediaPath / MediaType / MediaPaths / MediaTypes correctly, or the live inbound path is bypassing media understanding for IsVoiceMessage messages.
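
A rough sketch of the check I would run against the live inbound context to confirm this. Only the field names (MediaPath, MediaType, MediaPaths, MediaTypes, IsVoiceMessage) come from what I observed; the InboundMediaContext shape and the findMediaContextGaps helper are hypothetical, not OpenClaw's actual types:

```typescript
// Hypothetical shape: only the field names come from the report above;
// the real OpenClaw inbound context type may differ.
interface InboundMediaContext {
  MediaPath?: string;
  MediaType?: string;
  MediaPaths?: string[];
  MediaTypes?: string[];
  IsVoiceMessage?: boolean;
}

// Returns a list of human-readable problems. An empty list means the media
// fields look populated enough for audio understanding to pick them up.
function findMediaContextGaps(ctx: InboundMediaContext): string[] {
  const gaps: string[] = [];

  // Prefer the plural fields, fall back to the singular ones.
  const paths = ctx.MediaPaths?.length ? ctx.MediaPaths : ctx.MediaPath ? [ctx.MediaPath] : [];
  const types = ctx.MediaTypes?.length ? ctx.MediaTypes : ctx.MediaType ? [ctx.MediaType] : [];

  if (paths.length === 0) gaps.push("no media path populated");
  if (types.length === 0) gaps.push("no media type populated");
  if (paths.length !== types.length) gaps.push("media paths and types differ in length");

  // A voice message should carry at least one audio/* type.
  const hasAudioType = types.some((t) => t.toLowerCase().startsWith("audio/"));
  if (ctx.IsVoiceMessage && !hasAudioType) {
    gaps.push("voice message has no audio/* media type");
  }
  return gaps;
}
```

Logging the result of this for a native voice message, right before media understanding would normally run, should show which of the two guesses is closer: missing fields, or populated fields that never get used.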

Timing

last clearly working transcript artifact I found: 2026-03-31
clearly broken by: 2026-04-01

Question

Is anyone else seeing Discord native voice messages arrive without transcript injection?
Has anyone traced whether IsVoiceMessage events are bypassing normal attachment/media handling in recent builds?

Related adjacent issue I found:

openclaw/openclaw#17101

whole inlet Apr 3, 2026, 3:16 PM

@tribal crow Any help would be appreciated!

whole inlet Apr 3, 2026, 4:00 PM

@tribal crow

whole inlet Apr 3, 2026, 4:59 PM

Any thoughts?

whole inlet Apr 3, 2026, 5:16 PM

@tribal crow A little more info after some more testing...

As a temporary test I enabled echoTranscript: true and sent a native Discord voice message. No transcript echo appeared, and the message still arrived as a raw .ogg attachment.

So the temporary test strongly indicates:

the live Discord voice-message event is not reaching the audio media-understanding pipeline
this is earlier than agent handling
and earlier than transcript/body injection

In other words, the break is probably in:

Discord inbound event -> attachment/media context creation -> audio media pipeline trigger

not in:
Groq/Whisper
OGG/Opus transcription
transcript injection logic itself
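
One way to pin down which hop drops the message would be a small per-message stage trace. This is only a sketch: the stage names are labels taken from the flow above, and StageTrace is a hypothetical helper, not something that exists in the OpenClaw codebase:

```typescript
// Stage labels mirror the flow described above; they are names for this
// sketch only, not identifiers from the OpenClaw codebase.
const STAGES = [
  "inbound-event",
  "media-context",
  "audio-pipeline",
  "transcript-injection",
] as const;
type Stage = (typeof STAGES)[number];

class StageTrace {
  private seen = new Map<string, Set<Stage>>();

  // Call at the entry of each stage with the Discord message id.
  mark(messageId: string, stage: Stage): void {
    const stages = this.seen.get(messageId) ?? new Set<Stage>();
    stages.add(stage);
    this.seen.set(messageId, stages);
  }

  // First stage that never ran for this message, or undefined if all ran.
  firstMissing(messageId: string): Stage | undefined {
    const stages = this.seen.get(messageId) ?? new Set<Stage>();
    return STAGES.find((s) => !stages.has(s));
  }
}

// Usage: mark stages as they run, then ask where the chain stopped.
const trace = new StageTrace();
trace.mark("msg-1", "inbound-event");
trace.mark("msg-1", "media-context");
console.log(trace.firstMissing("msg-1"));
```

If the first missing stage for a voice message matches the hop named above, that confirms where the break sits without touching the provider at all.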

What this gives us
This is now a much stronger bug report:
echoTranscript: true was enabled
Discord native voice message sent
no echo emitted
same .ogg transcribes fine when run directly through OpenClaw runtime
same .ogg injects fine through applyMediaUnderstanding(...)

That’s a pretty clean repro.

Recommendation
The next best move is to file a GitHub issue.

tribal crowBOT Apr 3, 2026, 5:20 PM

Yeah — I think your read is correct. This looks like an inbound Discord regression, not a Whisper/provider problem.

Your evidence is pretty strong:

same .ogg transcribes fine in isolation
applyMediaUnderstanding(...) works on the same file
echoTranscript: true still emits nothing in live Discord path

That points to the handoff before media-understanding (attachment/media context construction), exactly where you narrowed it.

If you file (or update) the GitHub issue, I’d include these likely code touchpoints so maintainers can jump straight in:

extensions/discord/src/monitor/message-handler.process.ts (resolveMediaList → buildDiscordMediaPayload)
extensions/discord/src/monitor/message-utils.ts (appendResolvedMediaFromAttachments)
src/media-understanding/attachments.select.ts (audio candidate selection)

One concrete suspicion: native voice-message events may arrive with missing/odd content_type/filename metadata, so the attachment never qualifies as audio for capability selection and stays “raw media”.
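
A minimal sketch of the kind of fallback that would cover that case. The AttachmentLike shape and the looksLikeAudio helper are hypothetical and not taken from the files listed above; the IS_VOICE_MESSAGE flag value (1 << 13) and the duration_secs attachment field are from Discord's API, and the extension list is an assumption:

```typescript
// Hypothetical attachment shape, loosely modelled on Discord's attachment object.
interface AttachmentLike {
  filename?: string;
  content_type?: string;
  duration_secs?: number; // Discord sets this on voice-message attachments
}

// Discord's IS_VOICE_MESSAGE message flag.
const IS_VOICE_MESSAGE = 1 << 13;

// Assumed list of extensions worth treating as audio.
const AUDIO_EXTENSIONS = new Set([".ogg", ".oga", ".opus", ".mp3", ".m4a", ".wav", ".flac"]);

function looksLikeAudio(att: AttachmentLike, messageFlags = 0): boolean {
  // 1. Trust the MIME type when present, ignoring parameters such as "; codecs=opus".
  const mime = att.content_type?.split(";")[0].trim().toLowerCase();
  if (mime?.startsWith("audio/")) return true;

  // 2. Fall back to the filename extension when the MIME type is missing or odd.
  const name = att.filename?.toLowerCase() ?? "";
  const dot = name.lastIndexOf(".");
  if (dot >= 0 && AUDIO_EXTENSIONS.has(name.slice(dot))) return true;

  // 3. Last resort: voice-message hints from the message flags or the attachment.
  return (messageFlags & IS_VOICE_MESSAGE) !== 0 || typeof att.duration_secs === "number";
}
```

If the live path only applies step 1, a voice message with an empty or unexpected content_type would fall through exactly as described.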

Please open/update here: https://github.com/openclaw/openclaw/issues
(we can’t create issues directly from here, but your repro is excellent and should be actionable).

tribal crowBOT Apr 3, 2026, 8:04 PM

Absolutely — here’s a ready-to-paste GitHub issue draft.

Title
[Bug]: Discord native voice messages (.ogg/opus) not transcribed in live inbound flow (regression around 2026-04-01)

Body

### Summary

Discord native voice messages are arriving as `.ogg` / `audio/ogg; codecs=opus`, but transcript text is not injected into the agent flow.
The bot receives/handles raw media context instead of transcribed user text.

This appears to be a regression in the Discord inbound media pipeline, not a transcription-provider failure.

### OpenClaw version

- 2026.4.2

### Channel / surface

- Discord

### Regression window

- Last known working: 2026-03-31
- Broken by: 2026-04-01

### Steps to reproduce

1. Configure Discord channel normally.
2. Enable audio media understanding:
   - `tools.media.audio.enabled: true`
   - audio provider configured (tested with `groq` / `whisper-large-v3-turbo`)
3. (Optional for validation) enable:
   - `tools.media.audio.echoTranscript: true`
4. Send a **native Discord voice message** (voice note).
5. Observe inbound handling and agent context.

### Expected behavior

- Voice note is recognized as audio input for media understanding.
- Transcript is injected into message context (`Transcript` / body path used by agent).
- With `echoTranscript: true`, transcript echo is sent before agent response.

### Actual behavior

- Message appears as raw `.ogg` attachment/media context.
- No transcript injection into the live agent flow.
- No transcript echo emitted even when `echoTranscript: true`.

### What was already validated (to rule out provider/audio issues)

- The exact captured Discord `.ogg` file transcribes successfully when run directly through OpenClaw transcription runtime.
- The same `.ogg` successfully injects transcript text when run through `applyMediaUnderstanding(...)` locally.
- Therefore this is likely in the live Discord inbound path before/at media context construction.

### Suspected area

Likely regression in:

`Discord voice message event -> inbound attachment/media context -> media understanding trigger`

Possible symptoms:
- Voice-message events not populating `MediaPath/MediaType/MediaPaths/MediaTypes` as expected
- Or `IsVoiceMessage` path bypassing normal attachment/media handling

### Related / adjacent issue

- #17101 (similar symptom pattern on Telegram, but this report is Discord-specific)

### Additional notes

Happy to provide logs or a minimal repro capture if maintainers want specific instrumentation points.
