#Whisper STT not transcribing voice messages (Telegram)
You're mixing up two different things:
- "Whisper STT skill" (the skill you enabled in the Skills UI) is basically a callable skill/script. It does not automatically transcribe inbound Telegram voice notes just because it's enabled.
- Automatic transcription of inbound voice notes is handled by Media Understanding (tools.media.audio). That's what turns a <media:audio> placeholder into an [Audio] ... block and sets {{Transcript}}. Docs: https://docs.openclaw.ai/nodes/audio and https://docs.openclaw.ai/nodes/media-understanding
What to check / fix
A) Make sure audio understanding is actually enabled + configured
Even though there's auto-detect, I'd make it explicit in your config on Railway:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        echoTranscript: true, // TEMP: easiest way to confirm it's running
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
      },
    },
  },
}
(You can turn echoTranscript back off after it works.)
B) Telegram attachment size cap (very common gotcha)
Telegram inbound downloads are capped by channels.telegram.mediaMaxMb (default 5MB). If the voice note is bigger, OpenClaw won't download it, so there's nothing to transcribe.
Docs: https://docs.openclaw.ai/channels/telegram (see mediaMaxMb)
Try bumping it (and keep tools.media.audio.maxBytes in mind too).
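To make the two limits concrete, here is a rough sketch of how they stack. It is illustrative only: `audio_gate` is a hypothetical helper, the 1 MB = 1024 * 1024 bytes conversion is an assumption, and OpenClaw's real checks may differ in detail. The `max_bytes` argument stands in for tools.media.audio.maxBytes, and no default is assumed for it here.

```python
def audio_gate(size_bytes, media_max_mb=5, max_bytes=None):
    """Return which limit (if any) would stop a voice note of `size_bytes`.

    media_max_mb stands in for channels.telegram.mediaMaxMb (default 5).
    max_bytes stands in for tools.media.audio.maxBytes (None = not checked here).
    """
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    if size_bytes > media_max_mb * 1024 * 1024:
        return "blocked: channels.telegram.mediaMaxMb"
    if max_bytes is not None and size_bytes > max_bytes:
        return "blocked: tools.media.audio.maxBytes"
    return "ok"


print(audio_gate(6 * 1024 * 1024))                   # over the 5 MB default
print(audio_gate(6 * 1024 * 1024, media_max_mb=20))  # passes once the cap is raised
```

The point of the ordering is that a file rejected at the Telegram download step never reaches the audio step at all, so raising only the audio limit would not help.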
C) OPENAI_API_KEY must be available to the gateway process
Saving an API key inside the skill UI only affects skills.<skillId>... config, and media transcription does not read that. It uses normal provider auth (env/config).
So make sure Railway is actually injecting OPENAI_API_KEY into the running process (not just build-time).
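One way to confirm the key is visible at run time is a tiny check run from a shell inside the same Railway service that runs the gateway. This is a minimal sketch, assuming Python is available in that container; `describe_env_key` is a hypothetical helper, not part of OpenClaw, and it only reports presence and length so the secret never lands in logs.

```python
import os


def describe_env_key(env, name="OPENAI_API_KEY"):
    """Report whether `name` is visible in `env` without printing the secret."""
    value = env.get(name)
    if value is None:
        return f"{name}: NOT SET in this process"
    if not value.strip():
        return f"{name}: set but empty"
    return f"{name}: set ({len(value)} characters)"


if __name__ == "__main__":
    # os.environ reflects the runtime environment of this process,
    # not variables that only existed at build time.
    print(describe_env_key(os.environ))
```

If this reports "NOT SET" while the Railway dashboard shows the variable, the variable is probably attached to a different service or environment than the one running the gateway.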
D) Turn up logs so you can see "transcription ran / skipped"
Set logging.consoleLevel: "debug" (or run the gateway in verbose). Docs: https://docs.openclaw.ai/logging
If you paste these two snippets from your config, I can tell you immediately what's blocking it:
- tools.media.audio (and/or tools.media)
- channels.telegram.mediaMaxMb
And if you can grab the log lines around receiving a voice note (with debug enabled), even better.
Config updated: added tools.media.audio.enabled: true and echoTranscript: true. Ran /restart. Still no transcription. OPENAI_API_KEY is set in Railway env vars. What else could be missing?
If echoTranscript: true still produces nothing, that usually means the audio transcription step never found an eligible model entry (so it skips silently unless you're on debug/verbose).
Common gotchas and checks on Railway + Telegram:
1) You enabled audio, but you didn't configure any audio STT model
tools.media.audio.enabled: true doesn't magically pick OpenAI unless auto-detect kicks in and it can see a provider key. Make it explicit to force an STT attempt:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        echoTranscript: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
        ],
      },
    },
  },
}
Docs: https://docs.openclaw.ai/nodes/audio
Also check you don't have tools.media.models set to an image-only list (or tools.media.audio.models: [] from an old config migration). Either can result in "no eligible audio model" and therefore no STT.
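The checks in this step can be sketched as a small lint over the config once it is loaded into a plain dict. This is a hypothetical helper, not OpenClaw's own eligibility logic: it only flags the conditions named above, and it cannot tell whether a tools.media.models list is image-only, so it just reminds you to look.

```python
def audio_stt_warnings(config):
    """List likely reasons audio STT might be skipped, based on a parsed config dict."""
    warnings = []
    media = config.get("tools", {}).get("media", {})
    audio = media.get("audio", {})

    if audio.get("enabled") is False:
        warnings.append("tools.media.audio.enabled is false")

    models = audio.get("models")
    if models is None:
        warnings.append("tools.media.audio.models is not set; relying on auto-detect")
    elif len(models) == 0:
        warnings.append("tools.media.audio.models is an empty list")

    # The shape of tools.media.models entries is not assumed here.
    if "models" in media:
        warnings.append("tools.media.models is set; check it is not image-only")

    return warnings


# Example: the leftover-from-migration case with an empty audio model list.
config = {"tools": {"media": {"audio": {"enabled": True, "models": []}}}}
for warning in audio_stt_warnings(config):
    print(warning)
```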
2) Telegram never downloads the voice note (size cap)
Telegram inbound media download/processing is capped by channels.telegram.mediaMaxMb (default 5MB). If the voice note is bigger, it won't download → no local file → no transcription.
Set (temporarily) something higher and re-test with a short voice note:
{
  channels: { telegram: { mediaMaxMb: 20 } }
}
Docs: https://docs.openclaw.ai/channels/telegram (search for mediaMaxMb)
3) Confirm the gateway can actually see OPENAI_API_KEY
Media transcription auth follows: auth-profiles.json → env vars → models.providers.*.apiKey. (So the "skill UI key" is irrelevant here.)
On Railway, the fastest sanity check is: temporarily add to config (just to prove the point):
{ env: { OPENAI_API_KEY: "sk-..." } }
If that suddenly makes STT work, then Railway isn't injecting the env var into the runtime process you think it is (wrong service, wrong environment, build-time vs run-time vars, etc.).
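The lookup order described in this step can be sketched as a first-match chain. This is an illustration of the order only, not OpenClaw's implementation: `resolve_api_key` is a hypothetical helper, and the dict shapes used for auth-profiles.json and models.providers are assumptions made for the example.

```python
def resolve_api_key(auth_profiles, env, provider_config, provider="openai"):
    """Return (source, key) for the first place a key is found, else (None, None).

    Order follows the thread: auth-profiles.json, then env vars,
    then models.providers.*.apiKey. The dict shapes are assumptions.
    """
    env_name = f"{provider.upper()}_API_KEY"
    candidates = [
        ("auth-profiles.json", auth_profiles.get(provider)),
        ("env", env.get(env_name)),
        ("models.providers", provider_config.get(provider, {}).get("apiKey")),
    ]
    for source, key in candidates:
        if key:  # skip missing and empty values
            return source, key
    return None, None


source, _ = resolve_api_key({}, {"OPENAI_API_KEY": "sk-example"}, {})
print(source)  # env
```

Note that a key saved in the skill UI does not appear anywhere in this chain, which is why enabling the skill alone does not give media transcription a credential.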
4) Turn logs up so you can see the skip reason
Set:
{ logging: { consoleLevel: "debug" } }
Then send a 2-3 second voice note and paste the log lines around receipt. Docs: https://docs.openclaw.ai/logging
If you paste these sections from your current config, I can pinpoint it quickly:
- tools.media (including any tools.media.models and tools.media.audio.scope)
- channels.telegram.mediaMaxMb
- whether this is a DM or a group (mention-gating can change behavior)
Thank you so much :) It all worked!!!