#Whisper STT not transcribing voice messages (Telegram)

indigo lichen

Whisper STT skill is enabled (eligible), OPENAI_API_KEY set in Railway env vars + saved in skill UI. Voice messages in Telegram arrive as media:audio with no transcription. Deploy logs show zero STT activity. Version: dev, hosted on Railway. What am I missing?

random laurel BOT

You’re mixing up two different things:

  1. “Whisper STT skill” (the skill you enabled in the Skills UI) is basically a callable skill/script. It does not automatically transcribe inbound Telegram voice notes just because it’s enabled.

  2. Automatic transcription of inbound voice notes is handled by Media Understanding → tools.media.audio. That’s what turns a <media:audio> placeholder into an [Audio] … block and sets {{Transcript}}. Docs: https://docs.openclaw.ai/nodes/audio and https://docs.openclaw.ai/nodes/media-understanding

What to check / fix

A) Make sure audio understanding is actually enabled + configured
Even though there’s auto-detect, I’d make it explicit in your config on Railway:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        echoTranscript: true, // TEMP: easiest way to confirm it’s running
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
      },
    },
  },
}

(You can turn echoTranscript back off after it works.)

B) Telegram attachment size cap (very common gotcha)
Telegram inbound downloads are capped by:

  • channels.telegram.mediaMaxMb (default 5MB) → if the voice note is bigger, OpenClaw won’t download it, so there’s nothing to transcribe.
    Docs: https://docs.openclaw.ai/channels/telegram (see mediaMaxMb)

Try bumping it (and keep tools.media.audio.maxBytes in mind too).

C) OPENAI_API_KEY must be available to the gateway process
Saving an API key inside the skill UI only affects skills.<skillId>... config, and media transcription does not read that. It uses normal provider auth (env/config).
So make sure Railway is actually injecting OPENAI_API_KEY into the running process (not just build-time).
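
One way to verify this is to run a tiny check inside the same runtime environment as the gateway (for example from a shell on the deployed service, if you have one). The sketch below is generic Python, not part of OpenClaw; `describe_key` is a hypothetical helper name. It only reports whether the variable is set, without printing the secret.

```python
import os


def describe_key(env, name="OPENAI_API_KEY"):
    """Report whether `name` is set in `env` without revealing the secret."""
    value = env.get(name, "")
    if not value.strip():
        return f"{name} is NOT visible to this process"
    # Show only the length and the last 4 characters so the key is not leaked.
    return f"{name} is set ({len(value)} chars, ends with ...{value[-4:]})"


if __name__ == "__main__":
    # os.environ reflects the environment of the process running this script.
    print(describe_key(os.environ))
```

Note that this only proves what the checking process sees. If it runs in a different service or environment than the gateway, the result may not match.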

D) Turn up logs so you can see “transcription ran / skipped”
Set logging.consoleLevel: "debug" (or run the gateway in verbose). Docs: https://docs.openclaw.ai/logging


If you paste these two snippets from your config, I can tell you immediately what’s blocking it:

  • tools.media.audio (and/or tools.media)
  • channels.telegram.mediaMaxMb

And if you can grab the log lines around receiving a voice note (with debug enabled), even better.

indigo lichen

Config updated: added tools.media.audio.enabled: true and echoTranscript: true. Ran /restart. Still no transcription. OPENAI_API_KEY is set in Railway env vars. What else could be missing?

random laurel BOT

If echoTranscript: true still produces nothing, that usually means the audio transcription step never found an eligible model entry (so it skips silently unless you’re on debug/verbose).

Common gotchas on Railway + Telegram:

1) You enabled audio, but you didn’t configure any audio STT model

tools.media.audio.enabled: true doesn’t magically pick OpenAI unless auto-detect kicks in and it can see a provider key. Make it explicit to force an STT attempt:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        echoTranscript: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe" }
        ],
      },
    },
  },
}

Docs: https://docs.openclaw.ai/nodes/audio

Also check you don’t have tools.media.models set to an image-only list (or tools.media.audio.models: [] left over from an old config migration). Either can result in “no eligible audio model” and therefore no STT.

2) Telegram never downloads the voice note (size cap)

Telegram inbound media download/processing is capped by channels.telegram.mediaMaxMb (default 5MB). If the voice note is bigger, it won’t download → no local file → no transcription.

Set (temporarily) something higher and re-test with a short voice note:

{
  channels: { telegram: { mediaMaxMb: 20 } }
}

Docs: https://docs.openclaw.ai/channels/telegram (search for mediaMaxMb)
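
To rule out the size cap, you can compare a voice note's size against the limit by hand. A minimal sketch, assuming the cap is interpreted as 1 MB = 1024 * 1024 bytes and that only files strictly larger than the cap are skipped. OpenClaw's exact rounding and comparison are not confirmed here, and `exceeds_cap` is a hypothetical helper.

```python
def exceeds_cap(size_bytes: int, cap_mb: float) -> bool:
    """Return True if a file of `size_bytes` is larger than `cap_mb` megabytes.

    Assumption: 1 MB is treated as 1024 * 1024 bytes.
    """
    return size_bytes > cap_mb * 1024 * 1024


if __name__ == "__main__":
    voice_note_bytes = 6_500_000  # example size, not a measured value
    for cap in (5, 20):
        skipped = exceeds_cap(voice_note_bytes, cap)
        print(f"cap={cap}MB -> {'skipped' if skipped else 'downloaded'}")
```

Under these assumptions, a 6.5 MB note would be skipped at a 5 MB cap and downloaded at 20 MB, which is why a 2–3 second test note is the safest way to take size out of the picture.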

3) Confirm the gateway can actually see OPENAI_API_KEY

Media transcription auth follows: auth-profiles.json → env vars → models.providers.*.apiKey. (So the ā€œskill UI keyā€ is irrelevant here.)

On Railway, the fastest sanity check is: temporarily add to config (just to prove the point):

{ env: { OPENAI_API_KEY: "sk-..." } }

If that suddenly makes STT work, then Railway isn’t injecting the env var into the runtime process you think it is (wrong service, wrong environment, build-time vs run-time vars, etc.).
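
The lookup order mentioned above (auth-profiles.json, then env vars, then models.providers.*.apiKey) can be pictured as a first-match-wins chain. This is an illustrative sketch only, not OpenClaw's actual code: `resolve_api_key` and its arguments are hypothetical, and treating blank strings as missing is an assumption.

```python
from typing import Optional, Tuple


def resolve_api_key(
    auth_profile_key: Optional[str],
    env_key: Optional[str],
    provider_config_key: Optional[str],
) -> Tuple[Optional[str], str]:
    """Return (key, source) for the first non-empty candidate, in priority order."""
    candidates = [
        (auth_profile_key, "auth-profiles.json"),
        (env_key, "env var"),
        (provider_config_key, "models.providers.*.apiKey"),
    ]
    for key, source in candidates:
        # Assumption: empty or whitespace-only values count as "not set".
        if key and key.strip():
            return key, source
    return None, "none"


if __name__ == "__main__":
    # Only the env var is set, so it is the source that wins.
    print(resolve_api_key(None, "sk-env", None)[1])
```

If every source is empty, the chain ends with no key at all, which would be consistent with the transcription step being skipped rather than failing loudly.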

4) Turn logs up so you can see the skip reason

Set:

{ logging: { consoleLevel: "debug" } }

Then send a 2–3 second voice note and paste the log lines around receipt. Docs: https://docs.openclaw.ai/logging


If you paste these sections from your current config, I can pinpoint it quickly:

  • tools.media (including any tools.media.models and tools.media.audio.scope)
  • channels.telegram.mediaMaxMb
  • whether this is a DM or a group (mention-gating can change behavior)

indigo lichen

Thank you so much :) Everything worked!!!