#telegram STT

outer pike

STT in Telegram does not work.
Here is my config:

"tools": {
"media": {
"audio": {
"enabled": true,
"models": [
{
"provider": "groq",
"model": "whisper-large-v3-turbo"
}
]
}
},
...

Here is the message from the web UI:
[media attached: /home/kira/.openclaw/media/inbound/voice_20260406_134222_tmp---230e6622-8d51-4ff7-9b6c-7bcdcce44858.wav (audio/wav) | /home/kira/.openclaw/media/inbound/voice_20260406_134222_tmp---230e6622-8d51-4ff7-9b6c-7bcdcce44858.wav]
To send an image back, prefer the message tool (media/path/filePath). If you must inline, use MEDIA:https://example.com/image.jpg (spaces ok, quote if needed) or a safe relative path like MEDIA:./image.jpg. Avoid absolute paths (MEDIA:/...) and ~ paths - they are blocked for security. Keep caption in the text body.

media:audio

The key is in place, and manual STT works fine.
Help me, Krill.

devout trailBOT

Yep — that Web UI snippet is a strong clue: the audio file is being received, but transcription isn’t being applied (you’d normally see an [Audio] Transcript: block instead of raw media:audio).

Quickest path to isolate it:

  1. What version are you on?
openclaw --version
  2. Use the simplest Groq audio config (provider only):
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "groq" }]
      }
    }
  }
}

(Groq’s default transcription model is auto-selected.)

  3. Make sure the daemon sees the key (most common gotcha).
    If the gateway runs as a service, it often doesn’t inherit your shell environment. Put the key in ~/.openclaw/.env or in the config env block, then:
openclaw gateway restart
openclaw models status
  4. Check runtime logs while sending one voice note.
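For the "make sure the daemon sees the key" step, here is a minimal sketch of writing the key into ~/.openclaw/.env idempotently. It assumes the file uses plain NAME=value lines and that the variable name is GROQ_API_KEY (the name the curl fallback later in this thread relies on); set_env_var is a hypothetical helper, not an OpenClaw command.

```bash
# set_env_var FILE NAME VALUE  (hypothetical helper)
# Writes NAME=VALUE into a dotenv-style FILE: any existing NAME=... lines are
# dropped, one new line is appended at the end, and all other lines are kept.
# Assumptions: plain NAME=value lines (no "export", no quoting), and NAME is a
# plain identifier because it is used inside a grep regex.
set_env_var() {
  local file="$1" name="$2" value="$3" tmp
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # Temp file in the same directory so the final mv is an atomic rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # grep -v exits 1 when it prints nothing (e.g. empty file); that is fine here.
  grep -v "^${name}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  chmod 600 "$tmp"   # the file holds a secret; keep it owner-only
  mv "$tmp" "$file"
}

# Hypothetical usage (key value elided):
#   set_env_var ~/.openclaw/.env GROQ_API_KEY 'gsk_...'
#   openclaw gateway restart
#   openclaw models status
```

Rewriting through a temp file keeps the edit safe to repeat: running it twice still leaves exactly one GROQ_API_KEY line.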
outer pike

kira@nikjitsi:~$ openclaw --version
OpenClaw 2026.4.5 (3e72c03)

11:05:10+00:00 info gateway/reload {"subsystem":"gateway/reload"} config hot reload applied (models.providers.openai.apiKey, models.providers.groq.apiKey, models.providers.openrouter.apiKey, models.providers.minimax.apiKey, models.providers.kimi-coding.apiKey, channels.discord.token)
11:05:10+00:00 info gateway/channels/discord {"subsystem":"gateway/channels/discord"} discord channels resolved: guild:1466820492777361675→1466820492777361675 (guild:Fox in a box)
11:05:13+00:00 info gateway/channels/discord {"subsystem":"gateway/channels/discord"} discord client initialized as 1466887132298608806 (Kira); awaiting gateway readiness
11:05:46+00:00 debug cron {"module":"cron","storePath":"/home/kira/.openclaw/cron/jobs.json"} {"nextAt":1775530800000,"delayMs":60000,"clamped":true} cron: timer armed
11:06:46+00:00 debug cron {"module":"cron","storePath":"/home/kira/.openclaw/cron/jobs.json"} {"nextAt":1775530800000,"delayMs":60000,"clamped":true} cron: timer armed
11:07:26+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1631ms conn=5d350849…fa62 id=112dccae…5aae
11:07:32+00:00 info exec {"subsystem":"exec"} elevated command curl -sS https://api.groq.com/openai/v1/audio/transcriptio...d-aa27-4dd676440f5c.wav" -F "model=whisper-large-v3-turbo"
11:07:33+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1343ms conn=5d350849…fa62 id=456e0f10…6df2
11:07:38+00:00 info gateway/ws {"subsystem":"gateway/ws"} ⇄ res ✓ sessions.list 1465ms conn=5d350849…fa62 id=c6f5d236…be6a
11:07:46+00:00 debug cron {"module":"cron","storePath":"/home/kira/.openclaw/cron/jobs.json"} {"nextAt":1775530800000,"delayMs":60000,"clamped":true} cron: timer armed

  • groq effective=models.json:gsk_dfFg...gHCaMidU | models.json=gsk_dfFg...gHCaMidU | source=models.json: ~/.openclaw/agents/main/agent/models.json
devout trailBOT

Perfect, this is useful — your version and Groq key resolution look good.
So this is not a basic auth issue.

At this point it’s most likely one of these:

  1. audio pipeline is being skipped (scope/size/policy), or
  2. Groq media provider isn’t active in runtime (plugin/load issue), or
  3. transcription fails silently unless verbose logging is on.

Run these and paste output:

openclaw config get tools.media.audio --json
openclaw config get tools.media.audio.scope --json
openclaw plugins list --enabled | grep -i groq
openclaw channels logs --channel telegram

Then do one hard proof test (super useful):

openclaw config set tools.media.audio.echoTranscript true
openclaw gateway restart

Send a short Telegram voice note.

  • If transcript echo appears: STT works, issue is later in routing/prompt handling.
  • If no echo: STT step itself is failing/skipping.

Also check the voice file size once (OpenClaw skips audio under 1024 bytes, and the default cap is 20MB):

stat -c%s /home/kira/.openclaw/media/inbound/voice_*.wav | tail -1
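To check every inbound file against both limits at once, rather than only the most recent one, here is a small sketch. The thresholds come from the note above; treating "20MB" as 20 MiB (20 × 1024 × 1024 bytes) and skipping only files strictly over the cap are assumptions, and classify_voice_file is a hypothetical helper.

```bash
# Thresholds as described above. Assumption: "20MB" means 20 MiB.
MIN_BYTES=1024
MAX_BYTES=$((20 * 1024 * 1024))

# classify_voice_file FILE  (hypothetical helper)
# Prints "<verdict> <bytes> <path>", where verdict is too-small, too-large or ok.
# stat -c%s is GNU coreutils syntax (Linux), as in the command above.
classify_voice_file() {
  local size
  size=$(stat -c%s "$1") || return 1
  if [ "$size" -lt "$MIN_BYTES" ]; then
    echo "too-small $size $1"
  elif [ "$size" -gt "$MAX_BYTES" ]; then
    echo "too-large $size $1"
  else
    echo "ok $size $1"
  fi
}

# Usage: check every inbound voice file, not just the last one.
#   for f in ~/.openclaw/media/inbound/voice_*; do classify_voice_file "$f"; done
```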

If you paste those 4 command outputs + one failed voice-note log chunk, I can pinpoint the exact fix.

outer pike

kira@nikjitsi:~$ openclaw config get tools.media.audio --json
{
"enabled": true,
"models": [
{
"provider": "groq"
}
]
}
kira@nikjitsi:~$ openclaw config get tools.media.audio.scope --json
Config path not found: tools.media.audio.scope
kira@nikjitsi:~$ openclaw plugins list --enabled | grep -i groq
kira@nikjitsi:~$ openclaw channels logs --channel telegram

🦞 OpenClaw 2026.4.5 (3e72c03) — I've survived more breaking changes than your last three relationships.

Log file: /tmp/openclaw/openclaw-2026-04-06.log
Channel: telegram

...
2026-04-06T10:38:52.602+00:00 info {"subsystem":"gateway/channels/telegram"} Telegram menu text exceeded the conservative 5700-character payload budget; shortening descriptions to keep 60 commands visible.
2026-04-06T10:41:19.483+00:00 info {"subsystem":"gateway/channels/telegram"} [default] starting provider (@clawd_kira_bot)
2026-04-06T10:41:20.233+00:00 info {"subsystem":"gateway/channels/telegram"} Telegram menu text exceeded the conservative 5700-character payload budget; shortening descriptions to keep 60 commands visible.
2026-04-06T10:50:32.177+00:00 info {"subsystem":"gateway/channels/telegram"} [default] starting provider (@clawd_kira_bot)
2026-04-06T10:50:32.759+00:00 info {"subsystem":"gateway/channels/telegram"} Telegram menu text exceeded the conservative 5700-character payload budget; shortening descriptions to keep 60 commands visible.
2026-04-06T10:50:39.084+00:00 info {"subsystem":"gateway/channels/telegram"} telegram sendMessage ok chat=132841727 message=11982

devout trailBOT

Nice, we found it.
This line is the smoking gun:

openclaw plugins list --enabled | grep -i groq
# (no output)

Your tools.media.audio config is correct, but the Groq media provider plugin is not active, so audio transcription has no runtime handler and you get raw <media:audio>.

Fix

openclaw plugins enable groq
openclaw gateway restart
openclaw plugins list --enabled | grep -i groq

Then test a Telegram voice note again.

If openclaw plugins enable groq fails or groq still doesn’t show up, check these overrides:

openclaw config get plugins.enabled --json
openclaw config get plugins.allow --json
openclaw config get plugins.entries.groq --json
openclaw plugins inspect groq

Most likely culprit is plugins.allow excluding groq, or an explicit plugins.entries.groq.enabled: false.

outer pike

kira@nikjitsi:~$ openclaw config get plugins.enabled --json
openclaw config get plugins.allow --json
openclaw config get plugins.entries.groq --json
openclaw plugins inspect groq
Config path not found: plugins.enabled
[
"telegram",
"discord",
"device-pair",
"minimax",
"openai",
"openrouter",
"browser",
"groq"
]
{
"enabled": true,
"config": {}
}

🦞 OpenClaw 2026.4.5 (3e72c03) — iMessage green bubble energy, but for everyone.

Groq Media Understanding
id: groq
Bundled Groq audio transcription provider

Status: loaded
Format: openclaw
Source: ~/.local/lib/node_modules/openclaw/dist/extensions/groq/index.js
Origin: bundled
Version: 2026.4.5
Shape: plain-capability
Capability mode: plain
Legacy before_agent_start: no

Capabilities:
media-understanding: groq

devout trailBOT

Nice catch, you’re right: the Groq plugin is loaded and enabled.
So scratch my previous plugin theory.

Given your setup, the next most likely cause is the audio being skipped before the provider is called (over the size cap, too small, or a silent skip), and channel-only logs won’t show that detail.

Try this exact test:

# 1) increase audio cap + force visible confirmation
openclaw config set tools.media.audio.maxBytes 52428800 --strict-json
openclaw config set tools.media.audio.echoTranscript true --strict-json
openclaw gateway restart

# 2) check inbound file sizes
ls -lh ~/.openclaw/media/inbound/voice_* | tail -n 10

Then send one Telegram voice note and tail full gateway logs (not channel-only):

openclaw logs --follow | grep -Ei "audio-preflight|media understanding|transcrib|groq|too small|maxBytes|provider not available|failed"

Why this matters:

  • default audio cap is 20MB
  • files under 1024 bytes are skipped
  • these skips often leave exactly the <media:audio> behavior you’re seeing

If it still fails, paste ~30 lines around that one message and I’ll pinpoint the exact branch.
Ref: https://docs.openclaw.ai/nodes/audio

outer pike

Nope, nothing. In the Groq logs I can see this error:
"error": {
"message": "request Content-Type isn't multipart/form-data",
"type": "invalid_request_error",
"param": "",
"code": ""

devout trailBOT

Yep — that Groq error is the key.
"request Content-Type isn't multipart/form-data" means Groq received the transcription request with a non-multipart Content-Type, most commonly because a header override forces it (usually application/json).

For /audio/transcriptions you must not force Content-Type: when the body is FormData, fetch sets it automatically, including the multipart boundary.

Check and remove header overrides

openclaw config get models.providers.groq.headers --json
openclaw config get tools.media.audio.request.headers --json
openclaw config get tools.media.audio.models[0].headers --json
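To scan the output of those three lookups for a Content-Type key in any letter case, here is a minimal sketch. has_content_type_override and check_header_overrides are hypothetical helpers; the only OpenClaw command they call is the same openclaw config get ... --json used above, and a "Config path not found" result is treated as clean.

```bash
# has_content_type_override  (hypothetical helper)
# Reads `openclaw config get ... --json` output on stdin and succeeds (exit 0)
# if it contains a "Content-Type" key in any letter case.
has_content_type_override() {
  grep -qiE '"content-type"[[:space:]]*:'
}

# check_header_overrides  (hypothetical helper)
# Runs the three lookups above and prints one verdict line per config path.
# The models[0] path is quoted so the shell never treats [0] as a glob.
check_header_overrides() {
  local path
  for path in \
    models.providers.groq.headers \
    tools.media.audio.request.headers \
    'tools.media.audio.models[0].headers'; do
    if openclaw config get "$path" --json 2>&1 | has_content_type_override; then
      echo "Content-Type override: $path"
    else
      echo "clean: $path"
    fi
  done
}
```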

If you see content-type / Content-Type, remove it:

openclaw config unset 'models.providers.groq.headers["content-type"]'
openclaw config unset 'models.providers.groq.headers["Content-Type"]'
openclaw config unset 'tools.media.audio.request.headers["content-type"]'
openclaw config unset 'tools.media.audio.request.headers["Content-Type"]'
openclaw config unset 'tools.media.audio.models[0].headers["content-type"]'
openclaw config unset 'tools.media.audio.models[0].headers["Content-Type"]'
openclaw gateway restart

(If some paths don’t exist, that’s fine.)

Optional sanity check:

openclaw config get models.providers.groq.baseUrl --json

It should be the default Groq OpenAI-compatible base URL (https://api.groq.com/openai/v1) or unset.

After that, retry one voice note — this should unblock STT.

outer pike

kira@nikjitsi:~$ openclaw config get models.providers.groq.headers --json
openclaw config get tools.media.audio.request.headers --json
openclaw config get tools.media.audio.models[0].headers --json
Config path not found: models.providers.groq.headers
Config path not found: tools.media.audio.request.headers
Config path not found: tools.media.audio.models[0].headers
kira@nikjitsi:~$ openclaw config get models.providers.groq.baseUrl --json
"https://api.groq.com/openai/v1"

devout trailBOT

Got it — thanks, this narrows it a lot.

Since:

  • no header overrides exist
  • base URL is correct
  • Groq reports “Content-Type isn’t multipart/form-data”

the likely culprit is the proxy transport path (requests routed through an env-configured proxy), not your media config.

Quick A/B check (this usually confirms it)

openclaw config get env.HTTPS_PROXY --json
openclaw config get env.HTTP_PROXY --json
openclaw config get env.https_proxy --json
openclaw config get env.http_proxy --json

If any are set, temporarily unset and restart:

openclaw config unset env.HTTPS_PROXY
openclaw config unset env.HTTP_PROXY
openclaw config unset env.https_proxy
openclaw config unset env.http_proxy
openclaw gateway restart

Then test one Telegram voice note again.

If STT works after unsetting the proxy env vars, that’s a real bug path (the multipart body getting mangled by the proxied fetch).
Please file it here so maintainers can patch it: https://github.com/openclaw/openclaw/issues

As a temporary workaround, if you must keep the proxy env set globally, add a CLI fallback to tools.media.audio.models (or use another provider path that works in your environment).
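The env.* checks above only cover proxy settings stored in the OpenClaw config. A proxy can also reach the gateway through its process environment (for example a systemd unit or a shell profile), which config get won't show. A minimal Linux-only sketch that lists proxy variables from /proc/<pid>/environ; list_proxy_vars is a hypothetical helper, the pgrep pattern used to find the gateway PID is an assumption, and you can only read environ for processes you own (or as root).

```bash
# list_proxy_vars ENVIRON_FILE  (hypothetical helper)
# Prints proxy-related variables (http_proxy, https_proxy, all_proxy,
# no_proxy, any letter case) from a NUL-separated environment file such as
# /proc/<pid>/environ. Prints nothing, and still succeeds, if none are set.
list_proxy_vars() {
  tr '\0' '\n' < "$1" | grep -iE '^(https?|all|no)_proxy=' || true
}

# Hypothetical usage: the 'openclaw.*gateway' pattern is an assumption about
# how the gateway process appears in `ps`; adjust it to your setup.
#   pid=$(pgrep -f 'openclaw.*gateway' | head -n 1)
#   list_proxy_vars "/proc/$pid/environ"
```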

outer pike

kira@nikjitsi:~$ openclaw config get env.HTTPS_PROXY --json
openclaw config get env.HTTP_PROXY --json
openclaw config get env.https_proxy --json
openclaw config get env.http_proxy --json
Config path not found: env.HTTPS_PROXY
Config path not found: env.HTTP_PROXY
Config path not found: env.https_proxy
Config path not found: env.http_proxy
kira@nikjitsi:~$

devout trailBOT

Nice work digging — this now looks like a runtime bug, not config.

You’ve ruled out:

  • bad audio config
  • missing plugin
  • bad base URL
  • explicit header overrides
  • config-level proxy env

and Groq still sees non-multipart requests from OpenClaw.

Practical workaround right now (keep Groq, bypass broken transport path)

Add a CLI fallback entry that uses the same curl flow you already confirmed works:

{
tools: {
media: {
audio: {
enabled: true,
models: [
{ provider: "groq" },
{
type: "cli",
command: "bash",
args: [
"-lc",
"curl -sS https://api.groq.com/openai/v1/audio/transcriptions -H \"Authorization: Bearer $GROQ_API_KEY\" -F \"file=@{{MediaPath}}\" -F \"model=whisper-large-v3-turbo\" | jq -r '.text'"
],
timeoutSeconds: 60