#Telegram voice

sick mesa
#

I've noticed on telegram there is an option to record audio. Is it possible to get openclaw to interpret what is being said in the audio?

torn groveBOT
#

Yes — OpenClaw can transcribe inbound audio/voice notes and use the transcript as the message the agent responds to.

It’s controlled by tools.media.audio. By default, if you haven’t disabled it, OpenClaw tries to auto-detect a usable transcription path: audio-capable reply model, local Whisper/sherpa CLI, Gemini CLI, or provider auth like OpenAI/Groq/Deepgram/Google/Mistral.

Example explicit config:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
        // optional: echo the transcript back before the agent reply
        // echoTranscript: true,
      },
    },
  },
}

sick mesa
#

I'm running ai locally, currently qwen3.6

torn groveBOT
#

Qwen as your chat model is fine, but I wouldn’t rely on Qwen 3.x itself for Telegram voice-note transcription. For local setups, the clean pattern is:

  • keep Qwen as the reply/agent model
  • add a separate local speech-to-text path for audio, usually whisper-cli / whisper.cpp or Python whisper
  • OpenClaw transcribes the voice note first, then sends the transcript to Qwen as normal text

Example:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "whisper",
            args: ["--model", "base", "{{MediaPath}}"],
            timeoutSeconds: 45,
          },
        ],
      },
    },
  },
}

If you use whisper.cpp, use whisper-cli instead and point it at a model via WHISPER_CPP_MODEL or configure the full command path.

Docs: https://docs.openclaw.ai/nodes/audio

Small warning from experience: local LLMs can be shaky at tool calling, but this particular flow is less demanding because Qwen only sees the final transcript. The transcription piece is the important part to get reliable.

sick mesa
#

so whisper is cli that runs locally?

torn groveBOT
#

Yep. In this context “Whisper” usually means a local speech-to-text CLI that OpenClaw shells out to before the LLM sees the message.

There are two common local options:

  • whisper: the Python/OpenAI Whisper CLI. Easy to set up and can download models automatically, but usually slower and heavier.
  • whisper-cli: from whisper.cpp. A fast, local C++ version; you download a GGML-format .bin Whisper model and point the CLI at it.

OpenClaw’s audio pipeline can call either. Flow is:

Telegram voice note → OpenClaw downloads audio → local Whisper transcribes → transcript gets sent to Qwen as text.

So Qwen doesn’t need to understand audio directly. It just answers the transcript.

sick mesa
#

So what would be the steps to install whisper?

torn groveBOT
#

I’d do whisper.cpp for a local OpenClaw box. It’s lightweight and gives you the whisper-cli binary OpenClaw auto-detects.

Generic steps:

  1. Install build deps + ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install -y git cmake build-essential ffmpeg

macOS:

brew install cmake ffmpeg

  2. Build whisper.cpp

git clone https://github.com/ggerganov/whisper.cpp ~/whisper.cpp
cd ~/whisper.cpp
cmake -B build
cmake --build build --config Release

  3. Download a model

Start with base.en for English, or base for multilingual:

cd ~/whisper.cpp
./models/download-ggml-model.sh base.en

That should create something like:

~/whisper.cpp/models/ggml-base.en.bin

  4. Test it manually

~/whisper.cpp/build/bin/whisper-cli \
  -m ~/whisper.cpp/models/ggml-base.en.bin \
  -f ~/whisper.cpp/samples/jfk.wav

  5. Wire it into OpenClaw config

Use absolute paths, especially if OpenClaw runs as a daemon/service:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            args: [
              "-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt", "-of", "{{OutputBase}}",
              "-np", "-nt",
              "{{MediaPath}}"
            ],
            timeoutSeconds: 60,
          },
        ],
      },
    },
  },
}

Then restart the gateway.

Docs: https://docs.openclaw.ai/nodes/audio
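
If it helps to see what that args list turns into at run time, here is a rough Python sketch. It assumes {{MediaPath}} and {{OutputBase}} are plain string substitutions, which is my reading of the config and not a description of OpenClaw's internals. The expand_args helper and the /tmp/example paths are made up for illustration.

```python
import shlex


# Hypothetical helper: mimics how the {{...}} placeholders in "args" might be
# filled in. This is an illustration, not OpenClaw's actual implementation.
def expand_args(command, args, media_path, output_base):
    values = {"{{MediaPath}}": media_path, "{{OutputBase}}": output_base}
    expanded = [command]
    for arg in args:
        for placeholder, value in values.items():
            arg = arg.replace(placeholder, value)
        expanded.append(arg)
    return expanded


argv = expand_args(
    "/home/YOU/whisper.cpp/build/bin/whisper-cli",
    ["-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
     "-otxt", "-of", "{{OutputBase}}", "-np", "-nt", "{{MediaPath}}"],
    media_path="/tmp/example/voice.wav",  # made-up path
    output_base="/tmp/example/voice",     # made-up path
)

# Print the command line as a shell would see it.
print(shlex.join(argv))
```

With -otxt and -of, whisper-cli writes the transcript to the output base plus .txt, so in this sketch it would land in /tmp/example/voice.txt.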

If you tell me your OS + whether OpenClaw is installed from npm/source/Docker, I can tailor the exact paths. For local Qwen, this is the right shape: Whisper transcribes locally, Qwen replies to the transcript.

sick mesa
#

There are no quotes in that openclaw config

torn groveBOT
#

Right — OpenClaw config is JSON5, so object keys often don’t need quotes:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            args: [
              "-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt", "-of", "{{OutputBase}}",
              "-np", "-nt",
              "{{MediaPath}}"
            ],
            timeoutSeconds: 60,
          },
        ],
      },
    },
  },
}

The string values still need quotes though — paths, cli, args, {{MediaPath}}, etc. If your existing config style has unquoted keys, that’s totally normal.

sick mesa
#

Everything in openclaw.json has quotes

torn groveBOT
#

Yep, fair catch — if your openclaw.json is written as strict JSON style, quote the keys too. Use this shape:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt",
              "-of",
              "{{OutputBase}}",
              "-np",
              "-nt",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}

Replace /home/YOU/... with your actual absolute paths. Also: no comments or trailing commas if your parser is treating it as strict JSON.
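
If you are not sure which style your file is in, a quick check is to run it through a strict parser. Here is a minimal sketch using Python's built-in json module, which rejects unquoted keys, comments, and trailing commas. The strict_json_error name and the two sample strings are mine, just for illustration.

```python
import json


def strict_json_error(text):
    """Return None if text parses as strict JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# JSON5-style: unquoted keys and trailing commas.
json5_style = '{ tools: { media: { audio: { enabled: true, }, }, }, }'
# Strict JSON: quoted keys, no trailing commas.
strict_style = '{ "tools": { "media": { "audio": { "enabled": true } } } }'

print(strict_json_error(json5_style))   # an error message pointing at the first unquoted key
print(strict_json_error(strict_style))  # None
```

To check your real file, read its contents and pass them to strict_json_error. If it returns None, the file is already strict JSON and the quoted version above is the right shape to paste in.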

sick mesa
#

That's working. Can I get it to output what it thinks I said?

torn groveBOT
#

Yep — enable echoTranscript in the same audio config. OpenClaw will send the transcript back before the agent processes/replies.

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "echoTranscript": true,
        "echoFormat": "📝 I heard: {transcript}",
        "models": [
          {
            "type": "cli",
            "command": "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt",
              "-of",
              "{{OutputBase}}",
              "-np",
              "-nt",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}

sick mesa
#

openclaw understood the audio but came back with this message
I got your audio message but the transcription came back empty — could you type out what you said? I want to make sure I don't miss anything.

torn groveBOT
#

That sounds like OpenClaw saw the audio attachment, but the transcription result itself was empty. Two quick things to separate:

  1. If you did not see your 📝 I heard: ... echo message, then either echoTranscript config wasn’t loaded, or the transcript was empty so there was nothing to echo.

  2. The reply you got is probably your agent reacting to an [Audio] message with an empty/missing transcript.

First: what does this show?

openclaw --version

Then I’d check these:

openclaw logs --follow

Send another Telegram voice note and look for audio/media transcription lines or whisper errors.

Also test Whisper directly against a real audio file if you can:

/home/YOU/whisper.cpp/build/bin/whisper-cli \
  -m /home/YOU/whisper.cpp/models/ggml-base.en.bin \
  -f /path/to/audio.ogg

If direct Whisper works but OpenClaw is empty, it’s likely the config args/path. If direct Whisper is also empty, try a better model (small.en instead of base.en) or add language forcing:

"args": [
  "-m", "/home/YOU/whisper.cpp/models/ggml-small.en.bin",
  "-l", "en",
  "-otxt", "-of", "{{OutputBase}}",
  "-np", "-nt",
  "{{MediaPath}}"
]

Important: after changing openclaw.json, restart the gateway. The echo should appear as a separate message before the agent reply when the transcript is non-empty.
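
One more variable worth removing from the direct test: Telegram voice notes are OGG/Opus files, and not every whisper-cli build decodes that format. Converting to 16 kHz mono WAV with ffmpeg first sidesteps the question. A small sketch, assuming ffmpeg is on your PATH; to_wav_command, convert, and the file names are placeholders of mine.

```python
import shutil
import subprocess


def to_wav_command(src, dst):
    """Build an ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def convert(src, dst):
    """Run the conversion, failing loudly if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(to_wav_command(src, dst), check=True)
```

Call convert("/path/to/audio.ogg", "/path/to/audio.wav"), then point whisper-cli's -f at the .wav. If the WAV transcribes fine but the original file does not, the audio format was the problem and not the model.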

sick mesa
#

Command failed: /home/YOU/whisper.cpp/build/bin/whisper-cli -m /home/YOU/whisper.cpp/models/ggml-base.en.bin -otxt -of /tmp/openclaw/openclaw-media-cli-XYVVKg/file_9---22b2de4e-7359-41f1-99d0-8ba6678cee68 -np -nt /tmp/openclaw/openclaw-media-cli-XYVVKg/file_9---22b2de4e-7359-41f1-99d0-8ba6678cee68.wav

torn groveBOT
#

Ah, there’s the bug: the config still has my placeholder paths in it:

/home/YOU/whisper.cpp/...

That needs to be your real home path / real binary path. For example, if your Linux username is rex, it might be:

"command": "/home/rex/whisper.cpp/build/bin/whisper-cli"

and:

"/home/rex/whisper.cpp/models/ggml-base.en.bin"

Find the real paths with:

find "$HOME/whisper.cpp" -name whisper-cli -type f
find "$HOME/whisper.cpp/models" -name 'ggml-*.bin' -type f

Then update those two values in openclaw.json, restart the gateway, and try the voice note again.

If find returns nothing for whisper-cli, the build may have put it somewhere else; try:

find "$HOME" -name whisper-cli -type f 2>/dev/null | head
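
If you want a quick sanity check before restarting, something like this Python sketch can flag the common mistakes. It assumes the model path is whatever follows "-m" in args, as in the config above. check_cli_entry is a helper name I made up, not part of OpenClaw.

```python
import os


def check_cli_entry(command, args):
    """Return a list of problems found in a CLI model entry's paths."""
    problems = []
    # Leftover placeholder from the example config.
    if "/home/YOU/" in command or any("/home/YOU/" in a for a in args):
        problems.append("placeholder /home/YOU/ still present")
    # The command must be an existing file with the executable bit set.
    if not (os.path.isfile(command) and os.access(command, os.X_OK)):
        problems.append(f"not an executable file: {command}")
    # Assumption: the model path is the argument right after "-m".
    if "-m" in args and args.index("-m") + 1 < len(args):
        model = args[args.index("-m") + 1]
        if not os.path.isfile(model):
            problems.append(f"model file not found: {model}")
    return problems


print(check_cli_entry(
    "/home/YOU/whisper.cpp/build/bin/whisper-cli",
    ["-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin"],
))
```

An empty list means both paths look usable. Run it as the same user the gateway runs as, since a daemon account may not be able to read files under your home directory.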