#Telegram voice
Yes — OpenClaw can transcribe inbound audio/voice notes and use the transcript as the message the agent responds to.
It’s controlled by tools.media.audio. By default, if you haven’t disabled it, OpenClaw tries to auto-detect a usable transcription path: audio-capable reply model, local Whisper/sherpa CLI, Gemini CLI, or provider auth like OpenAI/Groq/Deepgram/Google/Mistral.
Example explicit config:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
        // optional: echo the transcript back before the agent reply
        // echoTranscript: true,
      },
    },
  },
}
When transcription works, CommandBody/RawBody become the transcript, so spoken slash commands can work too. In Telegram groups, OpenClaw can also do a preflight transcript before mention detection, so a voice note that says the bot’s name can pass requireMention.
Docs:
https://docs.openclaw.ai/nodes/audio
https://docs.openclaw.ai/channels/telegram
I'm running AI locally, currently qwen3.6.
Qwen as your chat model is fine, but I wouldn’t rely on Qwen 3.x itself for Telegram voice-note transcription. For local setups, the clean pattern is:
- keep Qwen as the reply/agent model
- add a separate local speech-to-text path for audio, usually whisper-cli (from whisper.cpp) or Python whisper
- OpenClaw transcribes the voice note first, then sends the transcript to Qwen as normal text
Example:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "whisper",
            args: ["--model", "base", "{{MediaPath}}"],
            timeoutSeconds: 45,
          },
        ],
      },
    },
  },
}
If you use whisper.cpp, use whisper-cli instead and point it at a model via WHISPER_CPP_MODEL or configure the full command path.
Docs: https://docs.openclaw.ai/nodes/audio
Small warning from experience: local LLMs can be shaky at tool calling, but this particular flow is less demanding because Qwen only sees the final transcript. The transcription piece is the important part to get reliable.
So Whisper is a CLI that runs locally?
Yep. In this context “Whisper” usually means a local speech-to-text CLI that OpenClaw shells out to before the LLM sees the message.
There are two common local options:
- whisper: the Python/OpenAI Whisper CLI. Easy, can download models automatically, usually slower/heavier.
- whisper-cli: from whisper.cpp. Fast, local C/C++ version; you download a ggml-format .bin Whisper model and point the CLI at it.
OpenClaw’s audio pipeline can call either. Flow is:
Telegram voice note → OpenClaw downloads audio → local Whisper transcribes → transcript gets sent to Qwen as text.
So Qwen doesn’t need to understand audio directly. It just answers the transcript.
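The "local Whisper transcribes" step can be sketched as a small shell function. This only illustrates the shell-out pattern; it is not OpenClaw's actual implementation. The function name transcribe_wav and the WHISPER_BIN override variable are made up for this example, and it assumes the audio is already a WAV file that whisper-cli can read.

```shell
# Hypothetical sketch (POSIX sh): run whisper-cli on a WAV file and print
# the transcript. WHISPER_BIN is an illustrative override; it defaults to
# whisper-cli on PATH.
transcribe_wav() {
    wav=$1      # path to the audio file (assumed: 16 kHz mono WAV)
    model=$2    # path to a ggml model, e.g. ggml-base.en.bin
    workdir=$(mktemp -d) || return 1

    # -otxt -of BASE writes the transcript to BASE.txt;
    # -np limits output to results, -nt drops timestamps.
    if ! "${WHISPER_BIN:-whisper-cli}" -m "$model" -otxt -of "$workdir/note" \
            -np -nt "$wav" >/dev/null 2>&1; then
        rm -rf "$workdir"
        return 1
    fi

    # Succeed only if a transcript file exists and is not whitespace-only.
    status=1
    if [ -f "$workdir/note.txt" ] && grep -q '[^[:space:]]' "$workdir/note.txt"; then
        cat "$workdir/note.txt"
        status=0
    fi
    rm -rf "$workdir"
    return "$status"
}
```

Usage would look like: transcribe_wav /path/to/note.wav "$HOME/whisper.cpp/models/ggml-base.en.bin". Treating an empty transcript as a failure makes the "heard nothing" case easy to spot.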
So what would be the steps to install Whisper?
I’d do whisper.cpp for a local OpenClaw box. It’s lightweight and gives you the whisper-cli binary OpenClaw auto-detects.
Generic steps:
- Install build deps + ffmpeg
Ubuntu/Debian:
sudo apt update
sudo apt install -y git cmake build-essential ffmpeg
macOS:
brew install cmake ffmpeg
- Build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp ~/whisper.cpp
cd ~/whisper.cpp
cmake -B build
cmake --build build --config Release
- Download a model
Start with base.en for English, or base for multilingual:
cd ~/whisper.cpp
./models/download-ggml-model.sh base.en
That should create something like:
~/whisper.cpp/models/ggml-base.en.bin
- Test it manually
~/whisper.cpp/build/bin/whisper-cli \
-m ~/whisper.cpp/models/ggml-base.en.bin \
-f ~/whisper.cpp/samples/jfk.wav
- Wire it into OpenClaw config
Use absolute paths, especially if OpenClaw runs as a daemon/service:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            args: [
              "-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt", "-of", "{{OutputBase}}",
              "-np", "-nt",
              "{{MediaPath}}"
            ],
            timeoutSeconds: 60,
          },
        ],
      },
    },
  },
}
Then restart the gateway.
Docs: https://docs.openclaw.ai/nodes/audio
If you tell me your OS + whether OpenClaw is installed from npm/source/Docker, I can tailor the exact paths. For local Qwen, this is the right shape: Whisper transcribes locally, Qwen replies to the transcript.
There are no quotes in that openclaw config
Right — OpenClaw config is JSON5, so object keys often don’t need quotes:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            args: [
              "-m", "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt", "-of", "{{OutputBase}}",
              "-np", "-nt",
              "{{MediaPath}}"
            ],
            timeoutSeconds: 60,
          },
        ],
      },
    },
  },
}
The string values still need quotes though: the paths, "cli", each entry in args, "{{MediaPath}}", etc. If your existing config style has unquoted keys, that’s totally normal.
Everything in openclaw.json has quotes
Yep, fair catch. If your openclaw.json is written in strict JSON style, quote the keys too. Use this shape:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt",
              "-of",
              "{{OutputBase}}",
              "-np",
              "-nt",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}
Replace /home/YOU/... with your actual absolute paths. Also: no comments or trailing commas if your parser is treating it as strict JSON.
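If you want to confirm the file really is strict JSON before restarting, you can run it through a strict parser. A minimal sketch, assuming python3 is installed; the function name is_strict_json is just for illustration, and you pass in whatever path your openclaw.json actually lives at.

```shell
# Hypothetical helper (POSIX sh): succeed only if the file parses as
# strict JSON. Python's standard json.tool module rejects comments,
# unquoted keys, and trailing commas.
is_strict_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}
```

For example: is_strict_json /path/to/openclaw.json && echo "strict JSON" || echo "not strict JSON". A failure here does not necessarily mean OpenClaw will reject the file, since JSON5 allows more than strict JSON does; it only tells you which style the file is in.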
That's working. Can I get it to output what it thinks I said?
Yep — enable echoTranscript in the same audio config. OpenClaw will send the transcript back before the agent processes/replies.
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "echoTranscript": true,
        "echoFormat": "📝 I heard: {transcript}",
        "models": [
          {
            "type": "cli",
            "command": "/home/YOU/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/YOU/whisper.cpp/models/ggml-base.en.bin",
              "-otxt",
              "-of",
              "{{OutputBase}}",
              "-np",
              "-nt",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}
Then restart the gateway. echoFormat is optional; {transcript} gets replaced with what Whisper heard.
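To make the placeholder behaviour concrete, here is a rough sketch of that kind of substitution. It is an illustration only, not OpenClaw's code: the function name render_echo is hypothetical, and replacing just the first {transcript} occurrence is an assumption of this example.

```shell
# Hypothetical sketch (POSIX sh): replace the first {transcript}
# placeholder in a format string with the transcript text.
render_echo() {
    fmt=$1
    transcript=$2
    placeholder='{transcript}'
    case $fmt in
        *"$placeholder"*)
            prefix=${fmt%%"$placeholder"*}   # text before the placeholder
            suffix=${fmt#*"$placeholder"}    # text after the placeholder
            printf '%s%s%s\n' "$prefix" "$transcript" "$suffix"
            ;;
        *)
            # No placeholder: print the format string unchanged.
            printf '%s\n' "$fmt"
            ;;
    esac
}
```

Calling render_echo '📝 I heard: {transcript}' 'turn on the lights' prints the format string with the spoken text filled in.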
openclaw understood the audio but came back with this message
I got your audio message but the transcription came back empty — could you type out what you said? I want to make sure I don't miss anything.
That sounds like OpenClaw saw the audio attachment, but the transcription result itself was empty. Two quick things to separate:
- If you did not see your "📝 I heard: ..." echo message, then either the echoTranscript config wasn’t loaded, or the transcript was empty so there was nothing to echo.
- The reply you got is probably your agent reacting to an [Audio] message with an empty/missing transcript.
First: what does this show?
openclaw --version
Then I’d check these:
openclaw logs --follow
Send another Telegram voice note and look for audio/media transcription lines or whisper errors.
Also test Whisper directly against a real audio file if you can:
/home/YOU/whisper.cpp/build/bin/whisper-cli \
-m /home/YOU/whisper.cpp/models/ggml-base.en.bin \
-f /path/to/audio.ogg
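One caveat for the direct test: Telegram voice notes are typically OGG/Opus, and depending on how whisper-cli was built it may not decode that format itself. If it fails to read the file, converting to 16 kHz mono 16-bit WAV first is a safe fallback. A minimal sketch, assuming ffmpeg is installed; the function name to_whisper_wav and the FFMPEG_BIN override are illustrative.

```shell
# Hypothetical helper (POSIX sh): convert any ffmpeg-readable audio file
# to 16 kHz, mono, 16-bit PCM WAV for whisper.cpp.
# FFMPEG_BIN is an illustrative override; it defaults to ffmpeg on PATH.
to_whisper_wav() {
    "${FFMPEG_BIN:-ffmpeg}" -loglevel error -y -i "$1" \
        -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}
```

Then run to_whisper_wav /path/to/audio.ogg /tmp/audio.wav and pass /tmp/audio.wav to whisper-cli with -f instead of the .ogg file.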
If direct Whisper works but OpenClaw’s transcript is empty, the problem is likely the config args or paths. If direct Whisper is also empty, try a larger model (small.en instead of base.en; download it first with ./models/download-ggml-model.sh small.en) or force the language:
"args": [
  "-m", "/home/YOU/whisper.cpp/models/ggml-small.en.bin",
  "-l", "en",
  "-otxt", "-of", "{{OutputBase}}",
  "-np", "-nt",
  "{{MediaPath}}"
]
Important: after changing openclaw.json, restart the gateway. The echo should appear as a separate message before the agent reply when the transcript is non-empty.
Command failed: /home/YOU/whisper.cpp/build/bin/whisper-cli -m /home/YOU/whisper.cpp/models/ggml-base.en.bin -otxt -of /tmp/openclaw/openclaw-media-cli-XYVVKg/file_9---22b2de4e-7359-41f1-99d0-8ba6678cee68 -np -nt /tmp/openclaw/openclaw-media-cli-XYVVKg/file_9---22b2de4e-7359-41f1-99d0-8ba6678cee68.wav
Ah, there’s the bug: the config still has my placeholder paths in it:
/home/YOU/whisper.cpp/...
That needs to be your real home path / real binary path. For example, if your Linux username is rex, it might be:
"command": "/home/rex/whisper.cpp/build/bin/whisper-cli"
and:
"/home/rex/whisper.cpp/models/ggml-base.en.bin"
Find the real paths with:
find "$HOME/whisper.cpp" -name whisper-cli -type f
find "$HOME/whisper.cpp/models" -name 'ggml-*.bin' -type f
Then update those two values in openclaw.json, restart the gateway, and try the voice note again.
If find returns nothing for whisper-cli, the build may have put it somewhere else; try:
find "$HOME" -name whisper-cli -type f 2>/dev/null | head
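Once you have the real paths, a quick preflight check catches this class of mistake before you restart. A minimal sketch; the function name check_whisper_paths is hypothetical, and it only checks what this thread ran into: leftover /YOU/ placeholders, a missing or non-executable binary, and an unreadable model file.

```shell
# Hypothetical preflight check (POSIX sh): verify the whisper-cli binary
# and model paths from the config before restarting the gateway.
check_whisper_paths() {
    bin=$1
    model=$2
    status=0

    # Flag placeholder paths copied verbatim from an example config.
    for p in "$bin" "$model"; do
        case $p in
            */YOU/*)
                echo "placeholder path still present: $p" >&2
                status=1
                ;;
        esac
    done

    # The command must be a regular file with the execute bit set.
    if ! { [ -f "$bin" ] && [ -x "$bin" ]; }; then
        echo "not an executable file: $bin" >&2
        status=1
    fi

    # The model must be a regular, readable file.
    if ! { [ -f "$model" ] && [ -r "$model" ]; }; then
        echo "model file not readable: $model" >&2
        status=1
    fi

    return "$status"
}
```

Run it with the exact two values from openclaw.json, for example check_whisper_paths "$HOME/whisper.cpp/build/bin/whisper-cli" "$HOME/whisper.cpp/models/ggml-base.en.bin". If OpenClaw runs as a service under a different user, run the check as that user so the permission results match.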