# Whisper audio tool
I checked the current docs/source. The main thing: there are two different paths that sound similar:
- Skill/tool the agent chooses to use: openai-whisper / openai-whisper-api
  - openai-whisper is explicitly the local whisper CLI skill. openai-whisper-api is the OpenAI API/curl skill.
  - Skills are instructions exposed to the model, so the agent can still decide badly sometimes, especially if both local/API-ish options are visible.
- Automatic inbound audio transcription: tools.media.audio
  - This is what you want for Discord voice/audio attachments to be transcribed before the agent reasons about the message.
  - If you leave it on auto, OpenClaw may detect local CLI options like whisper/whisper-cli and use or suggest them.
If you want provider-only transcription, configure the audio media path explicitly and don’t include a CLI fallback:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{ provider: "openai", model: "gpt-4o-mini-transcribe" }
]
}
}
}
}
Then restart the gateway and check logs with:
openclaw logs --follow
Also, can you confirm the exact output of openclaw --version? You wrote 2025.5.12; if that’s actually 2026.5.12, the docs linked below match the current audio pipeline. If it really is 2025.x, I’d update first because the media audio config has changed a lot.
Relevant docs:
https://docs.openclaw.ai/nodes/audio
https://docs.openclaw.ai/tools/media-overview
https://docs.openclaw.ai/gateway/configuration-reference#toolsmedia
Yes, it's that version. This is what I have in the config:
"tools": {
"media": {
"audio": {
"enabled": true,
"echoTranscript": true,
"echoFormat": "📝 I heard: {transcript}",
"models": [
{
"type": "cli",
"command": "/home/admin/whisper.cpp/build/bin/whisper-cli",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
]
}
},
That config is for the CLI whisper.cpp path, so it’s expected that OpenClaw will run a command-line Whisper binary. The difference is: OpenClaw should run your exact /home/admin/whisper.cpp/build/bin/whisper-cli during media pre-processing, before the agent starts thinking. The agent should not be deciding to install whisper itself.
A couple of likely causes:
- The snippet is incomplete or not loaded by the running gateway
  - Make sure the final file has the closing braces and valid JSON5.
  - Restart the gateway after editing.
  - Run:
    openclaw status --all
    openclaw logs --follow
- The configured CLI is failing, so the agent receives an untranscribed audio file and improvises
  - If echoTranscript: true is working, you should see:
    📝 I heard: ...
    before/around the agent response.
  - If you don’t see that, the media audio pipeline probably isn’t succeeding.
- The service user cannot execute/read those paths
  - Check as the same user running the gateway:
    /home/admin/whisper.cpp/build/bin/whisper-cli -m /home/admin/whisper.cpp/models/ggml-base.en.bin -otxt -of /tmp/whisper-test -np -nt /path/to/test-audio.wav
  - Also check the gateway service can access /home/admin/...; this often differs from your interactive shell.
- A Whisper skill is also visible to agents
  - The bundled openai-whisper skill teaches agents to use/install the Python whisper CLI. That’s separate from tools.media.audio.
  - If the transcript echo works but agents still try to install Whisper for other tasks, remove/disable that skill or tighten the agent’s skill/tool exposure.
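For the third cause, here is a minimal path check sketched in bash. It only tests that the binary is a regular executable file and that the model is readable by whoever runs it, so run it as the gateway's user. `check_whisper_paths` is a hypothetical helper name, not part of OpenClaw.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the whisper.cpp binary is an executable
# regular file and the model file is readable, for the user running this.
check_whisper_paths() {
  local bin="$1" model="$2" rc=0
  if [[ -f "$bin" && -x "$bin" ]]; then
    echo "OK      binary executable: $bin"
  else
    echo "FAIL    binary missing or not executable: $bin"
    rc=1
  fi
  if [[ -r "$model" ]]; then
    echo "OK      model readable: $model"
  else
    echo "FAIL    model missing or unreadable: $model"
    rc=1
  fi
  return "$rc"
}
```

Example: `check_whisper_paths /home/admin/whisper.cpp/build/bin/whisper-cli /home/admin/whisper.cpp/models/ggml-base.en.bin`. A pass only rules out missing files and permission problems; it does not prove whisper-cli can decode your audio.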
For your exact config, I’d first verify the echo appears. If it doesn’t, paste the relevant openclaw logs --follow lines around the audio message. That will usually show whether it’s a config-load issue, permission/path issue, timeout, or CLI output parsing issue.
This is in the logs
0:59:09 info channels/telegram telegram sendMessage ok chat=6938637819 message=3673
10:59:19 warn media-understanding audio: failed (0/1) reason=Command failed
10:59:19 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @SQLLeadBot (direct, audio/ogg, 13 chars)
10:59:22 warn media-understanding audio: failed (0/1) reason=Command failed
10:59:42 warn diagnostic liveness warning: reasons=event_loop_delay interval=34s eventLoopDelayP99Ms=41.8 eventLoopDelayMaxMs=15770.6 eventLoopUtilization=0.715 cpuCoreRatio=0.752 active=1 waiting=0 queued=1 phase=channels.telegram.start-account recentPhases=channels.whatsapp.is-configured:1ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:1ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:1ms work=[active=agent:lead-scout:telegram:direct:6938637819(processing,q=1,age=23s last=run:completed) queued=agent:lead-scout:telegram:direct:6938637819(processing,q=1,age=23s last=run:completed)]
11:03:14 warn fetch-timeout fetch timeout reached; aborting operation
11:03:14 warn agent/embedded embedded run agent end
That log confirms the media pipeline is trying your configured entry and the CLI is failing. So this is not “agent forgot the tool” anymore; the transcript step fails, then the agent gets an audio attachment and improvises.
Two things to check first:
- ffmpeg must be installed and visible to the gateway
Telegram sends audio/ogg. For whisper-cli, OpenClaw transcodes non-wav audio to 16k mono wav first. If ffmpeg is missing from the service PATH, the CLI entry fails before Whisper runs.
which ffmpeg
ffmpeg -version
If OpenClaw runs as a daemon/service, also check the service environment, not just your shell.
- Your whisper.cpp build may require -f before the input file
Try this manually with a Telegram voice note converted to wav, or with any wav:
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/whisper-test -np -nt \
-f /path/to/test.wav
If that works, change the config args to include -f:
args: [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"-f",
"{{MediaPath}}"
]
Then restart the gateway.
For better error detail, run the gateway/logs in verbose mode if possible; the normal log is currently only surfacing reason=Command failed, which hides whether it was ffmpeg, model path/read permission, bad CLI args, or timeout.
Docs for this path: https://docs.openclaw.ai/nodes/audio
ffmpeg is installed
Good, then the next most likely issue is the whisper-cli argv. In current whisper.cpp, the input file is normally passed with -f; your config passes {{MediaPath}} as a bare positional arg. Some builds accept that, some exit with usage/error.
Change the end of your args from:
"-np",
"-nt",
"{{MediaPath}}"
to:
"-np",
"-nt",
"-f",
"{{MediaPath}}"
Full relevant part:
args: [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"-f",
"{{MediaPath}}"
]
Then restart the gateway.
If it still says Command failed, run the exact binary manually as the gateway user:
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/whisper-test -np -nt -f /path/to/test.wav
The important part is to test as the same user/service context OpenClaw runs under. “ffmpeg is installed” in your shell can still differ from what the daemon sees, but the -f mismatch is the sharper clue here.
I don't get the 'I heard...'
Right, that lines up with the log. echoTranscript only sends after transcription succeeds. Since you have:
media-understanding audio: failed (0/1) reason=Command failed
there is no transcript to echo.
At this point the fix is to get the exact command working. I’d do these in order:
- Add -f before {{MediaPath}} in config, restart the gateway, and test again.
- If it still fails, run OpenClaw with verbose logs so we can see the actual command/error instead of only Command failed:
openclaw logs --follow --verbose
If that flag isn’t supported in your build, start the gateway itself in verbose mode or use:
openclaw status --all
- Manually test whisper.cpp against an ogg-converted wav:
ffmpeg -y -i /path/to/audio.ogg -ac 1 -ar 16000 -c:a pcm_s16le /tmp/test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/whisper-test -np -nt -f /tmp/test.wav
If that manual command fails, it’s whisper.cpp/model/path/permissions. If it works manually but OpenClaw still fails, it’s almost certainly service environment or permissions for the gateway user.
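Those two outcomes can be told apart in a single run. The bash sketch below repeats the ffmpeg and whisper-cli commands from this thread and keeps each stage's output in a temp directory, so the failing stage's error is visible. `diagnose_pipeline`, `FFMPEG`, and `WHISPER` are illustrative names, not OpenClaw settings.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: run OGG -> WAV -> whisper.cpp by hand and report
# which stage fails. Override FFMPEG/WHISPER if your paths differ.
FFMPEG="${FFMPEG:-ffmpeg}"
WHISPER="${WHISPER:-/home/admin/whisper.cpp/build/bin/whisper-cli}"

diagnose_pipeline() {
  local audio="$1" model="$2" work
  work="$(mktemp -d)"   # kept afterwards so the logs can be read
  # Stage 1: convert to 16 kHz mono 16-bit PCM WAV.
  if ! "$FFMPEG" -y -i "$audio" -ac 1 -ar 16000 -c:a pcm_s16le "$work/in.wav" \
       >"$work/ffmpeg.log" 2>&1; then
    echo "stage=ffmpeg FAILED (see $work/ffmpeg.log)"
    return 1
  fi
  # Stage 2: transcribe; -of sets the output base, so text goes to transcript.txt.
  if ! "$WHISPER" -m "$model" -otxt -of "$work/transcript" -np -nt \
       -f "$work/in.wav" >"$work/whisper.log" 2>&1; then
    echo "stage=whisper FAILED (see $work/whisper.log)"
    return 1
  fi
  # Stage 3: a zero exit status alone is not trusted; require a non-empty file.
  if [[ ! -s "$work/transcript.txt" ]]; then
    echo "stage=output FAILED (no $work/transcript.txt, see $work/whisper.log)"
    return 1
  fi
  echo "stage=all OK"
  cat "$work/transcript.txt"
}
```

Example: `diagnose_pipeline /path/to/audio.ogg /home/admin/whisper.cpp/models/ggml-base.en.bin`. The temp directory is deliberately left behind so you can read the logs; delete it when done.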
so what should the json be?
Use this shape in ~/.openclaw/openclaw.json / JSON5, with the -f added before {{MediaPath}}:
{
tools: {
media: {
audio: {
enabled: true,
echoTranscript: true,
echoFormat: "📝 I heard: {transcript}",
models: [
{
type: "cli",
command: "/home/admin/whisper.cpp/build/bin/whisper-cli",
args: [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"-f",
"{{MediaPath}}"
],
timeoutSeconds: 60
}
]
}
}
}
}
If this is inside an existing config file, don’t duplicate the outer { ... }; merge only the tools: { ... } block into your existing top-level object.
Then restart the gateway:
openclaw gateway restart
After sending a voice/audio message, you should see the 📝 I heard: ... echo. If you still don’t, the next thing to paste is the new media-understanding log line after this config change.
You don't have quotes around those; all entries in openclaw.json use quotes.
Yep, fair. OpenClaw accepts JSON5, but if your file is written as strict JSON style, use quoted keys:
{
"tools": {
"media": {
"audio": {
"enabled": true,
"echoTranscript": true,
"echoFormat": "📝 I heard: {transcript}",
"models": [
{
"type": "cli",
"command": "/home/admin/whisper.cpp/build/bin/whisper-cli",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"-f",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
]
}
}
}
}
The important change from your original config is just this part:
"-nt",
"-f",
"{{MediaPath}}"
instead of:
"-nt",
"{{MediaPath}}"
I still get this in logs
11:23:08 info channels/discord discord client initialized as 1479506703665074350; awaiting gateway readiness
11:23:11 warn media-understanding audio: failed (0/1) reason=Command failed
11:23:11 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @RSAPiClawBot (direct, audio/ogg, 13 chars)
11:23:11 warn media-understanding audio: failed (0/1) reason=Command failed
That means the config is being picked up, but the configured command is still exiting non-zero. The normal log is hiding the actual stderr, so now we need to isolate the failing command outside OpenClaw.
Run these on the same machine, ideally as the same user that runs the gateway:
ls -l /home/admin/whisper.cpp/build/bin/whisper-cli
ls -l /home/admin/whisper.cpp/models/ggml-base.en.bin
/home/admin/whisper.cpp/build/bin/whisper-cli --help | head
Then test the exact audio flow OpenClaw uses:
ffmpeg -y -i /path/to/telegram-audio.ogg \
-ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt -f /tmp/openclaw-whisper-test.wav
If that fails, paste the terminal error from that command. That will be the real issue.
If that works manually, then it’s almost certainly the gateway service environment. Common cases:
- the gateway runs as a different user and cannot read /home/admin/whisper.cpp/...
- ffmpeg is installed in your shell PATH but not in the daemon PATH
- whisper-cli needs a shared library path that your shell has but the service does not
Also worth checking:
openclaw gateway status
openclaw status --all
Those should show which config/service path is actually in use.
Results are here https://cl1p.net/4560
That paste proves the binary and model exist, and it also confirms your build supports -f. It doesn’t show the actual transcription command against an audio file yet, which is the part that’s failing.
Please run this with a real Telegram audio file path:
ffmpeg -y -i /path/to/telegram-audio.ogg \
-ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt -f /tmp/openclaw-whisper-test.wav
cat /tmp/openclaw-whisper-test.txt
If that manual command succeeds, the next best move is a tiny wrapper so we can see what OpenClaw is actually invoking and what stderr says. Example:
cat > /home/admin/whisper-openclaw-debug.sh <<'SH'
#!/usr/bin/env bash
{
echo "=== $(date) ==="
echo "USER=$(id)"
echo "PWD=$PWD"
echo "PATH=$PATH"
printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log
/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/whisper-openclaw-debug.sh
Then temporarily set:
"command": "/home/admin/whisper-openclaw-debug.sh"
Keep the same args, restart the gateway, send one audio message, then check:
cat /tmp/openclaw-whisper-debug.log
That should reveal whether OpenClaw is passing a wav path, whether the service user can read it, and the real whisper.cpp error.
Isn't there an example file that comes with whisper I can use?
Yes. whisper.cpp usually includes sample audio under the repo, commonly samples/jfk.wav.
Try:
find /home/admin/whisper.cpp -iname '*.wav' -o -iname '*.ogg' -o -iname '*.mp3' | head -20
If you see /home/admin/whisper.cpp/samples/jfk.wav, test with:
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt \
-f /home/admin/whisper.cpp/samples/jfk.wav
cat /tmp/openclaw-whisper-test.txt
That confirms whisper.cpp itself works. After that, still test one audio/ogg through ffmpeg, because OpenClaw is receiving Telegram audio/ogg, not wav.
this is what I get
admin@PiClaw:~/whisper.cpp $ /home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt \
-f /home/admin/whisper.cpp/samples/jfk.wav
And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.output_txt: saving output to '/tmp/openclaw-whisper-test.txt'
Good, whisper.cpp itself works. So the remaining failure is almost certainly in the OpenClaw runtime path: either the Telegram ogg -> wav conversion, the service user/environment, or the temp file OpenClaw passes to whisper.
Next test the same format OpenClaw receives:
ffmpeg -y -i /path/to/a/telegram-audio.ogg \
-ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt \
-f /tmp/openclaw-whisper-test.wav
cat /tmp/openclaw-whisper-test.txt
If that works too, use the debug wrapper I posted above as the command temporarily. That will capture the exact OpenClaw args + stderr in /tmp/openclaw-whisper-debug.log.
One extra detail from your output: whisper.cpp prints this to stdout:
...country.output_txt: saving output...
OpenClaw should still prefer the {{OutputBase}}.txt file, so that’s fine, but if the file isn’t created for OpenClaw’s temp path then it falls back to stdout and may include that suffix. First we need to get the command to exit 0 inside OpenClaw.
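To make the stdout-fallback point concrete, here is a hypothetical bash helper that prefers `<OutputBase>.txt` and otherwise strips the `output_txt: saving output to '...'` status text from captured stdout. It only illustrates the behavior described above; it is not OpenClaw's code, and `pick_transcript` is an invented name.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: prefer the whisper.cpp .txt output file,
# fall back to captured stdout with the status suffix removed.
pick_transcript() {
  local outbase="$1" stdout_file="$2"
  if [[ -s "$outbase.txt" ]]; then
    cat "$outbase.txt"
  else
    # Remove the status message that was printed right after the transcript.
    sed -e "s/output_txt: saving output to '[^']*'//" "$stdout_file"
  fi
}
```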
How do I find the path to the .ogg file?
You don’t actually need the Telegram file for the next test. Make an .ogg from the working sample:
ffmpeg -y -i /home/admin/whisper.cpp/samples/jfk.wav /tmp/jfk.ogg
ffmpeg -y -i /tmp/jfk.ogg \
-ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m /home/admin/whisper.cpp/models/ggml-base.en.bin \
-otxt -of /tmp/openclaw-whisper-test -np -nt \
-f /tmp/openclaw-whisper-test.wav
cat /tmp/openclaw-whisper-test.txt
That tests the same OGG -> WAV -> whisper path OpenClaw uses.
For the actual Telegram attachment: OpenClaw downloads it to a temp file like /tmp/openclaw... / openclaw-media..., but it usually deletes it after the run, so it’s hard to grab directly. The debug wrapper is the better way to see the exact temporary path and stderr. If you want to try catching it live:
find /tmp -name 'openclaw-media*' -o -name '*.ogg' -o -name '*.oga'
but don’t be surprised if nothing is there after the failure finishes.
You can find results here https://cl1p.net/4561
Great, that proves all three pieces work manually:
- ffmpeg can make OGG
- ffmpeg can convert OGG -> WAV
- whisper-cli can transcribe the converted WAV
So the failure is specific to the gateway runtime context. The fastest way to get the real error is the wrapper. Use this exactly:
cat > /home/admin/whisper-openclaw-debug.sh <<'SH'
#!/usr/bin/env bash
{
echo "=== $(date) ==="
echo "USER=$(id)"
echo "PWD=$PWD"
echo "PATH=$PATH"
printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log
/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/whisper-openclaw-debug.sh
Then change only the command in config:
"command": "/home/admin/whisper-openclaw-debug.sh"
Leave the args as:
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"-otxt",
"-of",
"{{OutputBase}}",
"-np",
"-nt",
"-f",
"{{MediaPath}}"
]
Restart gateway, send one audio message, then run:
cat /tmp/openclaw-whisper-debug.log
That log should tell us the exact temp path OpenClaw passed and the actual whisper.cpp error.
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-Ofhsv2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-TfOCn2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
Aha, the wrapper changed the behavior.
OpenClaw only does the automatic OGG -> WAV transcode when the configured command basename is exactly whisper-cli. With:
"command": "/home/admin/whisper-openclaw-debug.sh"
the basename is whisper-openclaw-debug.sh, so OpenClaw passed the original .ogg straight through. That’s why your log shows:
-f /home/admin/.openclaw/media/inbound/...ogg
error: failed to read audio data as wav
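The rule described above can be restated as a tiny bash check, which helps predict which `command` values get the automatic transcode. This is an illustrative re-statement that assumes the check is a plain basename comparison, as described; OpenClaw's actual implementation may differ, and `would_auto_transcode` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical re-statement of the basename rule, not OpenClaw source.
would_auto_transcode() {
  local cmd="$1"
  # ${cmd##*/} strips everything up to the last slash (the basename).
  [[ "${cmd##*/}" == "whisper-cli" ]]
}

for cmd in \
  /home/admin/whisper.cpp/build/bin/whisper-cli \
  /home/admin/whisper-openclaw-debug.sh \
  /home/admin/openclaw-whisper-debug/whisper-cli; do
  if would_auto_transcode "$cmd"; then
    echo "transcode to wav: $cmd"
  else
    echo "passes original file: $cmd"
  fi
done
```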
Make the debug wrapper’s filename be whisper-cli so OpenClaw still recognizes it:
mkdir -p /home/admin/openclaw-whisper-debug
cat > /home/admin/openclaw-whisper-debug/whisper-cli <<'SH'
#!/usr/bin/env bash
{
echo "=== $(date) ==="
echo "USER=$(id)"
echo "PWD=$PWD"
echo "PATH=$PATH"
printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log
/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/openclaw-whisper-debug/whisper-cli
Then set:
"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
Restart gateway and test again.
Also, now that we have the real inbound file path, you can test the Telegram file directly through ffmpeg:
ffmpeg -y -i /home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg \
-ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-telegram-test.wav
If that ffmpeg command fails, the saved inbound file is the issue. If it succeeds, the renamed wrapper should show OpenClaw passing a .wav temp path instead of .ogg.
Results here https://cl1p.net/4560
That result is useful: ffmpeg can read the actual Telegram .ogg and convert it to wav successfully. So the inbound file itself is fine.
The missing piece now is the renamed wrapper test, where the command path basename is still whisper-cli:
"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
After restarting the gateway and sending one audio message, please paste:
cat /tmp/openclaw-whisper-debug.log
What I’m looking for is whether the ARGS line ends in a temp .wav path like:
-f /tmp/openclaw/...something.wav
or still ends in the original .ogg.
If it’s still .ogg, OpenClaw isn’t recognizing the command as whisper-cli.
If it’s .wav, then we’ll see the real whisper-cli stderr for the converted file.
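Instead of reading the ARGS line by eye, a short bash sketch can pull the `-f` value out of the latest entry. It assumes the log format written by the debug wrapper in this thread (`ARGS: <arg> <arg> ...`); `last_input_arg` and `classify_input` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the debug wrapper log format used in this thread.
last_input_arg() {
  local log="${1:-/tmp/openclaw-whisper-debug.log}"
  # Last ARGS line, then capture whatever sits inside <...> after <-f>.
  grep '^ARGS:' "$log" 2>/dev/null | tail -n 1 \
    | sed -n 's/.*<-f> <\([^>]*\)>.*/\1/p'
}

classify_input() {
  local path
  path="$(last_input_arg "$1")"
  case "$path" in
    "") echo "no ARGS line with -f found" ;;
    *.wav) echo "wav (transcode ran): $path" ;;
    *.ogg|*.oga|*.opus) echo "original audio (no transcode): $path" ;;
    *) echo "unexpected input: $path" ;;
  esac
}
```

Example: `classify_input /tmp/openclaw-whisper-debug.log`.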
admin@PiClaw:~/.openclaw $ cat /tmp/openclaw-whisper-debug.log
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-Ofhsv2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-TfOCn2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
That log is still the old 11:40 run, and it still shows the previous wrapper behavior. Let’s reset the log so we know the next run is fresh.
Please do this:
rm /tmp/openclaw-whisper-debug.log
Then make sure your config command is exactly:
"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
Not:
"command": "/home/admin/whisper-openclaw-debug.sh"
Then restart:
openclaw gateway restart
Send one new Telegram audio message, then:
cat /tmp/openclaw-whisper-debug.log
If the wrapper path is correct, the new ARGS should end in a generated .wav path under /tmp/openclaw/..., not the original .ogg under ~/.openclaw/media/inbound/....
I'm not seeing the debug log change. I am seeing this in the OpenClaw log:
11:55:03 info channels/discord discord channels resolved: 1479508087080751206/1479508087080751209 (guild:OpenClaw; channel:general), 1479508087080751206/1479874320934113504 (guild:OpenClaw; channel:vega-lite)
11:55:04 info gateway/ws ⇄ res ✓ logs.tail 955ms conn=b68af1bd…9a1d id=4bcc7300…4eb6
11:55:04 info channels/discord discord client initialized as 1479506703665074350; awaiting gateway readiness
11:55:05 warn media-understanding audio: failed (0/1) reason=Command failed
11:55:05 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @RSAPiClawBot (direct, audio/ogg, 13 chars)
11:55:05 warn media-understanding audio: failed (0/1) reason=Command failed
If the debug log is not changing, OpenClaw is not running that wrapper. So the current failure is coming from a different command/config than the one you edited, or the gateway didn’t restart with that config.
Check these:
openclaw gateway status
openclaw status --all
Look for the config path the gateway is using. Then search the config files for every whisper entry:
grep -R "whisper-cli\|whisper-openclaw\|openclaw-whisper-debug" ~/.openclaw -n
You want the active config to contain:
"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
Also verify the wrapper exists and is executable:
ls -l /home/admin/openclaw-whisper-debug/whisper-cli
/home/admin/openclaw-whisper-debug/whisper-cli --help | head
If openclaw gateway status shows a different config path than the file you edited, move the change there. If the config path is right but the wrapper still isn’t called after restart, paste the tools.media.audio block from the active config and the config path shown by gateway status.
One small clue: your log still fails immediately at the same point, but no wrapper write means we’re not debugging the command OpenClaw is actually executing yet.
Results here https://cl1p.net/4561
That grep is actually helpful: the active ~/.openclaw/openclaw.json does point at:
/home/admin/openclaw-whisper-debug/whisper-cli
So if the debug log is not changing, the next likely issue is that this wrapper path either does not exist, is not executable, or is failing before it can write the log.
Run:
ls -l /home/admin/openclaw-whisper-debug/whisper-cli
head -40 /home/admin/openclaw-whisper-debug/whisper-cli
/home/admin/openclaw-whisper-debug/whisper-cli --help | head
Also reset the debug file and confirm it gets created by a manual wrapper run:
rm -f /tmp/openclaw-whisper-debug.log
/home/admin/openclaw-whisper-debug/whisper-cli --help >/tmp/wrapper-help.txt 2>&1
cat /tmp/openclaw-whisper-debug.log
If that manual wrapper run does not create /tmp/openclaw-whisper-debug.log, the wrapper contents/path are wrong.
If it does create the log manually, then paste:
openclaw gateway status
openclaw status --all
because then the gateway is probably running from a different config path or not actually restarted into the config we’re looking at.
-rwxrwxr-x 1 admin admin 356 May 15 11:43 /home/admin/openclaw-whisper-debug/whisper-cli
#!/usr/bin/env bash
{
echo "=== $(date) ==="
echo "USER=$(id)"
echo "PWD=$PWD"
echo "PATH=$PATH"
printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log
/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
The wrapper file looks fine.
One important catch: OpenClaw’s special OGG -> WAV transcode check compares the command basename to exactly whisper-cli. In your installed build, it may be using the raw path basename correctly, but to avoid any path edge case, put the wrapper directory at the front of the gateway PATH and set the command to just whisper-cli.
Config:
"command": "whisper-cli"
Keep the same args.
Then start the gateway with this PATH, or add it to the service env:
export PATH="/home/admin/openclaw-whisper-debug:$PATH"
openclaw gateway restart
If you’re using a system service and export PATH doesn’t persist, use whatever service/env mechanism your install uses, or as a quick test stop the daemon and run the gateway manually from that shell with the PATH set.
Before testing, reset the log:
rm -f /tmp/openclaw-whisper-debug.log
Then send audio and check:
cat /tmp/openclaw-whisper-debug.log
Also please run the manual wrapper test I mentioned:
rm -f /tmp/openclaw-whisper-debug.log
/home/admin/openclaw-whisper-debug/whisper-cli --help >/tmp/wrapper-help.txt 2>&1
cat /tmp/openclaw-whisper-debug.log
That confirms the wrapper writes correctly before we chase the gateway side.
cat: /tmp/openclaw-whisper-debug.log: No such file or directory
That means the wrapper did not run at all.
Let’s separate the two cases.
First, test the wrapper directly:
rm -f /tmp/openclaw-whisper-debug.log
bash -x /home/admin/openclaw-whisper-debug/whisper-cli --help >/tmp/wrapper-help.txt 2>/tmp/wrapper-bash-x.txt
cat /tmp/openclaw-whisper-debug.log
cat /tmp/wrapper-bash-x.txt
If that still says the debug log does not exist, paste /tmp/wrapper-bash-x.txt.
If the direct test does create the log, but sending Telegram audio does not, then OpenClaw is not executing that wrapper. In that case run:
openclaw gateway status
openclaw status --all
grep -n -A25 -B5 '"audio"' ~/.openclaw/openclaw.json
Also, if you changed config to:
"command": "whisper-cli"
then the gateway must have /home/admin/openclaw-whisper-debug at the front of its PATH. If it doesn’t, it will run the real whisper-cli instead of the wrapper. For now, the simplest debug config is still the absolute wrapper path:
"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
Once we prove the wrapper is actually being called, we can switch back to the real binary.
Results here https://cl1p.net/4560
Found it. Your active config now has:
"command": "whisper-cli"
but the gateway service PATH does not include:
/home/admin/openclaw-whisper-debug
/home/admin/whisper.cpp/build/bin
It does include:
/home/admin/bin
So OpenClaw is probably trying to run whisper-cli from PATH and failing before your wrapper ever runs.
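You can check this without restarting anything by resolving `whisper-cli` against the exact PATH the gateway logged in the debug output. A minimal bash sketch, assuming the gateway looks up a bare command name in PATH order like a normal exec; `resolve_in_path` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find the first executable NAME in a PATH-style string.
resolve_in_path() {
  local path_str="$1" name="$2" dir
  local -a dirs
  # Split on ':' without glob expansion.
  IFS=':' read -r -a dirs <<< "$path_str"
  for dir in "${dirs[@]}"; do
    [[ -z "$dir" ]] && dir="."   # an empty PATH entry means the current directory
    if [[ -f "$dir/$name" && -x "$dir/$name" ]]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

Example with the PATH from the debug log: `resolve_in_path "/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin" whisper-cli || echo "not found on the gateway PATH"`.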
For debugging, put the wrapper where the service already looks:
mkdir -p /home/admin/bin
ln -sf /home/admin/openclaw-whisper-debug/whisper-cli /home/admin/bin/whisper-cli
rm -f /tmp/openclaw-whisper-debug.log
openclaw gateway restart
Keep config as:
"command": "whisper-cli"
Then send one audio message and check:
cat /tmp/openclaw-whisper-debug.log
If that works, the final non-debug setup can be either:
"command": "/home/admin/whisper.cpp/build/bin/whisper-cli"
or keep "command": "whisper-cli" and symlink the real binary into /home/admin/bin/whisper-cli after removing the debug symlink.
The key issue is PATH: your configured command became just whisper-cli, but the gateway service could not see the directory containing the binary/wrapper.
cat: /tmp/openclaw-whisper-debug.log: No such file or directory
No debug log means OpenClaw is failing before it ever reaches whisper-cli. Since your config command is now "whisper-cli", OpenClaw tries its internal OGG -> WAV ffmpeg step first; if that step fails, the whisper wrapper never runs.
At this point I’d stop fighting the special-case path and use a small wrapper that handles OGG itself. This bypasses OpenClaw’s internal whisper-cli transcode detection and prints the transcript to stdout, which OpenClaw can consume.
Create this:
cat > /home/admin/openclaw-whisper-transcribe.sh <<'SH'
#!/usr/bin/env bash
set -euo pipefail
model=""
input=""
while [[ $# -gt 0 ]]; do
case "$1" in
-m|--model)
model="$2"
shift 2
;;
*)
input="$1"
shift
;;
esac
done
if [[ -z "$model" || -z "$input" ]]; then
echo "usage: openclaw-whisper-transcribe.sh -m MODEL AUDIO_FILE" >&2
exit 2
fi
workdir="$(mktemp -d /tmp/openclaw-whisper.XXXXXX)"
trap 'rm -rf "$workdir"' EXIT
wav="$workdir/input.wav"
out="$workdir/transcript"
ffmpeg -y -i "$input" -ac 1 -ar 16000 -c:a pcm_s16le "$wav" >/dev/null 2>&1
/home/admin/whisper.cpp/build/bin/whisper-cli \
-m "$model" \
-otxt -of "$out" -np -nt -f "$wav" >/dev/null 2>&1
cat "$out.txt"
SH
chmod +x /home/admin/openclaw-whisper-transcribe.sh
Then change the audio model config to:
{
"type": "cli",
"command": "/home/admin/openclaw-whisper-transcribe.sh",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
Keep your surrounding enabled, echoTranscript, and echoFormat settings as-is. Restart gateway after editing.
Why this should work: it passes the original Telegram .ogg into your script, the script runs ffmpeg itself, then calls your known-working whisper.cpp command, then prints only the transcript.
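Before restarting, the wrapper can be exercised the same way OpenClaw will call it (`-m MODEL` followed by the audio path). A small bash sketch, assuming the /tmp/jfk.ogg sample created earlier still exists; `smoke_test_wrapper` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test: the wrapper must exit 0 and print a non-blank transcript.
smoke_test_wrapper() {
  local wrapper="$1" model="$2" audio="$3" out
  if ! out="$("$wrapper" -m "$model" "$audio")"; then
    echo "FAIL: wrapper exited non-zero" >&2
    return 1
  fi
  # Reject output that is empty or only whitespace.
  if [[ -z "${out//[[:space:]]/}" ]]; then
    echo "FAIL: wrapper printed no transcript" >&2
    return 1
  fi
  printf 'OK: %s\n' "$out"
}
```

Example: `smoke_test_wrapper /home/admin/openclaw-whisper-transcribe.sh /home/admin/whisper.cpp/models/ggml-base.en.bin /tmp/jfk.ogg`. Because the wrapper sends ffmpeg and whisper-cli output to /dev/null, a failure here only shows the exit status; temporarily drop those redirects to see the underlying error.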
ah, that actually worked
Nice, glad we got it pinned down.
What happened: OpenClaw’s built-in whisper-cli special path was failing before it reached your binary, so wrapping the whole OGG -> WAV -> whisper flow made it explicit and reliable.
I’d leave that wrapper in place for now. If you want to clean up later, the final config should stay as the wrapper command:
{
"type": "cli",
"command": "/home/admin/openclaw-whisper-transcribe.sh",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
Since the built-in path failed silently with only Command failed, this is probably worth filing as a GitHub issue with the thread summary/logs: https://github.com/openclaw/openclaw/issues
And if this fixed it, please right-click the solution message → Apps → ✅ Mark Solution. That helps Answer Overflow index the fix for the next person.
Yes, please tell me how to clean up
Yep. Clean it up like this:
- Keep the working wrapper:
ls -l /home/admin/openclaw-whisper-transcribe.sh
- Remove the debug wrapper/symlink if you made them:
rm -f /home/admin/bin/whisper-cli
rm -rf /home/admin/openclaw-whisper-debug
rm -f /tmp/openclaw-whisper-debug.log /tmp/wrapper-help.txt /tmp/wrapper-bash-x.txt
If you prefer recoverable cleanup and have trash installed, use trash instead of rm -rf for the debug directory.
- Make sure your final tools.media.audio.models entry is only this:
{
"type": "cli",
"command": "/home/admin/openclaw-whisper-transcribe.sh",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
So the full audio block should be:
"audio": {
"enabled": true,
"echoTranscript": true,
"echoFormat": "📝 I heard: {transcript}",
"models": [
{
"type": "cli",
"command": "/home/admin/openclaw-whisper-transcribe.sh",
"args": [
"-m",
"/home/admin/whisper.cpp/models/ggml-base.en.bin",
"{{MediaPath}}"
],
"timeoutSeconds": 60
}
]
}
- Restart and verify one more voice note:
openclaw gateway restart
openclaw logs --follow
If you get the 📝 I heard: ... echo and no media-understanding audio: failed, you’re done.
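That last check can also be scripted. The bash sketch below scans saved log text for the failure warning. It assumes you captured log output to a file (for example by redirecting `openclaw logs --follow` while sending a voice note) and that the warning text matches what appeared earlier in this thread; `audio_check` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: count "audio: failed" warnings in a captured log file.
audio_check() {
  local log="$1" failures
  [[ -r "$log" ]] || { echo "cannot read $log" >&2; return 2; }
  # grep -c prints 0 (and exits 1) when nothing matches, hence || true.
  failures="$(grep -c 'audio: failed' "$log" || true)"
  if [[ "$failures" -eq 0 ]]; then
    echo "no 'audio: failed' lines"
  else
    echo "$failures 'audio: failed' line(s); transcription still failing"
    return 1
  fi
}
```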
How about whisper-openclaw-debug.sh?
Yes, remove that too. It was only for debugging.
rm -f /home/admin/whisper-openclaw-debug.sh
Or with trash, if installed:
trash /home/admin/whisper-openclaw-debug.sh
Keep only the working production wrapper:
/home/admin/openclaw-whisper-transcribe.sh