#Whisper audio tool

tribal plank
#

I'm using version 2025.5.12. I've installed the whisper audio tool and placed the config in openclaw.json, but sometimes agents don't use it and try to install the command-line whisper library instead. What am I missing?

rare cedarBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

gloomy waspBOT
# tribal plank I'm using version 2025.5.12.I've installed whisper audio tool and placed config ...

I checked the current docs/source. The main thing: there are two different paths that sound similar:

  1. Skill/tool the agent chooses to use: openai-whisper / openai-whisper-api

    • openai-whisper is explicitly the local whisper CLI skill.
    • openai-whisper-api is the OpenAI API/curl skill.
    • Skills are instructions exposed to the model, so the agent can still decide badly sometimes, especially if both local/API-ish options are visible.
  2. Automatic inbound audio transcription: tools.media.audio

    • This is what you want for Discord voice/audio attachments to be transcribed before the agent reasons about the message.
    • If you leave it on auto, OpenClaw may detect local CLI options like whisper / whisper-cli and use or suggest them.

If you want provider-only transcription, configure the audio media path explicitly and don’t include a CLI fallback:

{
  tools: {
gloomy waspBOT
openclaw logs --follow

Also, can you confirm the exact output of openclaw --version? You wrote 2025.5.12; if that’s actually 2026.5.12, the docs above match the current audio pipeline. If it really is 2025.x, I’d update first because the media audio config has changed a lot.

Relevant docs:
https://docs.openclaw.ai/nodes/audio
https://docs.openclaw.ai/tools/media-overview
https://docs.openclaw.ai/gateway/configuration-reference#toolsmedia

tribal plank
#

Yes, it's that version. This is what I have in the config:

"tools": {
  "media": {
    "audio": {
      "enabled": true,
      "echoTranscript": true,
      "echoFormat": "📝 I heard: {transcript}",
      "models": [
        {
          "type": "cli",
          "command": "/home/admin/whisper.cpp/build/bin/whisper-cli",
          "args": [
            "-m",
            "/home/admin/whisper.cpp/models/ggml-base.en.bin",
            "-otxt",
            "-of",
            "{{OutputBase}}",
            "-np",
            "-nt",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 60
        }
      ]
    }
  },

gloomy waspBOT
# tribal plank Yes, its that version This is what I have in the config "tools": { "media...

That config is for the CLI whisper.cpp path, so it’s expected that OpenClaw will run a command-line Whisper binary. The difference is: OpenClaw should run your exact /home/admin/whisper.cpp/build/bin/whisper-cli during media pre-processing, before the agent starts thinking. The agent should not be deciding to install whisper itself.

A couple of likely causes:

  1. The snippet is incomplete or not loaded by the running gateway
    • Make sure the final file has the closing braces and valid JSON5.
    • Restart the gateway after editing.
    • Run:
openclaw status --all
openclaw logs --follow
  2. The configured CLI is failing, so the agent receives an untranscribed audio file and improvises
    • If echoTranscript: true is working, you should see:
📝 I heard: ...

before/around the agent response.

    • If you don’t see that, the media audio pipeline probably isn’t succeeding.
  3. The service user cannot execute/read those paths
    • Check as the same user running the gateway:
/home/admin/whisper.cpp/build/bin/whisper-cli -m /home/admin/whisper.cpp/models/ggml-base.en.bin -otxt -of /tmp/whisper-test -np -nt /path/to/test-audio.wav
    • Also check the gateway service can access /home/admin/...; this often differs from your interactive shell.
  4. A Whisper skill is also visible to agents
    • The bundled openai-whisper skill teaches agents to use/install the Python whisper CLI. That’s separate from tools.media.audio.
    • If the transcript echo works but agents still try to install Whisper for other tasks, remove/disable that skill or tighten the agent’s skill/tool exposure.
tribal plank
#

This is in the logs

10:59:09 info channels/telegram telegram sendMessage ok chat=6938637819 message=3673
10:59:19 warn media-understanding audio: failed (0/1) reason=Command failed
10:59:19 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @SQLLeadBot (direct, audio/ogg, 13 chars)
10:59:22 warn media-understanding audio: failed (0/1) reason=Command failed
10:59:42 warn diagnostic liveness warning: reasons=event_loop_delay interval=34s eventLoopDelayP99Ms=41.8 eventLoopDelayMaxMs=15770.6 eventLoopUtilization=0.715 cpuCoreRatio=0.752 active=1 waiting=0 queued=1 phase=channels.telegram.start-account recentPhases=channels.whatsapp.is-configured:1ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:1ms,channels.whatsapp.is-configured:0ms,channels.whatsapp.is-configured:1ms work=[active=agent:lead-scout:telegram:direct:6938637819(processing,q=1,age=23s last=run:completed) queued=agent:lead-scout:telegram:direct:6938637819(processing,q=1,age=23s last=run:completed)]
11:03:14 warn fetch-timeout fetch timeout reached; aborting operation
11:03:14 warn agent/embedded embedded run agent end

gloomy waspBOT
# tribal plank This is in the logs 0:59:09 info channels/telegram telegram sendMessage ok chat...

That log confirms the media pipeline is trying your configured entry and the CLI is failing. So this is not “agent forgot the tool” anymore; the transcript step fails, then the agent gets an audio attachment and improvises.

Two things to check first:

  1. ffmpeg must be installed and visible to the gateway
    Telegram sends audio/ogg. For whisper-cli, OpenClaw transcodes non-wav audio to 16k mono wav first. If ffmpeg is missing from the service PATH, the CLI entry fails before Whisper runs.
which ffmpeg
ffmpeg -version

If OpenClaw runs as a daemon/service, also check the service environment, not just your shell.

  2. Your whisper.cpp build may require -f before the input file
    Try this manually with a real Telegram/downsampled wav or any wav:
gloomy waspBOT
gloomy waspBOT
tribal plank
#

ffmpeg is installed

gloomy waspBOT
# tribal plank ffmpeg is installed

Good, then the next most likely issue is the whisper-cli argv. In current whisper.cpp, the input file is normally passed with -f; your config passes {{MediaPath}} as a bare positional arg. Some builds accept that, some exit with usage/error.

Change the end of your args from:

"-np",
"-nt",
"{{MediaPath}}"

to:

"-np",
"-nt",
"-f",
"{{MediaPath}}"

Full relevant part:

args: [
  "-m",
  "/home/admin/whisper.cpp/models/ggml-base.en.bin",
  "-otxt",
  "-of",
  "{{OutputBase}}",
  "-np",
  "-nt",
  "-f",
  "{{MediaPath}}"
]

Then restart the gateway.

If it still says Command failed, run the exact binary manually as the gateway user:

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/whisper-test -np -nt -f /path/to/test.wav

The important part is to test as the same user/service context OpenClaw runs under. “ffmpeg is installed” in your shell can still differ from what the daemon sees, but the -f mismatch is the sharper clue here.
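
To compare what the daemon sees with your shell, one option on Linux is to read PATH out of the running gateway's process environment. This is only a minimal sketch: the function name path_from_environ is made up for illustration, and locating the gateway PID with pgrep -f openclaw is an assumption about how the process is named on your machine.

```shell
# Print the PATH entry from a NUL-separated environ file.
# On Linux, /proc/<pid>/environ has this format.
path_from_environ() {
  tr '\0' '\n' < "$1" | sed -n 's/^PATH=//p'
}

# Hypothetical usage (the pgrep pattern is a guess at the process name):
#   pid="$(pgrep -f openclaw | head -n 1)"
#   path_from_environ "/proc/$pid/environ" | tr ':' '\n'
```

If the directory that holds ffmpeg is missing from that list, the daemon will not find it even though your interactive shell does.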

tribal plank
#

I don't get the 'I heard...'

gloomy waspBOT
# tribal plank I don't get the 'I heard...'

Right, that lines up with the log. echoTranscript only sends after transcription succeeds. Since you have:

media-understanding audio: failed (0/1) reason=Command failed

there is no transcript to echo.

At this point the fix is to get the exact command working. I’d do these in order:

  1. Add -f before {{MediaPath}} in config, restart gateway, test again.

  2. If it still fails, run OpenClaw with verbose logs so we can see the actual command/error instead of only Command failed:

openclaw logs --follow --verbose

If that flag isn’t supported in your build, start the gateway itself in verbose mode or use:

openclaw status --all
  3. Manually test whisper.cpp against an ogg-converted wav:
ffmpeg -y -i /path/to/audio.ogg -ac 1 -ar 16000 -c:a pcm_s16le /tmp/test.wav

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/whisper-test -np -nt -f /tmp/test.wav

If that manual command fails, it’s whisper.cpp/model/path/permissions. If it works manually but OpenClaw still fails, it’s almost certainly service environment or permissions for the gateway user.
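
A quick way to separate the permissions case from the whisper.cpp case is a small preflight check run as the gateway's user. This is a sketch under assumptions: check_whisper_setup is a hypothetical helper name, and it only tests what the shell can see (execute bit, read access, ffmpeg on PATH), not whether the model actually loads.

```shell
# Report the first problem found with the binary, the model, or ffmpeg.
# Run it as the same user the gateway runs under so permission
# problems show up here instead of as "Command failed".
check_whisper_setup() {
  bin="$1"
  model="$2"
  if [ ! -x "$bin" ]; then
    echo "not executable: $bin"
    return 1
  fi
  if [ ! -r "$model" ]; then
    echo "not readable: $model"
    return 1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found in PATH"
    return 1
  fi
  echo "ok"
}

# Example call with the paths from this thread:
#   check_whisper_setup /home/admin/whisper.cpp/build/bin/whisper-cli \
#     /home/admin/whisper.cpp/models/ggml-base.en.bin
```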

tribal plank
#

so what should the json be?

gloomy waspBOT
# tribal plank so what should the json be?

Use this shape in ~/.openclaw/openclaw.json / JSON5, with the -f added before {{MediaPath}}:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        echoTranscript: true,
        echoFormat: "📝 I heard: {transcript}",
        models: [
          {
            type: "cli",
            command: "/home/admin/whisper.cpp/build/bin/whisper-cli",
            args: [
              "-m",
              "/home/admin/whisper.cpp/models/ggml-base.en.bin",
              "-otxt",
              "-of",
              "{{OutputBase}}",
              "-np",
              "-nt",
              "-f",
              "{{MediaPath}}"
            ],
            timeoutSeconds: 60
          }
        ]
      }
    }
  }
}

If this is inside an existing config file, don’t duplicate the outer { ... }; merge only the tools: { ... } block into your existing top-level object.

Then restart the gateway:

openclaw gateway restart

After sending a voice/audio message, you should see the 📝 I heard: ... echo. If you still don’t, the next thing to paste is the new media-understanding log line after this config change.

tribal plank
#

You don't have quotes around those, all entries in openclaw.json use quotes

gloomy waspBOT
gloomy waspBOT
gloomy waspBOT
tribal plank
#

I still get this in logs
11:23:08 info channels/discord discord client initialized as 1479506703665074350; awaiting gateway readiness
11:23:11 warn media-understanding audio: failed (0/1) reason=Command failed
11:23:11 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @RSAPiClawBot (direct, audio/ogg, 13 chars)
11:23:11 warn media-understanding audio: failed (0/1) reason=Command failed

gloomy waspBOT
# tribal plank I still get this in logs 11:23:08 info channels/discord discord client initializ...

That means the config is being picked up, but the configured command is still exiting non-zero. The normal log is hiding the actual stderr, so now we need to isolate the failing command outside OpenClaw.

Run these on the same machine, ideally as the same user that runs the gateway:

ls -l /home/admin/whisper.cpp/build/bin/whisper-cli
ls -l /home/admin/whisper.cpp/models/ggml-base.en.bin
/home/admin/whisper.cpp/build/bin/whisper-cli --help | head

Then test the exact audio flow OpenClaw uses:

ffmpeg -y -i /path/to/telegram-audio.ogg \
  -ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav
/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt -f /tmp/openclaw-whisper-test.wav

If that fails, paste the terminal error from that command. That will be the real issue.

If that works manually, then it’s almost certainly the gateway service environment. Common cases:

  • gateway runs as a different user and cannot read /home/admin/whisper.cpp/...
  • ffmpeg is installed in your shell PATH but not in the daemon PATH
  • whisper-cli needs a shared library path that your shell has but the service does not

Also worth checking:

gloomy waspBOT
tribal plank
gloomy waspBOT
# tribal plank Results are here https://cl1p.net/4560

That paste proves the binary and model exist, and it also confirms your build supports -f. It doesn’t show the actual transcription command against an audio file yet, which is the part that’s failing.

Please run this with a real Telegram audio file path:

ffmpeg -y -i /path/to/telegram-audio.ogg \
  -ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt -f /tmp/openclaw-whisper-test.wav

cat /tmp/openclaw-whisper-test.txt

If that manual command succeeds, the next best move is a tiny wrapper so we can see what OpenClaw is actually invoking and what stderr says. Example:

cat > /home/admin/whisper-openclaw-debug.sh <<'SH'
#!/usr/bin/env bash
{
  echo "=== $(date) ==="
  echo "USER=$(id)"
  echo "PWD=$PWD"
  echo "PATH=$PATH"
  printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log

/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/whisper-openclaw-debug.sh

Then temporarily set:

"command": "/home/admin/whisper-openclaw-debug.sh"

Keep the same args, restart the gateway, send one audio message, then check:

cat /tmp/openclaw-whisper-debug.log

That should reveal whether OpenClaw is passing a wav path, whether the service user can read it, and the real whisper.cpp error.

tribal plank
#

Isn't there an example file that comes with whisper I can use?

gloomy waspBOT
# tribal plank Isn't there an example file that comes with whisper I can use?

Yes. whisper.cpp usually includes sample audio under the repo, commonly samples/jfk.wav.

Try:

find /home/admin/whisper.cpp -iname '*.wav' -o -iname '*.ogg' -o -iname '*.mp3' | head -20

If you see /home/admin/whisper.cpp/samples/jfk.wav, test with:

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt \
  -f /home/admin/whisper.cpp/samples/jfk.wav
tribal plank
#

this is what I get

admin@PiClaw:~/whisper.cpp $ /home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt \
  -f /home/admin/whisper.cpp/samples/jfk.wav

And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
output_txt: saving output to '/tmp/openclaw-whisper-test.txt'

gloomy waspBOT
# tribal plank this is what I get admin@PiClaw:~/whisper.cpp $ /home/admin/whisper.cpp/build/b...

Good, whisper.cpp itself works. So the remaining failure is almost certainly in the OpenClaw runtime path: either the Telegram ogg -> wav conversion, the service user/environment, or the temp file OpenClaw passes to whisper.

Next test the same format OpenClaw receives:

ffmpeg -y -i /path/to/a/telegram-audio.ogg \
  -ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt \
  -f /tmp/openclaw-whisper-test.wav

cat /tmp/openclaw-whisper-test.txt

If that works too, use the debug wrapper I posted above as the command temporarily. That will capture the exact OpenClaw args + stderr in /tmp/openclaw-whisper-debug.log.

tribal plank
#

How do I find the path to the .ogg file?

gloomy waspBOT
# tribal plank how do I find the path too .ogg file

You don’t actually need the Telegram file for the next test. Make an .ogg from the working sample:

ffmpeg -y -i /home/admin/whisper.cpp/samples/jfk.wav /tmp/jfk.ogg

ffmpeg -y -i /tmp/jfk.ogg \
  -ac 1 -ar 16000 -c:a pcm_s16le /tmp/openclaw-whisper-test.wav

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m /home/admin/whisper.cpp/models/ggml-base.en.bin \
  -otxt -of /tmp/openclaw-whisper-test -np -nt \
  -f /tmp/openclaw-whisper-test.wav

cat /tmp/openclaw-whisper-test.txt

That tests the same OGG -> WAV -> whisper path OpenClaw uses.

For the actual Telegram attachment: OpenClaw downloads it to a temp file like /tmp/openclaw... / openclaw-media..., but it usually deletes it after the run, so it’s hard to grab directly. The debug wrapper is the better way to see the exact temporary path and stderr. If you want to try catching it live:

find /tmp -name 'openclaw-media*' -o -name '*.ogg' -o -name '*.oga'

but don’t be surprised if nothing is there after the failure finishes.

tribal plank
gloomy waspBOT
# tribal plank You can find results here https://cl1p.net/4561

Great, that proves all three pieces work manually:

  • ffmpeg can make OGG
  • ffmpeg can convert OGG -> WAV
  • whisper-cli can transcribe the converted WAV

So the failure is specific to the gateway runtime context. The fastest way to get the real error is the wrapper. Use this exactly:

cat > /home/admin/whisper-openclaw-debug.sh <<'SH'
#!/usr/bin/env bash
{
  echo "=== $(date) ==="
  echo "USER=$(id)"
  echo "PWD=$PWD"
  echo "PATH=$PATH"
  printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log

/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/whisper-openclaw-debug.sh

Then change only the command in config:

"command": "/home/admin/whisper-openclaw-debug.sh"
gloomy waspBOT
gloomy waspBOT
tribal plank
#

=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-Ofhsv2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-TfOCn2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0

gloomy waspBOT
# tribal plank === Fri 15 May 11:40:45 BST 2026 === USER=uid=1000(admin) gid=1000(admin) groups...

Aha, the wrapper changed the behavior.

OpenClaw only does the automatic OGG -> WAV transcode when the configured command basename is exactly whisper-cli. With:

"command": "/home/admin/whisper-openclaw-debug.sh"

the basename is whisper-openclaw-debug.sh, so OpenClaw passed the original .ogg straight through. That’s why your log shows:

-f /home/admin/.openclaw/media/inbound/...ogg
error: failed to read audio data as wav

Make the debug wrapper’s filename be whisper-cli so OpenClaw still recognizes it:

mkdir -p /home/admin/openclaw-whisper-debug

cat > /home/admin/openclaw-whisper-debug/whisper-cli <<'SH'
#!/usr/bin/env bash
{
  echo "=== $(date) ==="
  echo "USER=$(id)"
  echo "PWD=$PWD"
  echo "PATH=$PATH"
  printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log

/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status
SH
chmod +x /home/admin/openclaw-whisper-debug/whisper-cli

tribal plank
gloomy waspBOT
# tribal plank Results here https://cl1p.net/4560

That result is useful: ffmpeg can read the actual Telegram .ogg and convert it to wav successfully. So the inbound file itself is fine.

The missing piece now is the renamed wrapper test, where the command path basename is still whisper-cli:

"command": "/home/admin/openclaw-whisper-debug/whisper-cli"

After restarting the gateway and sending one audio message, please paste:

cat /tmp/openclaw-whisper-debug.log

What I’m looking for is whether the ARGS line ends in a temp .wav path like:

-f /tmp/openclaw/...something.wav

or still ends in the original .ogg.

If it’s still .ogg, OpenClaw isn’t recognizing the command as whisper-cli.
If it’s .wav, then we’ll see the real whisper-cli stderr for the converted file.
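
If the debug log gets long, a small helper can pull out just the extension of the input file from the most recent ARGS line. This is a sketch that assumes the log format written by the debug wrapper in this thread (each argument wrapped in angle brackets, input file last); last_input_ext is a made-up name.

```shell
# Print the extension of the last <...> argument on the most recent
# ARGS line of the wrapper's debug log. If that argument contains no
# dot, the whole argument is printed instead.
last_input_ext() {
  grep '^ARGS:' "$1" | tail -n 1 \
    | sed 's/.*<\(.*\)>[[:space:]]*$/\1/' \
    | sed 's/.*\.//'
}

# Example:
#   last_input_ext /tmp/openclaw-whisper-debug.log
```

An answer of wav would mean the transcode step ran before the wrapper; ogg would mean the original file was passed straight through.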

tribal plank
#

admin@PiClaw:~/.openclaw $ cat /tmp/openclaw-whisper-debug.log
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-Ofhsv2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0
=== Fri 15 May 11:40:45 BST 2026 ===
USER=uid=1000(admin) gid=1000(admin) groups=1000(admin),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),102(netdev),984(ollama),986(gpio),988(i2c),989(spi),992(render),996(input)
PWD=/home/admin
PATH=/usr/bin:/bin:/home/admin/.local/bin:/home/admin/.npm-global/bin:/home/admin/bin:/home/admin/.bun/bin:/home/admin/.nix-profile/bin:/usr/local/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin
ARGS: <-m> </home/admin/whisper.cpp/models/ggml-base.en.bin> <-otxt> <-of> </tmp/openclaw/openclaw-media-cli-TfOCn2/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc> <-np> <-nt> <-f> </home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg>
error: failed to read audio data as wav (Unknown error)
error: failed to read audio file '/home/admin/.openclaw/media/inbound/file_12---177d8f2e-6faf-42e7-a5ee-5a131da5dcdc.ogg'
EXIT=0

gloomy waspBOT
gloomy waspBOT
tribal plank
#

I'm not seeing debug log changing. I am seeing this in openclaw log

11:55:03 info channels/discord discord channels resolved: 1479508087080751206/1479508087080751209 (guild:OpenClaw; channel:general), 1479508087080751206/1479874320934113504 (guild:OpenClaw; channel:vega-lite)
11:55:04 info gateway/ws ⇄ res ✓ logs.tail 955ms conn=b68af1bd…9a1d id=4bcc7300…4eb6
11:55:04 info channels/discord discord client initialized as 1479506703665074350; awaiting gateway readiness
11:55:05 warn media-understanding audio: failed (0/1) reason=Command failed
11:55:05 info gateway/channels/telegram/inbound Inbound message telegram:6938637819 -> @RSAPiClawBot (direct, audio/ogg, 13 chars)
11:55:05 warn media-understanding audio: failed (0/1) reason=Command failed

gloomy waspBOT
# tribal plank I'm not seeing debug log changing. I am seeing this in openclaw log 11:55:03 in...

If the debug log is not changing, OpenClaw is not running that wrapper. So the current failure is coming from a different command/config than the one you edited, or the gateway didn’t restart with that config.

Check these:

openclaw gateway status
openclaw status --all

Look for the config path the gateway is using. Then search the config files for every whisper entry:

grep -R "whisper-cli\|whisper-openclaw\|openclaw-whisper-debug" ~/.openclaw -n

You want the active config to contain:

"command": "/home/admin/openclaw-whisper-debug/whisper-cli"

Also verify the wrapper exists and is executable:

ls -l /home/admin/openclaw-whisper-debug/whisper-cli
/home/admin/openclaw-whisper-debug/whisper-cli --help | head

If openclaw gateway status shows a different config path than the file you edited, move the change there. If the config path is right but the wrapper still isn’t called after restart, paste the tools.media.audio block from the active config and the config path shown by gateway status.

One small clue: your log still fails immediately at the same point, but no wrapper write means we’re not debugging the command OpenClaw is actually executing yet.
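
To see at a glance which command each config file would run, something like the following can list every "command" value with its line number. It is a rough sketch: list_commands is a hypothetical helper, and it assumes the quoted-key JSON style your openclaw.json uses, with one "command" entry per line. It will not match unquoted JSON5 keys.

```shell
# Print "<line number>: <command path>" for every quoted "command"
# entry in a config file.
list_commands() {
  grep -n '"command"' "$1" \
    | sed 's/^\([0-9]*\):[[:space:]]*"command"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1: \2/'
}

# Example:
#   list_commands ~/.openclaw/openclaw.json
```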

tribal plank
gloomy waspBOT
# tribal plank Results here https://cl1p.net/4561

That grep is actually helpful: the active ~/.openclaw/openclaw.json does point at:

/home/admin/openclaw-whisper-debug/whisper-cli

So if the debug log is not changing, the next likely issue is that this wrapper path either does not exist, is not executable, or is failing before it can write the log.

Run:

ls -l /home/admin/openclaw-whisper-debug/whisper-cli
head -40 /home/admin/openclaw-whisper-debug/whisper-cli
/home/admin/openclaw-whisper-debug/whisper-cli --help | head

Also reset the debug file and confirm it gets created by a manual wrapper run:

rm -f /tmp/openclaw-whisper-debug.log
/home/admin/openclaw-whisper-debug/whisper-cli --help >/tmp/wrapper-help.txt 2>&1
cat /tmp/openclaw-whisper-debug.log

If that manual wrapper run does not create /tmp/openclaw-whisper-debug.log, the wrapper contents/path are wrong.

If it does create the log manually, then paste:

openclaw gateway status
openclaw status --all

because then the gateway is probably running from a different config path or not actually restarted into the config we’re looking at.

tribal plank
#

-rwxrwxr-x 1 admin admin 356 May 15 11:43 /home/admin/openclaw-whisper-debug/whisper-cli
#!/usr/bin/env bash
{
echo "=== $(date) ==="
echo "USER=$(id)"
echo "PWD=$PWD"
echo "PATH=$PATH"
printf 'ARGS:'; printf ' <%s>' "$@"; echo
} >> /tmp/openclaw-whisper-debug.log

/home/admin/whisper.cpp/build/bin/whisper-cli "$@" >> /tmp/openclaw-whisper-debug.log 2>&1
status=$?
echo "EXIT=$status" >> /tmp/openclaw-whisper-debug.log
exit $status

gloomy waspBOT
# tribal plank -rwxrwxr-x 1 admin admin 356 May 15 11:43 /home/admin/openclaw-whisper-debug/whi...

The wrapper file looks fine.

One important catch: OpenClaw’s special OGG -> WAV transcode check compares the command basename to exactly whisper-cli. In your installed build, it may be using the raw path basename correctly, but to avoid any path edge case, put the wrapper directory at the front of the gateway PATH and set the command to just whisper-cli.

Config:

"command": "whisper-cli"

Keep the same args.

Then start the gateway with this PATH, or add it to the service env:

export PATH="/home/admin/openclaw-whisper-debug:$PATH"
gloomy waspBOT
gloomy waspBOT
tribal plank
#

cat: /tmp/openclaw-whisper-debug.log: No such file or directory

gloomy waspBOT
# tribal plank cat: /tmp/openclaw-whisper-debug.log: No such file or directory

That means the wrapper did not run at all.

Let’s separate the two cases.

First, test the wrapper directly:

rm -f /tmp/openclaw-whisper-debug.log
bash -x /home/admin/openclaw-whisper-debug/whisper-cli --help >/tmp/wrapper-help.txt 2>/tmp/wrapper-bash-x.txt

cat /tmp/openclaw-whisper-debug.log
cat /tmp/wrapper-bash-x.txt

If that still says the debug log does not exist, paste /tmp/wrapper-bash-x.txt.

If the direct test does create the log, but sending Telegram audio does not, then OpenClaw is not executing that wrapper. In that case run:

openclaw gateway status
openclaw status --all
grep -n -A25 -B5 '"audio"' ~/.openclaw/openclaw.json

Also, if you changed config to:

"command": "whisper-cli"

then the gateway must have /home/admin/openclaw-whisper-debug at the front of its PATH. If it doesn’t, it will run the real whisper-cli instead of the wrapper. For now, the simplest debug config is still the absolute wrapper path:

"command": "/home/admin/openclaw-whisper-debug/whisper-cli"
gloomy waspBOT
tribal plank
gloomy waspBOT
gloomy waspBOT
# tribal plank Results here https://cl1p.net/4560
/home/admin/bin

So OpenClaw is probably trying to run whisper-cli from PATH and failing before your wrapper ever runs.

For debugging, put the wrapper where the service already looks:

mkdir -p /home/admin/bin
ln -sf /home/admin/openclaw-whisper-debug/whisper-cli /home/admin/bin/whisper-cli
rm -f /tmp/openclaw-whisper-debug.log
openclaw gateway restart

Keep config as:

"command": "whisper-cli"

Then send one audio message and check:

cat /tmp/openclaw-whisper-debug.log

If that works, the final non-debug setup can be either:

"command": "/home/admin/whisper.cpp/build/bin/whisper-cli"

or keep "command": "whisper-cli" and symlink the real binary into /home/admin/bin/whisper-cli after removing the debug symlink.
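
Because a bare "whisper-cli" depends on lookup order, it can help to list every match along a given PATH string. This is an illustrative sketch only: find_in_path is a made-up helper, and you would feed it the PATH the gateway actually runs with (for example the PATH line from the debug log), not necessarily your shell's.

```shell
# Print every executable called $1 found along the colon-separated
# PATH string $2, in lookup order. The first line printed is the one
# a bare command name would resolve to.
find_in_path() {
  name="$1"
  old_ifs="$IFS"
  IFS=':'
  for dir in $2; do
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
  IFS="$old_ifs"
}

# Example:
#   find_in_path whisper-cli "/usr/bin:/bin:/home/admin/bin"
```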

tribal plank
#

cat: /tmp/openclaw-whisper-debug.log: No such file or directory

gloomy waspBOT
# tribal plank cat: /tmp/openclaw-whisper-debug.log: No such file or directory

No debug log means OpenClaw is failing before it ever reaches whisper-cli. Since your config command is now "whisper-cli", OpenClaw tries its internal OGG -> WAV ffmpeg step first; if that step fails, the whisper wrapper never runs.

At this point I’d stop fighting the special-case path and use a small wrapper that handles OGG itself. This bypasses OpenClaw’s internal whisper-cli transcode detection and prints the transcript to stdout, which OpenClaw can consume.

Create this:

cat > /home/admin/openclaw-whisper-transcribe.sh <<'SH'
#!/usr/bin/env bash
set -euo pipefail

model=""
input=""

while [[ $# -gt 0 ]]; do
  case "$1" in
gloomy waspBOT
gloomy waspBOT
# tribal plank cat: /tmp/openclaw-whisper-debug.log: No such file or directory

workdir="$(mktemp -d /tmp/openclaw-whisper.XXXXXX)"
trap 'rm -rf "$workdir"' EXIT

wav="$workdir/input.wav"
out="$workdir/transcript"

ffmpeg -y -i "$input" -ac 1 -ar 16000 -c:a pcm_s16le "$wav" >/dev/null 2>&1

/home/admin/whisper.cpp/build/bin/whisper-cli \
  -m "$model" \
  -otxt -of "$out" -np -nt -f "$wav" >/dev/null 2>&1

cat "$out.txt"
SH
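chmod +x /home/admin/openclaw-whisper-transcribe.sh

Then point the model entry at the script:

{
  "type": "cli",
  "command": "/home/admin/openclaw-whisper-transcribe.sh",
  "args": [
    "-m",
    "/home/admin/whisper.cpp/models/ggml-base.en.bin",
    "{{MediaPath}}"
  ],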
  "timeoutSeconds": 60
}

Keep your surrounding enabled, echoTranscript, and echoFormat settings as-is. Restart gateway after editing.

Why this should work: it passes the original Telegram .ogg into your script, the script runs ffmpeg itself, then calls your known-working whisper.cpp command, then prints only the transcript.
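
For reference, here is one way a complete wrapper of this kind could look end to end. Treat it as a sketch, not the exact script: the argument parsing is an assumed reconstruction, the function names parse_args and transcribe are made up for illustration, and it assumes ffmpeg is on the gateway's PATH and whisper-cli lives at the path used throughout this thread.

```shell
#!/usr/bin/env bash
# Sketch of an OGG -> WAV -> whisper.cpp wrapper.
# Assumed call shape: script -m MODEL AUDIO_FILE

WHISPER_BIN="/home/admin/whisper.cpp/build/bin/whisper-cli"

# Sets $model from "-m <path>" and $input from the remaining argument.
# Returns non-zero if either one is missing.
parse_args() {
  model=""
  input=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -m)
        [ "$#" -ge 2 ] || return 2
        model="$2"
        shift 2
        ;;
      *)
        input="$1"
        shift
        ;;
    esac
  done
  [ -n "$model" ] && [ -n "$input" ]
}

transcribe() {
  parse_args "$@" || { echo "usage: $0 -m MODEL AUDIO" >&2; return 2; }
  workdir="$(mktemp -d /tmp/openclaw-whisper.XXXXXX)" || return 1
  # Convert whatever arrives (e.g. a Telegram .ogg) to 16 kHz mono WAV,
  # then run whisper.cpp on the converted file.
  if ffmpeg -y -i "$input" -ac 1 -ar 16000 -c:a pcm_s16le "$workdir/input.wav" >/dev/null 2>&1 &&
     "$WHISPER_BIN" -m "$model" -otxt -of "$workdir/transcript" -np -nt -f "$workdir/input.wav" >/dev/null 2>&1
  then
    cat "$workdir/transcript.txt"   # stdout carries only the transcript
    status=$?
  else
    status=1
  fi
  rm -rf "$workdir"
  return "$status"
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  transcribe "$@"
fi
```

Unlike the script above, this version cleans up with an explicit rm rather than a trap, and it keeps ffmpeg and whisper output off stdout so only the transcript is printed.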

tribal plank
#

ah, that actually worked

gloomy waspBOT
# tribal plank ah, that actually worked

Nice, glad we got it pinned down.

What happened: OpenClaw’s built-in whisper-cli special path was failing before it reached your binary, so wrapping the whole OGG -> WAV -> whisper flow made it explicit and reliable.

I’d leave that wrapper in place for now. If you want to clean up later, the final config should stay as the wrapper command:

{
  "type": "cli",
  "command": "/home/admin/openclaw-whisper-transcribe.sh",
  "args": [
    "-m",
    "/home/admin/whisper.cpp/models/ggml-base.en.bin",
    "{{MediaPath}}"
  ],
  "timeoutSeconds": 60
}
tribal plank
#

Yes, please tell me how to clean up

gloomy waspBOT
# tribal plank Yes, please tell me how to clean up

Yep. Clean it up like this:

  1. Keep the working wrapper:
ls -l /home/admin/openclaw-whisper-transcribe.sh
  2. Remove the debug wrapper/symlink if you made them:
rm -f /home/admin/bin/whisper-cli
rm -rf /home/admin/openclaw-whisper-debug
rm -f /tmp/openclaw-whisper-debug.log /tmp/wrapper-help.txt /tmp/wrapper-bash-x.txt

If you prefer recoverable cleanup and have trash installed, use trash instead of rm -rf for the debug directory.

  3. Make sure your final tools.media.audio.models entry is only this:
{
  "type": "cli",
  "command": "/home/admin/openclaw-whisper-transcribe.sh",
  "args": [
    "-m",
    "/home/admin/whisper.cpp/models/ggml-base.en.bin",
    "{{MediaPath}}"
  ],
  "timeoutSeconds": 60
}

So the full audio block should be:

gloomy waspBOT
# tribal plank Yes, please tell me how to clean up
"audio": {
  "enabled": true,
  "echoTranscript": true,
  "echoFormat": "📝 I heard: {transcript}",
  "models": [
    {
      "type": "cli",
      "command": "/home/admin/openclaw-whisper-transcribe.sh",
      "args": [
        "-m",
        "/home/admin/whisper.cpp/models/ggml-base.en.bin",
        "{{MediaPath}}"
      ],
      "timeoutSeconds": 60
    }
  ]
}
tribal plank
gloomy waspBOT
# tribal plank how about whisper-openclaw-debug.sh

Yes, remove that too. It was only for debugging.

rm -f /home/admin/whisper-openclaw-debug.sh

Or with trash, if installed:

trash /home/admin/whisper-openclaw-debug.sh

Keep only the working production wrapper:

/home/admin/openclaw-whisper-transcribe.sh