Describe what you're trying to do:
I'm using the native Gemini TTS feature in OpenClaw 2026.4.15. My goal is to have the agent automatically reply with voice messages using the google provider (Gemini 3.1 Flash TTS).
What's happening instead:
When the agent generates a response, it correctly triggers the TTS. A .wav file is successfully generated in the media/outbound/ directory. However, the agent fails to deliver the audio message to the channel (WhatsApp).
Interestingly:
- Manual audio delivery works perfectly when the file is in .mp3 format.
- The agent seems to ignore or fail to process the .wav file for delivery.
Setup:
- Hosting: Local MacBook Pro (macOS 15.x)
- OpenClaw Version: 2026.4.15
- Node Version: 24.11.1
- Provider: Google Gemini (native plugin)
- Channel: WhatsApp
Relevant Config:
{
  "messages": {
    "tts": {
      "auto": "always",
      "provider": "google",
      "providers": {
        "google": {
          "model": "gemini-3.1-flash-tts-preview",
          "voiceName": "Gacrux"
        }
      }
    }
  }
}
Logs/Error Messages:
The files are generated locally:
~/openclaw-local/jedi-labs/media/outbound/tts-hera-XXXX.wav exists on disk.
But the agent log shows no attempt to upload/send the .wav file, unlike when .mp3 files are present.
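To rule out a malformed file on my side, the generated .wav files can be checked with Python's standard-library wave module. This is only a diagnostic sketch: describe_wav is a helper name I made up, and it says nothing about how OpenClaw handles delivery.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Read basic header fields from a PCM WAV file.

    Raises wave.Error if the file is not a WAV that the wave module can parse.
    """
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Directory taken from the path in this report; adjust as needed.
    outbound = Path("~/openclaw-local/jedi-labs/media/outbound").expanduser()
    for wav_file in sorted(outbound.glob("*.wav")):
        print(wav_file.name, describe_wav(wav_file))
```

If this prints sensible values for every file, the audio itself is probably fine and the problem is more likely in the delivery step.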
Is there a known limitation with .wav delivery in the current version, or is an ffmpeg conversion step missing from the native Gemini TTS plugin?
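As a stopgap, since manual delivery of .mp3 files works for me, I could convert the generated .wav files myself. Below is a rough sketch, assuming ffmpeg is installed and on PATH. The function names are hypothetical and not part of OpenClaw, and I have not confirmed that the agent would pick up the converted file automatically.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(wav_path, out_suffix=".mp3"):
    """Build an ffmpeg command that writes the converted file next to the source.

    ffmpeg selects the output format from the file extension;
    -y overwrites an existing output file without prompting.
    """
    src = Path(wav_path)
    dst = src.with_suffix(out_suffix)
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(wav_path, out_suffix=".mp3"):
    """Run the conversion and return the path of the converted file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_ffmpeg_cmd(wav_path, out_suffix)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


if __name__ == "__main__":
    # Directory taken from the path in this report; adjust as needed.
    outbound = Path("~/openclaw-local/jedi-labs/media/outbound").expanduser()
    for wav_file in sorted(outbound.glob("*.wav")):
        print("converted:", convert(wav_file))
```

I would still prefer a built-in fix or a config option, since this runs outside the agent and the converted file then has to be sent manually.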