#Voice-call issue: it's not my agent who's calling

frozen dock

Hi brothers.
I installed 2026.4.26, and now the voice-call plugin loads and kinda "works". But when I ask my agent to call me, or I call the agent, some "generic" OpenClaw agent talks to me. My agent's name is Ash. When I ask the agent on the call for his name, he says it's OpenClaw, so identity/soul etc. aren't loaded. How do I fix it?
My config:

"voice-call": {
  "enabled": true,
  "config": {
    "provider": "twilio",
    "inboundPolicy": "open",

    "fromNumber": "+614******************",
    "twilio": {
      "accountSid": "**********************",
      "authToken": "**********************",
    },
    "realtime": {
      "enabled": true,
      "provider": "openai"
    },
    "outbound": {
      "defaultMode": "conversation"
    },
    "serve": {
      "port": 3338,
      "path": "/voice/webhook"
    },
    "publicUrl": "https://unslumbrous-spiteless-jeri.ngrok-free.dev/voice/webhook",
    "outbound": {
      "defaultMode": "conversation"
    },
    "streaming": {
      "enabled": false
    },
    "tts": {
      "provider": "openai",
      "providers": {
        "openai": {
          "model": "gpt-4o-mini-tts",
          "voice": "coral"
        }
      }
    }
  }
}

blazing dirge [BOT]

First, can you paste the exact output of openclaw --version? Based on this config, though, I think this is expected realtime behavior.

With realtime.enabled: true, the phone call is fronted by the realtime voice model, not Ash directly on every utterance. The default realtime prompt literally identifies as OpenClaw’s phone-call interface, and it only calls the real agent through openclaw_agent_consult when needed.

Fix:

"voice-call": {
  "enabled": true,
  "config": {
    "provider": "twilio",

    // use your actual agent id, not just display name
    "agentId": "ash",

    "realtime": {
      "enabled": true,
      "provider": "openai",
      "instructions": "You are Ash, the user's OpenClaw voice assistant. Your name is Ash. Keep replies brief and natural. If the caller asks anything requiring memory, tools, personal context, current info, or deeper reasoning, call openclaw_agent_consult before answering.",
      "toolPolicy": "safe-read-only"
    }
  }
}

A few notes:

  • agentId belongs at the top level of the voice-call config.
  • Use the configured agent id — likely ash, not necessarily display name Ash.
  • If you want the consult tool to use all normal Ash tools, set realtime.toolPolicy: "owner", but don't do that with inboundPolicy: "open". Restrict inbound with allowlist first.
  • You have duplicate outbound blocks; remove one just to avoid confusion.
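On the duplicate outbound blocks: most JSON parsers don't reject duplicate keys, they keep the last value and silently drop the earlier one, so the mistake is easy to miss. A minimal Python sketch for spotting duplicates in a pasted config, assuming the text is strict JSON (the trailing comma and // comments used in this thread would need a JSON5-aware parser instead); the function name is hypothetical:

```python
import json

def find_duplicate_keys(text: str) -> list[str]:
    """Return every key that appears more than once within the same JSON object."""
    duplicates: list[str] = []

    def check_pairs(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        # Same result as the default parser: later duplicates overwrite earlier ones.
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates


# A trimmed, strict-JSON version of the config above.
sample = """
{
  "provider": "twilio",
  "outbound": {"defaultMode": "conversation"},
  "serve": {"port": 3338, "path": "/voice/webhook"},
  "outbound": {"defaultMode": "conversation"}
}
"""
print(find_duplicate_keys(sample))  # ['outbound']
```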
frozen dock

The version is OpenClaw 2026.4.26 (be8c246)

blazing dirge [BOT]

Yep — 2026.4.26 confirms it.

In that version, realtime voice has its own default prompt:

“You are OpenClaw’s phone-call realtime voice interface…”

So agentId loads Ash only for the openclaw_agent_consult tool. The live realtime model still needs its own realtime.instructions.

Use:

"agentId": "ash",
"realtime": {
  "enabled": true,
  "provider": "openai",
  "instructions": "You are Ash, the user's OpenClaw voice assistant. Your name is Ash. Keep replies brief and natural. If the caller asks anything requiring memory, tools, personal context, current info, or deeper reasoning, call openclaw_agent_consult before answering.",
  "toolPolicy": "safe-read-only"
}

Then restart:

openclaw gateway restart

If you’re not sure the agent id is exactly ash, check:

openclaw agents list

Also: since you have inboundPolicy: "open", keep toolPolicy: "safe-read-only" for now. Don’t use "owner" until inbound is restricted with allowlist.
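That rule is easy to turn into a pre-restart check. A minimal Python sketch, assuming the voice-call block has been loaded into a dict shaped like the snippets in this thread (config.inboundPolicy and config.realtime.toolPolicy); the function and its message are hypothetical, not part of OpenClaw:

```python
def check_tool_policy(voice_call: dict) -> list[str]:
    """Flag the risky combination discussed above: owner-level tools on an open inbound line."""
    cfg = voice_call.get("config", {})
    inbound = cfg.get("inboundPolicy")
    tool_policy = cfg.get("realtime", {}).get("toolPolicy")
    warnings = []
    if tool_policy == "owner" and inbound == "open":
        warnings.append(
            'realtime.toolPolicy "owner" with inboundPolicy "open": any caller could reach '
            "the agent's full toolset via the consult tool; restrict inbound with allowlist first"
        )
    return warnings


safe = {"config": {"inboundPolicy": "open", "realtime": {"toolPolicy": "safe-read-only"}}}
risky = {"config": {"inboundPolicy": "open", "realtime": {"toolPolicy": "owner"}}}
print(check_tool_policy(safe))        # []
print(len(check_tool_policy(risky)))  # 1
```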

frozen dock

OpenClaw complains on startup that the voice-call config has wrong entries. voice-call needs some work; it's not usable right now.

blazing dirge [BOT]

Ah, yep — I found the sharp edge. I think my previous agentId suggestion may hit a 2026.4.26 schema bug.

The runtime code supports agentId, but the plugin manifest schema for voice-call appears to not list agentId, so startup can reject it as an unknown/wrong config entry. That’s definitely “needs work” territory.
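To illustrate that failure mode: if the manifest schema rejects properties it doesn't list, a key the runtime would happily use still fails validation at startup. A minimal Python sketch of such a check; the ALLOWED_KEYS set is a hypothetical, partial list for illustration only, not the real voice-call manifest:

```python
# Hypothetical subset of top-level keys a manifest schema might list.
# "agentId" is deliberately absent, mirroring the suspected 2026.4.26 gap.
ALLOWED_KEYS = {
    "provider", "inboundPolicy", "fromNumber", "twilio", "realtime", "outbound",
    "serve", "publicUrl", "streaming", "tts", "responseSystemPrompt", "inboundGreeting",
}

def unknown_keys(config: dict, allowed: set[str] = ALLOWED_KEYS) -> list[str]:
    """Return top-level keys the schema does not list, sorted alphabetically."""
    return sorted(key for key in config if key not in allowed)


print(unknown_keys({"provider": "twilio", "agentId": "ash", "realtime": {"enabled": True}}))
# ['agentId']
```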

For now, remove agentId and only use the fields the schema accepts:

"realtime": {
  "enabled": true,
  "provider": "openai",
  "instructions": "You are Ash, the user's OpenClaw voice assistant. Your name is Ash. Keep replies brief and natural. If the caller asks anything requiring memory, tools, personal context, current info, or deeper reasoning, call openclaw_agent_consult before answering.",
  "toolPolicy": "safe-read-only"
},
"responseSystemPrompt": "You are Ash, the user's OpenClaw voice assistant. Keep phone responses brief and natural."

Important caveat: without agentId, the consult path falls back to main, so this mainly fixes the realtime voice identity/name, not necessarily routing to a separate ash agent workspace.

frozen dock

My agent Ash's id is main; he is my first and default agent, but it's not him who's responding. The one responding starts fresh, without any recollection of me or anything.

blazing dirge [BOT]

Yeah — that matches what the code is doing. With:

realtime: { enabled: true }

you’re talking to the OpenAI realtime voice session, not directly to Ash’s normal OpenClaw session. It does not automatically load Ash’s full identity/SOUL/memory context. It only has the realtime prompt, plus the optional openclaw_agent_consult tool.

So for “I want the actual Ash with memory/recollection,” the current workaround is: turn realtime off and use normal conversation mode:

"outbound": {
  "defaultMode": "conversation"
},
"realtime": {
  "enabled": false
},
"streaming": {
  "enabled": false
},
"responseSystemPrompt": "You are Ash, the user's OpenClaw voice assistant. Keep replies brief, natural, and conversational."

Then restart:

openclaw gateway restart

If you keep realtime.enabled: true, you can make it pretend to be Ash with realtime.instructions, but it still won’t be fully Ash unless it calls openclaw_agent_consult. That’s why it feels “fresh.”

So I’d call this a real limitation/bug in realtime voice right now: realtime should probably inherit the agent identity/system prompt or consult the agent much more aggressively.

frozen dock

I turned realtime off. Now the call connects, but nobody talks to me anymore.

blazing dirge [BOT]

That’s because you turned realtime off but still have:

"streaming": { "enabled": false }

So Twilio connects the call, but there’s no live audio path doing STT/TTS. For “actual Ash with memory,” try legacy streaming conversation mode:

"outbound": {
  "defaultMode": "conversation"
},
"inboundGreeting": "Hi, this is Ash. How can I help?",
"realtime": {
  "enabled": false
},
"streaming": {
  "enabled": true,
  "provider": "openai",
  "streamPath": "/voice/stream",
  "providers": {
    "openai": {
      "model": "gpt-4o-transcribe"
    }
  }
},
"responseSystemPrompt": "You are Ash, the user's OpenClaw voice assistant. Keep replies brief, natural, and conversational."

Keep your existing tts block.

Then:

openclaw gateway restart
openclaw logs --follow

If it still connects silently, the useful log lines are usually around voice-call, Media streaming initialized, Failed to speak initial message, or Telephony TTS failed.

Docs: https://docs.openclaw.ai/plugins/voice-call
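The three configurations tried so far differ only in which audio path the realtime and streaming flags select. A rough Python model of that decision, written from the explanations in this thread rather than from the OpenClaw source; the function, its return strings, and the assumption that a missing enabled flag means false are all hypothetical:

```python
def describe_audio_path(voice_call_config: dict) -> str:
    """Summarise who the caller actually talks to, per the behaviour described in this thread."""
    realtime_on = voice_call_config.get("realtime", {}).get("enabled", False)
    streaming_on = voice_call_config.get("streaming", {}).get("enabled", False)

    if realtime_on:
        # The realtime voice model answers with its own prompt; the agent is only
        # reached when the model calls openclaw_agent_consult.
        return "realtime model (agent via openclaw_agent_consult only)"
    if streaming_on:
        # Caller audio -> streaming STT -> agent reply -> telephony TTS.
        return "agent via streaming STT and TTS"
    # Twilio connects, but nothing transcribes the caller or speaks replies.
    return "no live audio path (silent call)"


for cfg in (
    {"realtime": {"enabled": True}},
    {"realtime": {"enabled": False}, "streaming": {"enabled": False}},
    {"realtime": {"enabled": False}, "streaming": {"enabled": True}},
):
    print(describe_audio_path(cfg))
```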

frozen dock

I made this change; here are the logs:
10:09:46 [voice-call] Starting max duration timer (300s) for call aa6ed32d-fdca-4408-bda5-f3147d387a8b
10:09:46 [voice-call] Starting max duration timer (300s) for call aa6ed32d-fdca-4408-bda5-f3147d387a8b
10:09:46 [MediaStream] Twilio connected
10:09:46 [MediaStream] Stream started: MZ28f49621b5175a567b51c9413a09b91c (call: CA77461d1738a88a312da40e938481454a)
10:09:46 [voice-call] Media stream connected: CA77461d1738a88a312da40e938481454a -> MZ28f49621b5175a567b51c9413a09b91c
10:09:46 [voice-call] Speaking initial message for call aa6ed32d-fdca-4408-bda5-f3147d387a8b (mode: conversation)
10:10:02 [MediaStream] Transcription session error: OpenAI realtime transcription connection timeout
10:10:02 [MediaStream] STT connection failed (TTS still works): OpenAI realtime transcription connection timeout
10:10:02 [MediaStream] TTS playback error: Error: Telephony TTS synthesis timed out after 8000ms
at Timeout.<anonymous> (file:///Users/macmini/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-da6bdffc3d96/dist/twilio-BvvtZITB.js:529:30)
at listOnTimeout (node:internal/timers:588:17)
at processTimers (node:internal/timers:523:7)
10:10:02 [voice-call] Telephony TTS failed: Telephony TTS synthesis timed out after 8000ms
10:10:02 [voice-call] Failed to speak initial message: Telephony TTS synthesis timed out after 8000ms
10:10:04 [MediaStream] Stream stopped: MZ28f49621b5175a567b51c9413a09b91c
10:10:04 [voice-call] Media stream disconnected: CA77461d1738a88a312da40e938481454a (MZ28f49621b5175a567b51c9413a09b91c)
10:10:05 [MediaStream] WebSocket closed (code: 1005, reason: none)

blazing dirge [BOT]

Nice, these logs are useful. This is past the Twilio/ngrok part now — Twilio connects and the media stream starts.

The actual failures are both OpenAI-side:

OpenAI realtime transcription connection timeout
Telephony TTS synthesis timed out after 8000ms

So Ash isn’t talking because the initial TTS never finishes.
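With logs like these it helps to pull out just the failure lines. A minimal Python sketch, assuming the "HH:MM:SS [component] message" line format shown above; the marker list is copied from lines quoted in this thread and is not an official list of OpenClaw error messages:

```python
import re

# Failure markers quoted in this thread; extend as new ones turn up.
FAILURE_MARKERS = (
    "Transcription session error",
    "STT connection failed",
    "Telephony TTS failed",
    "Failed to speak initial message",
)

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[(\w[\w-]*)\] (.*)$")

def triage(log_text: str) -> list[tuple[str, str, str]]:
    """Return (time, component, message) for each log line containing a known failure marker."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(3) for marker in FAILURE_MARKERS):
            hits.append(match.groups())
    return hits


sample = """10:09:46 [MediaStream] Twilio connected
10:10:02 [MediaStream] STT connection failed (TTS still works): OpenAI realtime transcription connection timeout
at listOnTimeout (node:internal/timers:588:17)
10:10:02 [voice-call] Failed to speak initial message: Telephony TTS synthesis timed out after 8000ms"""

for hit in triage(sample):
    print(hit)
```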

Try increasing the voice-call TTS timeout first:

"streaming": {
  "enabled": true,
  "provider": "openai",
  "streamPath": "/voice/stream",
  "providers": {
    "openai": {
      "model": "gpt-4o-transcribe"
    }
  }
},
"tts": {
  "provider": "openai",
  "timeoutMs": 30000,
  "providers": {
    "openai": {
      "model": "gpt-4o-mini-tts",
      "voice": "coral"
    }
  }
}

Then restart:

openclaw gateway restart
openclaw logs --follow

Also double-check the gateway service actually sees OPENAI_API_KEY — on macOS services, shell env often isn’t inherited. If the timeout remains, try tts-1 as a faster debugging TTS model.
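A quick way to list which ${VAR} placeholders a config depends on and which of them an environment actually provides. A minimal Python sketch, assuming placeholders use the ${NAME} form seen in this thread; by default it checks the environment of whatever process runs it, so to see what the gateway service gets you would need to pass that service's environment in explicitly:

```python
import os
import re

PLACEHOLDER_RE = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")

def missing_env_vars(config_text: str, env=None) -> list[str]:
    """Return placeholder names referenced in config_text that are unset or empty in env."""
    env = os.environ if env is None else env
    names = sorted(set(PLACEHOLDER_RE.findall(config_text)))
    return [name for name in names if not env.get(name)]


config_text = '"apiKey": "${OPENAI_API_KEY}"'
print(missing_env_vars(config_text, env={}))  # ['OPENAI_API_KEY']
```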

At this point I’d also file this as a voice-call bug/fragility with those logs: https://github.com/openclaw/openclaw/issues

frozen dock

Now I can hear Ash, but Ash can't hear me:
10:18:06 [voice-call] Starting max duration timer (300s) for call dc4c6603-33de-4faf-b196-0c6c0a095faa
10:18:06 [voice-call] Starting max duration timer (300s) for call dc4c6603-33de-4faf-b196-0c6c0a095faa
10:18:07 [MediaStream] Twilio connected
10:18:07 [MediaStream] Stream started: MZ52100327f797621699658c8dd6cf421e (call: CA66ce36b5465baa8e1fa448cb7bb0342c)
10:18:07 [voice-call] Media stream connected: CA66ce36b5465baa8e1fa448cb7bb0342c -> MZ52100327f797621699658c8dd6cf421e
10:18:07 [voice-call] Speaking initial message for call dc4c6603-33de-4faf-b196-0c6c0a095faa (mode: conversation)
10:18:23 [MediaStream] Transcription session error: OpenAI realtime transcription connection timeout
10:18:23 [MediaStream] STT connection failed (TTS still works): OpenAI realtime transcription connection timeout
10:19:00 [MediaStream] Stream stopped: MZ52100327f797621699658c8dd6cf421e
10:19:00 [voice-call] Media stream disconnected: CA66ce36b5465baa8e1fa448cb7bb0342c (MZ52100327f797621699658c8dd6cf421e)
10:19:01 [MediaStream] WebSocket closed (code: 1005, reason: none)

blazing dirge [BOT]

Progress at least: Ash can speak now. The remaining problem is specifically STT:

OpenAI realtime transcription connection timeout
STT connection failed

So Twilio audio is reaching OpenClaw, but OpenClaw cannot open the OpenAI realtime transcription WebSocket. That’s why Ash can’t hear you.

In 2026.4.26 I don’t think this timeout is configurable, so you’ve got two practical options:

Option A: debug OpenAI realtime WS

  • Check VPN/firewall/proxy on the Mac mini.
  • It needs outbound WebSocket access to OpenAI realtime, not just normal HTTPS.
  • TTS working means your key is probably visible; this is likely the realtime/STT socket path.

Option B: switch only STT to Deepgram/xAI/etc

Keep OpenAI for TTS, but use another streaming provider for hearing:

"streaming": {
  "enabled": true,
  "provider": "deepgram",
  "streamPath": "/voice/stream",
  "providers": {
    "deepgram": {
      "apiKey": "${DEEPGRAM_API_KEY}",
      "model": "nova-3",
      "encoding": "mulaw",
      "sampleRate": 8000
    }
  }
}
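
Why encoding mulaw and sampleRate 8000: Twilio Media Streams deliver caller audio as 8 kHz, 8-bit G.711 µ-law, base64-encoded inside the stream messages, so the STT provider has to be told that format and is expected to decode it itself. Purely to show what those two settings describe, a Python sketch of the standard G.711 µ-law decode; decode_payload is a hypothetical helper, not something the plugin exposes:

```python
import base64

def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a linear PCM sample (range -32124..32124)."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the mu-law bias
    return -magnitude if sign else magnitude

def decode_payload(payload_b64: str) -> list[int]:
    """Decode a base64 mu-law payload (one byte per sample) to linear PCM samples."""
    return [ulaw_to_linear(b) for b in base64.b64decode(payload_b64)]


# At 8 kHz, 20 ms of audio is 160 samples, i.e. 160 mu-law bytes.
SAMPLE_RATE = 8000
print(SAMPLE_RATE * 20 // 1000)  # 160
print(ulaw_to_linear(0xFF), ulaw_to_linear(0x00), ulaw_to_linear(0x80))  # 0 -32124 32124
```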

frozen dock

Ok, Ash can send me voice messages on Telegram and the quality is good, so it must be an OpenAI model. How can I assign a different voice to every agent, preferably with some wording for the OpenAI model about how I want each voice to sound? Where do I set it up?

blazing dirge [BOT]

Yep — this is TTS config, not voice-call config.

Use per-agent TTS overrides under agents.list[].tts. Keep shared API key/model globally, then override voice/instructions per agent:

{
  messages: {
    tts: {
      auto: "always",
      provider: "openai",
      providers: {
        openai: {
          apiKey: "${OPENAI_API_KEY}",
          model: "gpt-4o-mini-tts"
        }
      }
    }
  },

  agents: {
    list: [
      {
        id: "main", // Ash
        tts: {
          providers: {
            openai: {
              voice: "coral",
              instructions: "Sound warm, calm, slightly playful, and emotionally present. Speak naturally, like Ash."
            }
          }
        }
      },
      {
        id: "other-agent",
        tts: {
          providers: {
            openai: {
              voice: "nova",
              instructions: "Sound crisp, professional, and concise."
            }
          }
        }
      }
    ]
  }
}
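
The layering above implies that each agent's tts block overrides only the fields it sets and inherits the rest (apiKey, model) from messages.tts. A Python sketch of that kind of recursive merge, to make the expected effective settings concrete; this is an assumption about the semantics, not OpenClaw's actual merge code, so check the TTS docs if a per-agent field doesn't seem to take effect:

```python
import copy

def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override's values win, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed versions of the global and per-agent blocks above.
global_tts = {
    "provider": "openai",
    "providers": {"openai": {"apiKey": "${OPENAI_API_KEY}", "model": "gpt-4o-mini-tts"}},
}
ash_tts = {"providers": {"openai": {"voice": "coral"}}}

effective = deep_merge(global_tts, ash_tts)
print(effective["providers"]["openai"])
# {'apiKey': '${OPENAI_API_KEY}', 'model': 'gpt-4o-mini-tts', 'voice': 'coral'}
```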