#404 voice agent

grave moat (BOT)

Hey there! This channel allows you to ask Kapa.ai for assistance. But it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com - by doing so, you'll be able to ask questions in #1115960287183511643 if you need human assistance!

shadow phoenix (BOT)

Hi @turbid sequoia. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

I’m limited by the knowledge sources provided, and they don’t include your C++ code, so I can’t directly spot what’s wrong in your snippet. I can, however, point out the most common Voice Agent‑specific causes of a 404 or a non‑working C++ client, based on the docs.

Key things to check:

  1. Correct Voice Agent endpoint (V1 only)
    You must use the new V1 endpoint:
    wss://agent.deepgram.com/v1/agent/converse
    The older early‑access endpoint wss://agent.deepgram.com/agent is deprecated and should be replaced. [Migration guide]

  2. Handshake and auth headers
    Your C++ WebSocket client must send an Authorization header with either:

    • Authorization: token YOUR_API_KEY or
    • Authorization: bearer YOUR_TEMPORARY_TOKEN [Voice agent ref]

    Many C++ WS libraries require you to add custom headers explicitly when opening the connection; if you omit the Authorization header, the server will reject the handshake.

  3. Message flow order
    After connecting:

    • Wait for a {"type":"Welcome","request_id":"..."} message from the server.
    • Then send a Settings JSON message.
    • Only after you receive {"type":"SettingsApplied"} should you start sending audio or inject messages. [Message flow]

    If your C++ code sends audio or other messages before Welcome or before SettingsApplied, the agent may not behave correctly.

  4. Valid Settings payload
    Your first client message must be a JSON text frame like:

    ```json
    {
      "type": "Settings",
      "audio": {
        "input": {
          "encoding": "linear16",
          "sample_rate": 24000
        },
        "output": {
          "encoding": "linear16",
          "sample_rate": 24000,
          "container": "none"
        }
      },
      "agent": {
        "language": "en",
        "listen": {
          "provider": {
            "type": "deepgram",
            "model": "nova-3",
            "smart_format": false
          }
        },
        "think": {
          "provider": {
            "type": "open_ai",
            "model": "gpt-4o-mini",
            "temperature": 0.7
          }
        },
        "speak": {
          "provider": {
            "type": "deepgram",
            "model": "aura-2-thalia-en"
          }
        }
      }
    }
    ```

    This must be sent as a text WebSocket frame, not binary. [Settings]

  5. Binary vs text frames

    • JSON control messages (Settings, InjectUserMessage, etc.) → text frames
    • Raw PCM audio → binary frames

    If your C++ code sends JSON as binary or audio as text, the server will not interpret it correctly. [Voice agent ref; Message flow]

  6. Using a generic C++ WebSocket client
    The knowledge sources mention there is no official C++ SDK, and you must use a third‑party WebSocket library and handle headers and frames yourself. [C++ SDK discussion]
    If your code assumes an SDK or doesn’t set headers/SSL correctly, the connection may fail.

Because I don’t see your actual C++ snippet in the provided sources, I can’t pinpoint the exact line that’s wrong. If you paste the URL you’re connecting to and the initial headers/payload you send, I can map them directly against the documented Voice Agent requirements and call out the specific mismatch.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

@turbid sequoia Kapa.ai is still learning and improving, please let me know how I did by reacting below.

Let me check the knowledge base for you again. One moment! :books:

I don’t see any C++ code in your last message, and there’s nothing in the provided knowledge sources that includes your specific snippet.

To help you spot why it wouldn’t work (and why you might see a 404 with Voice Agent), I need to see:

  • The WebSocket URL you’re using (the full wss://... string).
  • How you set headers (especially Authorization).
  • The first message you send after connect (your Settings JSON).

From the docs, the most common C++/WS issues that would break a Voice Agent client are:

  1. Wrong endpoint
    Must be:
    wss://agent.deepgram.com/v1/agent/converse
    The old early-access endpoint wss://agent.deepgram.com/agent is no longer correct and is explicitly replaced in v1. [Voice agent migration]

  2. Missing or incorrect auth header
    You must send an Authorization header with your key or token when opening the WebSocket, e.g. Authorization: token YOUR_API_KEY or Authorization: bearer YOUR_TOKEN. [Voice agent ref]

  3. Wrong message flow

    • Wait for {"type":"Welcome","request_id":"..."} from the server.
    • Then send a Settings JSON message.
    • Wait for {"type":"SettingsApplied"} before sending audio or injections. [Voice agent message flow]
  4. Invalid or missing Settings JSON
    First client message must be a text frame like the examples in the docs (with type: "Settings", audio.input, audio.output, and agent.listen/think/speak). [Voice agent settings]

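  5. Text vs binary frames
    Send JSON control messages (Settings, InjectUserMessage, etc.) as text frames and raw PCM audio as binary frames. [Voice agent ref; Message flow]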

If you paste your C++ snippet (or at least the connect call + headers + first send), I can line it up against these documented requirements and call out the exact mismatch.

Let me check the knowledge base for you again. One moment! :books:

The most likely issue is how you’re using the WebSocket subprotocol for auth.

Deepgram expects the Sec-WebSocket-Protocol header to contain two separate subprotocol values:

Sec-WebSocket-Protocol: token, YOUR_DEEPGRAM_API_KEY

i.e., one subprotocol is token, the other is the API key. [Sec-WebSocket-Protocol; TTS example]

In your code:

```cpp
// Single Arduino String "token, <API_KEY>" passed as the subprotocol value
authorizationHeader = String("token, ") + DEEPGRAM_API_KEY;
websocket_cfg.subprotocol = authorizationHeader.c_str();
```

you’re passing a single string "token, <API_KEY>" as the subprotocol. Whether this works depends entirely on how esp_websocket_client serializes subprotocol:

  • If it sends exactly Sec-WebSocket-Protocol: token, <API_KEY>, you’re fine.
  • If it treats subprotocol as a single token and does not insert the comma/space correctly (or escapes it), the server will see one invalid subprotocol and reject auth. That shows up as a failed upgrade, although an auth failure more typically returns 401 than 404.

Deepgram’s own examples that use subprotocols (e.g., Python websockets.connect(..., subprotocols=["token", api_key])) rely on the library to send two values in the header, not one concatenated string. [Flutter discussion; Twilio agent example]

Other things to double‑check against the docs:

  1. Endpoint URL
    For Voice Agent v1, the WebSocket URL must be:

    wss://agent.deepgram.com/v1/agent/converse

    The reference explicitly documents this path. [Voice agent ref]

    So VOICE_AGENT_WS_HOST should be agent.deepgram.com and VOICE_AGENT_WS_PATH should be /v1/agent/converse. Any other path can legitimately return 404.

  2. Alternative: Authorization header
    Voice Agent also supports standard auth via header:

    Authorization: token YOUR_DEEPGRAM_API_KEY

    or

    Authorization: bearer YOUR_DEEPGRAM_TOKEN

    [[Voice agent ref](https://developers.deepgram.com/reference/voice-agent/voice-agent)]

    If `esp_websocket_client` lets you set custom headers, this is often simpler and avoids subprotocol quirks.

  3. Handshake success vs 404
    If the WebSocket upgrade fails, you’ll get an HTTP error (4xx/5xx). Deepgram includes dg-error and dg-request-id headers on failed upgrades, which help confirm whether it’s a path problem (404) or an auth problem (typically 401). [STT WS errors; TTS WS errors]

Because the knowledge sources don’t document esp_websocket_client specifically, I can’t say with certainty how it formats subprotocol. If it doesn’t support multiple subprotocols natively, you’ll likely need to:

  • Prefer an Authorization header if possible, or
  • Ensure it literally sends Sec-WebSocket-Protocol: token, YOUR_DEEPGRAM_API_KEY as shown in the Deepgram examples.

shadow phoenix (BOT)

Let me check the knowledge base for you again. One moment! :books:

You can use either a regular API key or a token; you do not have to use token-based auth.

For Voice Agent, the WebSocket handshake explicitly supports:

  • Authorization: token YOUR_DEEPGRAM_API_KEY
  • or Authorization: bearer YOUR_DEEPGRAM_TOKEN (a short‑lived JWT from the Token-based Auth API) [Voice agent].

Token-based auth is recommended when you need short‑lived, delegated access (e.g., from untrusted clients), and you can create those tokens via POST /v1/auth/grant. [Token grant; Auth overview]
