Contradiction in the OpenAI Realtime API Cookbook | OpenAI | Page 1

unique daggerBOT Apr 21, 2026, 3:04 AM

Hi @oblique mango. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

You’re right to notice the mismatch, and your interpretation is correct.

From the Realtime “Beta to GA migration” guide, the event names were explicitly changed when the Realtime API went GA:

response.text.delta → response.output_text.delta
response.audio.delta → response.output_audio.delta
response.audio_transcript.delta → response.output_audio_transcript.delta [Beta→GA migration]

The current GA docs for audio over WebSockets also consistently use response.output_audio.delta and response.output_audio.done as the correct server events. [Handling audio with WebSockets; Audio output from WebSocket]

However, the context‑summarization cookbook still shows response.audio.delta in its event table and example handler, which matches the old (beta) naming. [Context summarization example]

So:

The API reference is up to date for GA.
The cookbook example is partially outdated in its event names and should be updated to use response.output_audio.delta (and similarly for other output_* events).

If you’re building against the GA Realtime API (without the `OpenAI-Beta: realtime=v1` header), you should follow the GA event names from the API reference, not the older ones shown in that cookbook.
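If you consume server events over a raw WebSocket and want to catch leftover beta names (for example when porting the cookbook handler), a small normalization table helps. This is a minimal sketch, not part of any SDK. `BETA_TO_GA`, `normalize_event_type`, and `collect_audio` are hypothetical names, and the table covers only the three renames quoted above:

```python
import base64

# Only the three renames quoted from the Beta-to-GA migration guide.
BETA_TO_GA = {
    "response.text.delta": "response.output_text.delta",
    "response.audio.delta": "response.output_audio.delta",
    "response.audio_transcript.delta": "response.output_audio_transcript.delta",
}


def normalize_event_type(event_type: str) -> str:
    """Return the GA name for a beta event type; GA names pass through unchanged."""
    return BETA_TO_GA.get(event_type, event_type)


def collect_audio(events: list[dict]) -> bytes:
    """Decode and concatenate the Base64 audio carried by output_audio delta events."""
    chunks: list[bytes] = []
    for event in events:
        if normalize_event_type(event.get("type", "")) == "response.output_audio.delta":
            # The audio chunk is Base64-encoded in the event's "delta" field.
            chunks.append(base64.b64decode(event["delta"]))
    return b"".join(chunks)
```

A GA-only client would simply match `response.output_audio.delta` directly. The table is only useful while older handler code is still around.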

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

@oblique mango Kapa.ai is still learning and improving, please let me know how I did by reacting below.

oblique mango Apr 21, 2026, 3:36 AM

is there a way to retrieve the session's record of a conversation history to check if it matches my internal records? @unique dagger

unique daggerBOT Apr 21, 2026, 3:36 AM

Let me check the knowledge base for you again. One moment! :books:

Yes. The Session protocol in openai-agents-python defines a get_items(limit: int | None = None) method that returns the conversation history for that session as a list of input items. [Session protocol]

Concrete session implementations (e.g., SQLiteSession, RedisSession, SQLAlchemySession, DaprSession) all provide this same async method to retrieve history:

items = await session.get_items()          # full history
items = await session.get_items(limit=50)  # latest 50 items

SQLiteSession.get_items reads stored message_data rows for the session_id and returns them in chronological order. [SQLiteSession.get_items]
RedisSession.get_items pulls the list from Redis and returns the messages, again in chronological order, with an optional limit. [RedisSession.get_items]
SQLAlchemySession.get_items and DaprSession.get_items behave similarly, querying their respective backends and returning a list of deserialized items. [SQLAlchemySession.get_items; DaprSession.get_items]

You can compare the returned list of items to your internal records to verify they match.
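Here is a minimal sketch of that comparison. It assumes your internal records are dicts in the same shape as the session items: a role, plus either a string content or a list of content parts that each have a text field. The helper names are hypothetical. Matching on role and text only is a design choice, so IDs and metadata are ignored:

```python
from __future__ import annotations


def item_text(item: dict) -> str:
    """Flatten an item's content to plain text (string content or a list of parts)."""
    content = item.get("content", "")
    if isinstance(content, str):
        return content
    # Parts look like {"type": "input_text" | "output_text", "text": "..."}
    return "".join(part.get("text", "") for part in content if isinstance(part, dict))


def first_mismatch(session_items: list[dict], internal: list[dict]) -> int | None:
    """Return the index of the first differing (role, text) pair, or None if they match."""
    for index, (ours, theirs) in enumerate(zip(session_items, internal)):
        if (ours.get("role"), item_text(ours)) != (theirs.get("role"), item_text(theirs)):
            return index
    if len(session_items) != len(internal):
        # One history is a strict prefix of the other.
        return min(len(session_items), len(internal))
    return None
```

Typical use: `items = await session.get_items()`, then `first_mismatch(items, my_records)` to find the first position where the two logs diverge.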

oblique mango Apr 21, 2026, 3:37 AM

is there a way to retrieve the session's record of a conversation history to check if it matches my internal records? I am using the realtime api with a websocket. in a voice conversation. @unique dagger

unique daggerBOT Apr 21, 2026, 3:37 AM

There is some support for this, but the information in the knowledge sources is limited.

From the openai-agents-python side, a session is explicitly defined as something that “stores conversation history for a specific session,” and you can retrieve that history via the Session.get_items(limit: int | None = None) method, which returns a list of items representing the conversation history. [Session protocol] This is how you’d programmatically read back what the agent believes the session history is and compare it to your own records.

For Realtime specifically, there is also a RealtimeHistoryUpdated event whose history field is documented as:

history: list[RealtimeItem] – The full history of the session. [Realtime history]

So, in a Realtime WebSocket voice conversation, the documented options are:

Listen for RealtimeHistoryUpdated events and inspect the history list to see the full session history as the Realtime agent tracks it, then compare that to your internal log. [Realtime history]
If you are using an openai-agents-python Session backend, call await session.get_items() (optionally with a limit) to retrieve the stored conversation history and compare it to your own records. [Session protocol]
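For the RealtimeSession case, the sketch below turns the `history` list from a history_updated event into comparable (role, text) pairs. It assumes that message items expose `role` and a `content` list, and that each part carries either `text` (typed input or text output) or `transcript` (audio). These attribute names are assumptions to check against the RealtimeItem types in your SDK version. Non-message items are skipped:

```python
from __future__ import annotations

from typing import Any


def history_to_pairs(history: list[Any]) -> list[tuple[str, str]]:
    """Reduce RealtimeItem-like objects to (role, text) pairs for comparison."""
    pairs: list[tuple[str, str]] = []
    for item in history:
        role = getattr(item, "role", None)
        if role is None:
            continue  # e.g. tool-call items have no role
        texts = []
        for part in getattr(item, "content", None) or []:
            # Prefer typed text; fall back to an audio transcript (may be None while pending).
            text = getattr(part, "text", None) or getattr(part, "transcript", None)
            if text:
                texts.append(text)
        pairs.append((role, " ".join(texts)))
    return pairs


async def latest_history(session: Any) -> list[tuple[str, str]]:
    """Wait for the next history_updated event on a live RealtimeSession."""
    async for event in session:
        if event.type == "history_updated":
            return history_to_pairs(event.history)
    return []
```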

The knowledge sources do not show any other Realtime-API-specific “fetch history” command beyond these patterns, so if you need more direct or lower-level history retrieval, it isn’t documented in the provided material.
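Outside the knowledge sources cited above, the GA Realtime API reference also lists a `conversation.item.retrieve` client event. However, it returns a single item by ID rather than the whole conversation. For a raw WebSocket client, a common alternative is to mirror the conversation locally from server events. The sketch below assumes that the GA `conversation.item.added` and `conversation.item.done` events (beta clients see `conversation.item.created`) carry `item` and `previous_item_id`, and that `conversation.item.deleted` carries `item_id`. The class is hypothetical, so verify the field names against the API reference:

```python
from __future__ import annotations

# Server events that deliver a (possibly updated) conversation item.
ITEM_EVENTS = {"conversation.item.added", "conversation.item.done", "conversation.item.created"}


class ConversationMirror:
    """Local, ordered copy of the server-side conversation built from server events."""

    def __init__(self) -> None:
        self.items: list[dict] = []

    def _index(self, item_id: str | None) -> int | None:
        for index, item in enumerate(self.items):
            if item.get("id") == item_id:
                return index
        return None

    def apply(self, event: dict) -> None:
        etype = event.get("type")
        if etype in ITEM_EVENTS:
            item = event["item"]
            existing = self._index(item.get("id"))
            if existing is not None:
                # A later event (e.g. ".done") carries the finished item: replace in place.
                self.items[existing] = item
                return
            previous_id = event.get("previous_item_id")
            if previous_id is None:
                self.items.insert(0, item)  # no predecessor: start of the conversation
                return
            position = self._index(previous_id)
            # Unknown predecessor: append rather than drop the item.
            self.items.insert(len(self.items) if position is None else position + 1, item)
        elif etype == "conversation.item.deleted":
            self.items = [i for i in self.items if i.get("id") != event.get("item_id")]

    def ids(self) -> list[str]:
        return [item["id"] for item in self.items]
```

Feed it every parsed server message (`mirror.apply(json.loads(raw))`). You can then compare `mirror.items` with your internal log at any point.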

oblique mango Apr 21, 2026, 3:47 AM

unique dagger ---------------------- There is some support for this, but the information in th...

The docs say that using a websocket to interact with the realtime api allows for more low level control than creating a RealtimeSession. what does the websocket give you control over that you simply cannot do with the RealtimeSession @unique dagger

unique daggerBOT Apr 21, 2026, 3:47 AM

The knowledge sources only partially answer this, and they don’t list a complete “cannot do with RealtimeSession” matrix. I’ll stick to what’s explicitly documented.

From the docs, WebSocket access is the lowest‑level interface to the Realtime API: you send and receive raw JSON events yourself. [Sending & receiving events] In contrast, RealtimeSession in openai-agents-python is a higher‑level wrapper that:

Hides the WebSocket connection (OpenAIRealtimeWebSocketModel) behind session.model
Provides convenience methods like send_message() and send_audio() that automatically manage some Realtime events for you [Inputs and outputs]

The docs explicitly call out what you use the underlying transport (WebSocket model) for that goes beyond the high‑level session:

You can access the underlying transport object through session.model.
Use this when you need:

custom listeners via session.model.add_listener(...)

raw client events such as response.create or session.update

custom url, headers, or api_key handling through model_config

call_id to attach to an existing realtime call [Low-level access]

Concretely, the extra control you get at the WebSocket / model level (vs just using the high‑level RealtimeSession helpers) includes:

Sending arbitrary raw client events
- You can send your own response.create, session.update, input_audio_buffer.commit, etc., by constructing and sending raw messages over the socket (or via session.model.send_event(RealtimeModelSendRawMessage(...))). [Inputs and outputs; Low-level access]
- This is required for fully manual turn control (e.g., disabling turn_detection and deciding exactly when to trigger response.create). [Inputs and outputs]
Custom connection parameters and endpoints
- You can specify a custom url, headers, and api_key in model_config when creating the session, which are passed through to the underlying WebSocket connection. This is how you, for example, connect to Azure OpenAI Realtime endpoints or use token‑based auth. [Low-level access; OpenAIRealtimeWebSocketModel.connect]
Attaching to an existing realtime call via call_id
- The model transport supports call_id in its config to attach to an existing Realtime call (e.g., SIP example). [Low-level access; OpenAIRealtimeWebSocketModel.connect]

Custom event listeners at the transport level
- You can register listeners directly on the model (session.model.add_listener(...)) to receive low‑level events such as audio deltas, tool calls, and internal error events. [Low-level access; OpenAIRealtimeWebSocketModel.add_listener]
Direct handling of raw WebSocket messages
- At the pure WebSocket level (without the Python agents wrapper), you are responsible for sending JSON events and parsing all server events yourself, including Base64‑encoded audio chunks (response.output_audio.delta). [Sending & receiving events; Audio output over WebSocket]
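To illustrate fully manual turn control, here is a sketch of the client events involved. It assumes the GA session shape, where turn detection sits under `session.audio.input.turn_detection` (beta sessions used a top-level `turn_detection`). `manual_turn_events` is a hypothetical helper that only builds the JSON, so it runs without a connection:

```python
import json


def manual_turn_events() -> list[dict]:
    """Client events for one manually controlled turn (GA field layout assumed)."""
    return [
        # 1. Turn off server-side voice activity detection for this session.
        {
            "type": "session.update",
            "session": {"type": "realtime", "audio": {"input": {"turn_detection": None}}},
        },
        # 2. After appending the user's audio with input_audio_buffer.append, commit it.
        {"type": "input_audio_buffer.commit"},
        # 3. Ask the model to respond only when you decide to.
        {"type": "response.create"},
    ]


# Over a raw WebSocket each event is sent as JSON text, e.g. ws.send(payload).
payloads = [json.dumps(event) for event in manual_turn_events()]
```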

What’s not clearly documented is a list of things that are impossible with RealtimeSession but possible with raw WebSockets; instead, the docs show that RealtimeSession exposes the underlying WebSocket model when you need this lower‑level control. So in practice:

Use RealtimeSession for convenience (send_message, send_audio, automatic response handling).
Drop down to the WebSocket / model layer when you need raw events, custom connection behavior, or to attach to existing calls.

oblique mango Apr 21, 2026, 4:57 AM

when would you want to use a RealtimeRunner vs a RealtimeSession @unique dagger

unique daggerBOT Apr 21, 2026, 4:57 AM

RealtimeRunner and RealtimeSession are used together, but at different stages:

RealtimeRunner is a factory for realtime sessions.
- You create it once with a starting RealtimeAgent and optional run‑level config (guardrails, async tool calls, etc.). [RealtimeRunner; Realtime guide]
- It wires that agent to a transport (by default OpenAIRealtimeWebSocketModel) and prepares the overall run shape. [RealtimeRunner; Realtime transport]
RealtimeSession is the live connection you actually talk to.
- You get it by calling await runner.run(...). [RealtimeRunner.run; Session lifecycle]
- You then:
  - enter it (async with await runner.run() as session:),
  - send messages/audio (send_message, send_audio),
  - and iterate its events until the conversation ends. [RealtimeSession; Session lifecycle]

So in practice:

Use RealtimeRunner when you’re setting up a realtime agent run: choosing the starting agent, transport, and run‑wide config.
Use RealtimeSession once the run has started, to conduct the actual conversation: streaming inputs/outputs, handling events, tools, and history.
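The following is a minimal sketch of the two stages, following the shapes described in the Realtime guide. The agent name, the instructions, and the model_settings values are illustrative assumptions. Actually connecting requires openai-agents-python and an OPENAI_API_KEY:

```python
async def run_conversation() -> None:
    """Create a runner, open one session, send a message, and stop after one turn."""
    # Imported lazily so this file can be loaded without the SDK installed.
    from agents.realtime import RealtimeAgent, RealtimeRunner

    agent = RealtimeAgent(name="Assistant", instructions="Keep answers short.")

    # Stage 1: the runner holds the starting agent and run-wide configuration.
    runner = RealtimeRunner(
        starting_agent=agent,
        config={"model_settings": {"model_name": "gpt-realtime"}},
    )

    # Stage 2: runner.run() returns the live RealtimeSession you actually talk to.
    async with await runner.run() as session:
        await session.send_message("Hello!")
        async for event in session:
            if event.type == "agent_end":
                break  # the first model turn has finished


# Run with: asyncio.run(run_conversation())
```

The runner is cheap to keep around and reuse. Each `runner.run()` gives you a fresh session.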

oblique mango Apr 21, 2026, 4:58 AM

how do you access the chat history of a RealtimeSession vs a RealtimeRunner @unique dagger

unique daggerBOT Apr 21, 2026, 4:58 AM

For RealtimeSession, the history is exposed directly via events; for RealtimeRunner, there’s no separate history API in the provided docs.

RealtimeSession

When a RealtimeSession starts (__aenter__), it immediately emits a RealtimeHistoryUpdated event whose history field is:

history: list[RealtimeItem] – The full history of the session. [RealtimeSession ref; HistoryUpdated.history]

The Realtime agents guide also calls out:

High‑value session events include history_added, history_updated … They expose the session’s local history as RealtimeItem objects. [Events, history, interruptions]

So to access chat history with a RealtimeSession, you:

Iterate session events and watch for:
- history_updated → full history (list[RealtimeItem])
- history_added → newly added item(s) [Events, history, interruptions]

RealtimeRunner

The knowledge sources only describe RealtimeRunner as a session factory that you call to obtain a RealtimeSession; they do not document any direct history access on the runner itself. [Realtime agents guide]

So, based on the available docs:

You do not access chat history from RealtimeRunner directly.
You do access it from the RealtimeSession created by the runner, via history_updated / history_added events and their history / item fields.
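The sketch below shows an event loop that keeps a local copy of the session history from both events. A history_updated event replaces the whole list, and a history_added event upserts a single item by its ID. It assumes `event.history`, `event.item`, and an `item_id` attribute on each item, so check these against the RealtimeItem types in your SDK version. `HistoryTracker` is a hypothetical helper:

```python
from __future__ import annotations

from typing import Any


class HistoryTracker:
    """Local mirror of a RealtimeSession's history, fed from session events."""

    def __init__(self) -> None:
        self.items: list[Any] = []

    def handle(self, event: Any) -> None:
        if event.type == "history_updated":
            # Full snapshot: replace everything we had.
            self.items = list(event.history)
        elif event.type == "history_added":
            item = event.item
            for index, existing in enumerate(self.items):
                if existing.item_id == item.item_id:
                    self.items[index] = item  # same item re-sent with new content
                    return
            self.items.append(item)


async def track(session: Any, tracker: HistoryTracker) -> None:
    """Feed every event from a live RealtimeSession into the tracker."""
    async for event in session:
        tracker.handle(event)
```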

oblique mango Apr 21, 2026, 5:04 AM

can you add a message to the RealtimeSession history without it appearing as the most recent message (so that the agent does not directly reply to it as though the user just said that thing) and also without triggering a response? If i wanted to insert some information into the chat history for the model to use in its next responses, but i didnt want the model to reply as though the user just said that, how would i do that? @unique dagger

unique daggerBOT Apr 21, 2026, 5:04 AM

The knowledge sources don’t describe a way to back‑insert items into a Realtime session’s history (e.g., at an earlier position) or to mark a history item as “not the latest turn.” They also don’t show a dedicated “silent history write” event for Realtime sessions.

What is documented are two related patterns:

Control what context is used for a response, without changing the default conversation

With the low‑level Realtime API you can create an out‑of-band response that:

does not get added to the default conversation (conversation: "none"), and
uses a custom input array that can include arbitrary items and/or references to existing items. [Create custom context; Out-of-band responses]

Example (Python-style from the docs):

event = {
    "type": "response.create",
    "response": {
        "conversation": "none",
        "metadata": { "topic": "pizza" },
        "output_modalities": ["text"],
        "input": [
            # reference existing items
            { "type": "item_reference", "id": "some_conversation_item_id" },
            # add new context that is *not* treated as a fresh user turn
            {
                "type": "message",
                "role": "system",
                "content": [
                    { "type": "input_text", "text": "Extra background info..." }
                ],
            },
        ],
    },
}

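This lets you inject information into the context of the response that this response.create produces, without:

modifying the default conversation history, and
making it look like the user just said that. [Create custom context]

Note that this is not persistent. The extra items live only in this request's input array, so later responses in the default conversation will not see them.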

In the agents SDK, control how history + new input are merged

In openai-agents-python, when you use a Session (not specifically Realtime), the runner normally sends:
1. session history (session.get_items(...))
2. new turn input
You can customize this merge with RunConfig.session_input_callback(history, new_input), which returns the final list of items sent to the model. [Control history merge]

That callback:
- does not change what’s stored in the session; it only changes what the model sees for that call.
- lets you add extra items into the context (e.g., prepend a synthetic system message) without those items being saved as “the latest user turn.”
Example from the docs (simplified):
```
def keep_recent_history(history, new_input):
    # You could also insert extra context here
    return history[-10:] + new_input
```
Only the new turn’s items are persisted; your injected items can be used for context without becoming the latest user message. [Control history merge]

Given the docs we have:

There is no documented way to insert a message into a RealtimeSession’s stored history at an earlier position or as a “hidden” item that doesn’t look like the latest turn.
The closest documented patterns for “add information the model can use, without it being treated as a fresh user utterance” are the two below. Neither one stores the information in the session history for later turns, and the first one does generate a response (an out-of-band one) by design:
- either use response.create with conversation: "none" and a custom input array that includes your extra info as a system message alongside references to existing items; [Create custom context]
- or, with the standard agents Runner and a Session (not RealtimeSession), use session_input_callback to inject extra context into the model input for that turn, without persisting it as the latest user message. [Control history merge]

If you specifically need “write into the stored RealtimeSession history but not as the most recent item,” the knowledge sources do not describe a supported mechanism for that.
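Beyond the knowledge sources cited here, the Realtime API reference documents a `conversation.item.create` client event. As that reference describes it, the event adds an item to the conversation without starting a response by itself. A response starts on `response.create` or, with server VAD, when the user's audio turn ends. The event's optional `previous_item_id` places the new item after an existing item instead of at the end, and `"root"` places it at the beginning. The builder below is a hypothetical sketch of that payload, so confirm the field names in the reference before relying on it:

```python
from __future__ import annotations

import json


def context_item_event(text: str, after_item_id: str | None = None) -> dict:
    """Build a conversation.item.create event carrying a system message.

    after_item_id: ID of the item to insert after; None appends at the end,
    "root" inserts at the very beginning of the conversation.
    """
    event: dict = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "system",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    if after_item_id is not None:
        event["previous_item_id"] = after_item_id
    return event


# Over a raw WebSocket: ws.send(payload). Note that no response.create is sent here.
payload = json.dumps(context_item_event("The caller's account is premium.", "root"))
```

How strongly the model weighs a system item placed mid-conversation is a model-behavior question the docs do not settle, so test it. The sources also do not cover whether the agents SDK's local history view reflects items inserted this way.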

oblique mango Apr 22, 2026, 1:59 AM

@unique dagger what is the more low level alternative to RealtimeSession.send_message() What if I want to add a message to the chat history but I dont want to prompt a response or interrupt the model talking.

unique daggerBOT Apr 22, 2026, 1:59 AM

The more low-level alternative is to send a raw client event through the underlying model transport instead of using RealtimeSession.send_message().

In the Realtime agents guide, this is shown using RealtimeModelSendRawMessage via session.model.send_event(...) to send a response.create manually. [Inputs and outputs]

from agents.realtime.model_inputs import RealtimeModelSendRawMessage

await session.model.send_event(
    RealtimeModelSendRawMessage(
        message={
            "type": "response.create",
        }
    )
)

You can use the same pattern to send any raw Realtime client event, not just response.create. That’s the low-level alternative to send_message().

However, the knowledge sources do not document a specific raw event that:

Adds a message to the session’s stored conversation history,
Does not become the “latest user turn” for response triggering, and
Does not trigger or interrupt a response.

What is documented instead is how to add context without changing the default conversation or prompting a normal turn:

Use a response.create with a custom input array and conversation: "none" to build a one-off context (including your extra message as a system or assistant message) that the model uses for that response only, without adding it to the default conversation. [Create custom context]

So, based on the docs:

Low-level alternative to send_message(): send raw client events via session.model.send_event(RealtimeModelSendRawMessage(...)). [Inputs and outputs]
Adding info for the next response without treating it as a fresh user utterance: use response.create with a custom input array (and optionally conversation: "none") rather than trying to silently write into the stored session history. [Create custom context]

The sources do not show a way to “silently append to stored history” in the exact way you describe.
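To send a client event such as `conversation.item.create` through a RealtimeSession instead of a raw socket, the message has to match the shape RealtimeModelSendRawMessage expects. In the openai-agents-python source, that message is a dict with a required `type` and an optional `other_data` dict that is merged into the event body. Treat this as an assumption to re-check in your installed version. `to_raw_message` and `send_client_event` are hypothetical helpers:

```python
from __future__ import annotations

from typing import Any


def to_raw_message(event: dict[str, Any]) -> dict[str, Any]:
    """Split a full client event into the {type, other_data} shape (assumed SDK format)."""
    body = {key: value for key, value in event.items() if key != "type"}
    message: dict[str, Any] = {"type": event["type"]}
    if body:
        message["other_data"] = body
    return message


async def send_client_event(session: Any, event: dict[str, Any]) -> None:
    """Send an arbitrary Realtime client event through a live RealtimeSession."""
    # Imported lazily so the pure helper above works without the SDK installed.
    from agents.realtime.model_inputs import RealtimeModelSendRawMessage

    await session.model.send_event(RealtimeModelSendRawMessage(message=to_raw_message(event)))
```

For example, passing a `{"type": "conversation.item.create", "item": {...}}` event adds the item without also requesting a response.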

oblique mango Apr 22, 2026, 2:12 AM

@unique dagger

event = {
    "type": "response.create",
    "response": {
        "conversation": "none",
        "metadata": { "topic": "pizza" },
        "output_modalities": [ "text" ],

        # Create a custom input array for this request with whatever
        # context is appropriate
        "input": [
            # potentially include existing conversation items:
            {
                "type": "item_reference",
                "id": "some_conversation_item_id"
            },

            # include new content as well
            {
                "type": "message",
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": "Is it okay to put pineapple on pizza?",
                    }
                ],
            }
        ],
    },
}

ws.send(json.dumps(event))

Is it possible to run some function like this with custom text / information in the context, using `session.model.send_event(RealtimeModelSendRawMessage(...))` rather than sending a dict to the websocket specifically?

unique daggerBOT Apr 22, 2026, 2:12 AM

Yes. The Realtime agents guide explicitly shows using session.model.send_event(RealtimeModelSendRawMessage(...)) to send raw client events like response.create, instead of writing JSON to the WebSocket yourself. [Inputs and outputs]

The guide’s example (for manual response control) is:

from agents.realtime.model_inputs import RealtimeModelSendRawMessage

await session.model.send_event(
    RealtimeModelSendRawMessage(
        message={
            "type": "response.create",
        }
    )
)

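You can adapt this pattern to send your custom response.create with a custom input array (like in your pizza example). One detail matters here. In the openai-agents-python source, the message passed to RealtimeModelSendRawMessage is typed as a dict with a required type and an optional other_data dict, whose keys are merged into the outgoing event. Keys placed directly next to type (such as response) are not part of that shape and may be dropped, so the response body goes under other_data. It is worth re-checking this against RealtimeModelRawClientMessage in your installed version:

from agents.realtime.model_inputs import RealtimeModelSendRawMessage

await session.model.send_event(
    RealtimeModelSendRawMessage(
        message={
            "type": "response.create",
            # Merged into the event body, producing {"type": "response.create", "response": {...}}
            "other_data": {
                "response": {
                    "conversation": "none",
                    "metadata": {"topic": "pizza"},
                    "output_modalities": ["text"],
                    "input": [
                        {"type": "item_reference", "id": "some_conversation_item_id"},
                        {
                            "type": "message",
                            "role": "user",
                            "content": [
                                {
                                    "type": "input_text",
                                    "text": "Is it okay to put pineapple on pizza?",
                                }
                            ],
                        },
                    ],
                },
            },
        }
    )
)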

The docs don’t show this exact payload, but they do state that `RealtimeModelSendRawMessage` is for sending “raw client events” and demonstrate it with `response.create`, so using it with the richer `response` body from the Realtime API examples is consistent with the documented pattern. [Inputs and outputs; Create custom context]
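An out-of-band response shares the event stream with normal responses, so the metadata you attach is how you tell them apart. The sketch below picks the text out of a matching `response.done` server event. It assumes the finished response appears under `response` with its `metadata` and an `output` list of message items. Their content parts are assumed to have type `output_text`, or `output_audio` with a `transcript`. `out_of_band_text` is a hypothetical helper:

```python
from __future__ import annotations

from typing import Any


def out_of_band_text(event: dict[str, Any], topic: str) -> str | None:
    """Return the text of a finished response whose metadata topic matches, else None."""
    if event.get("type") != "response.done":
        return None
    response = event.get("response", {})
    if (response.get("metadata") or {}).get("topic") != topic:
        return None
    texts = []
    for item in response.get("output", []):
        for part in item.get("content", []) or []:
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
            elif part.get("type") == "output_audio" and part.get("transcript"):
                texts.append(part["transcript"])
    return "".join(texts)
```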

oblique mango Apr 22, 2026, 2:23 AM

unique dagger ---------------------- Yes. The Realtime agents guide explicitly shows using `se...

@unique dagger How would you add context permanently to a conversation without triggering a response

unique daggerBOT Apr 22, 2026, 2:23 AM

The knowledge sources don’t describe a way to permanently write extra context into a Realtime conversation’s stored history without it being treated as a new turn and/or potentially triggering a response. They instead document patterns for adding ephemeral context to a single response.

Here’s what is documented:

Custom context for a single response (not permanent)
You can send a response.create with a custom input array (including new messages and references to existing items) and even set conversation: "none" so it doesn’t touch the default conversation. This lets you inject extra text the model can use for that response only, without changing the stored conversation. [Create custom context]
Custom merge of history + new input (agents SDK, not Realtime history writes)
With the higher‑level agents Session, you can use RunConfig.session_input_callback(history, new_input) to modify what the model sees each turn (e.g., prepend a synthetic system message) without changing what’s actually stored in the session. [Control history merge]

Neither of these is a “permanent, silent insert into RealtimeSession history.” They both adjust what context is sent to the model, not the underlying stored history.

So, based on the provided docs:

You can add context for a single response (via response.create with custom input) or, with the standard agents Runner and a Session rather than RealtimeSession, for a single turn via session_input_callback. [Create custom context; Control history merge]
There is no documented mechanism to permanently add context into a Realtime conversation’s stored history without it being treated as a normal turn that could influence response triggering.
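One way to make context persist across turns without adding a conversation turn at all is to fold it into the session instructions and send a session.update. That event changes configuration rather than the conversation, and it does not request a response. This is an editorial sketch rather than a documented recipe. The `Background:` section format and the helper names are arbitrary:

```python
from __future__ import annotations


def instructions_with_background(base: str, facts: list[str]) -> str:
    """Append a background section to the base instructions (format is arbitrary)."""
    if not facts:
        return base
    lines = "\n".join(f"- {fact}" for fact in facts)
    return f"{base}\n\nBackground (use if relevant; the user did not just say this):\n{lines}"


def background_update_event(base: str, facts: list[str]) -> dict:
    """Build a GA session.update event that only changes the instructions."""
    return {
        "type": "session.update",
        "session": {"type": "realtime", "instructions": instructions_with_background(base, facts)},
    }
```

Each update replaces the previous instructions. Keep the base text and the current fact list on your side and rebuild the whole string every time.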

oblique mango Apr 22, 2026, 8:36 AM

@unique dagger how do you send a sessionupdate event to a realtimeSession so you can update it's prompt / information

unique daggerBOT Apr 22, 2026, 8:36 AM

You update a RealtimeSession’s prompt/instructions by sending a session.update client event through its underlying model transport.

There are two closely related cases:

1. Raw Realtime API (WebSocket)

Send a session.update event over the WebSocket with the new prompt / instructions:

import json

# Assumes `ws` is an already-open WebSocket connection to the Realtime API.
event = {
    "type": "session.update",
    "session": {
        "type": "realtime",
        "model": "gpt-realtime",
        "prompt": {
            "id": "pmpt_123",
            "version": "89",
            "variables": {
                "city": "Paris",
            },
        },
        # Direct fields override prompt fields if they overlap
        "instructions": "Speak clearly and briefly. Confirm understanding before taking actions.",
    },
}

ws.send(json.dumps(event))

This is exactly the pattern shown in the Realtime docs for updating a session’s prompt and instructions mid-call. [Session lifecycle; Prompt update]

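You can also send a smaller update that changes only some fields, e.g. just the prompt variables. The docs show this one in JavaScript over dc (presumably a WebRTC data channel); the event body is the same when sent over a WebSocket: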

dc.send(JSON.stringify({
  type: "session.update",
  session: {
    type: "realtime",
    prompt: {
      id: "pmpt_123",
      version: "89",
      variables: { city: "Berlin" },
    },
  },
}));

[Changing prompt mid-call]

2. Using `RealtimeSession` from `openai-agents-python`

RealtimeSession wraps a transport (by default OpenAIRealtimeWebSocketModel). To send a session.update from Python, you use the model’s send_event with a RealtimeModelSendSessionUpdate event, which internally sends a session.update client event. This is how RealtimeSession.update_agent works:

async def update_agent(self, agent: RealtimeAgent) -> None:
    """Update the active agent for this session and apply its settings to the model."""
    self._current_agent = agent

    updated_settings = await self._get_updated_model_settings_from_agent(
        starting_settings=None,
        agent=self._current_agent,
    )

    await self._model.send_event(
        RealtimeModelSendSessionUpdate(session_settings=updated_settings)
    )

[RealtimeSession.update_agent]

So in your own code you:

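Build new session settings (including updated instructions, prompt, etc.).
Call await session.model.send_event(RealtimeModelSendSessionUpdate(...)). session.model is the public accessor for the same transport that update_agent reaches through the private self._model.
Or use session.update_agent(...), which takes a whole RealtimeAgent (for example, one with the new instructions) and sends the same update for you. [RealtimeSession.update_agent]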

Summary

Over a raw WebSocket: send a JSON event with type: "session.update" and your new session fields. [Session lifecycle]
With RealtimeSession (agents SDK): call update_agent(...) or directly send a RealtimeModelSendSessionUpdate via session.model.send_event(...), which emits the session.update event under the hood. [RealtimeSession.update_agent]

oblique mango Apr 22, 2026, 8:38 AM

@unique dagger how do you send a sessionupdate event to a realtimeSession so you can update it's prompt / information. Assuming i only have a reference to the RealtimeSession variable that i'm using. i dont have a raw websocket variable currently

unique daggerBOT Apr 22, 2026, 8:38 AM

You can do this directly through the RealtimeSession you already have; you don’t need a raw WebSocket handle.

In openai-agents-python, RealtimeSession updates the model’s session by sending a RealtimeModelSendSessionUpdate event via its internal model (self._model). The built‑in update_agent method shows the pattern:

async def update_agent(self, agent: RealtimeAgent) -> None:
    """Update the active agent for this session and apply its settings to the model."""
    self._current_agent = agent

    updated_settings = await self._get_updated_model_settings_from_agent(
        starting_settings=None,
        agent=self._current_agent,
    )

    await self._model.send_event(
        RealtimeModelSendSessionUpdate(session_settings=updated_settings)
    )

[RealtimeSession.update_agent]

So, given only a session: RealtimeSession:

Build the new session settings dict (e.g., updated instructions, prompt, etc.).
Send a session update event via the session’s model:

from agents.realtime.model_inputs import RealtimeModelSendSessionUpdate

updated_settings = {
    # whatever fields you want to change, e.g.:
    "instructions": "New instructions here",
    # "prompt": {...}, "output_modalities": [...], etc.
}

# session.model is the public accessor for the same transport as session._model
await session.model.send_event(
    RealtimeModelSendSessionUpdate(session_settings=updated_settings)
)

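This sends the equivalent of a `session.update` client event to the Realtime API, updating the prompt / information for that live `RealtimeSession`. One caveat: the SDK converts these settings into its own session configuration before sending, so fields you omit are not necessarily absent from the payload it sends.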

oblique mango Apr 22, 2026, 8:43 AM

@unique dagger under what circumstances is the agents.realtime.events.RealtimeAgentEndEvent sent

unique daggerBOT Apr 22, 2026, 8:43 AM

agents.realtime.events.RealtimeAgentEndEvent is emitted by RealtimeSession when the underlying model signals that a turn has finished.

In the RealtimeSession event loop, when it receives a model event with event.type == "turn_ended":

It clears some per‑turn guardrail state.
Then it emits a RealtimeAgentEndEvent with the current agent and common event info. [RealtimeSession]

elif event.type == "turn_ended":
    # Clear guardrail state for next turn
    self._item_transcripts.clear()
    self._item_guardrail_run_counts.clear()

    await self._put_event(
        RealtimeAgentEndEvent(
            agent=self._current_agent,
            info=self._event_info,
        )
    )

The event class itself is documented as:

“An agent has ended.”
agent: The agent that ended.
info: Common info for all events, such as the context. [Realtime events]

So, whenever the model finishes generating for a turn (`turn_ended`), the session emits `RealtimeAgentEndEvent` for that agent.
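Despite its name, RealtimeAgentEndEvent therefore marks the end of each model turn for the current agent, not the end of the session. The sketch below uses it as a per-turn hook. It assumes the event's `type` string is `"agent_end"` and that it exposes `agent` as documented above. `TurnCounter` is a hypothetical helper:

```python
from __future__ import annotations

from collections import Counter
from typing import Any


class TurnCounter:
    """Count completed model turns per agent name from RealtimeSession events."""

    def __init__(self) -> None:
        self.turns: Counter[str] = Counter()

    def handle(self, event: Any) -> None:
        if event.type == "agent_end":
            # Emitted when the model signals turn_ended, i.e. once per finished turn.
            self.turns[event.agent.name] += 1


async def count_turns(session: Any, counter: TurnCounter) -> None:
    """Feed every event from a live RealtimeSession into the counter."""
    async for event in session:
        counter.handle(event)
```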

oblique mango Apr 22, 2026, 10:40 AM

@unique dagger when I run this code:

from agents.realtime.model_inputs import RealtimeModelSendSessionUpdate

updated_settings = {
    # whatever fields you want to change, e.g.:
    "instructions": "New instructions here",
    # "prompt": {...}, "output_modalities": [...], etc.
}

await session._model.send_event(
    RealtimeModelSendSessionUpdate(session_settings=updated_settings)
)
I get this error:  Cannot update a conversation's voice if assistant audio is present. How do i make it clear that i'm only trying to update the instructions and not the voice.

unique daggerBOT Apr 22, 2026, 10:40 AM

#

The error is coming from the Realtime API itself: once a session has produced assistant audio, you cannot change the voice for that conversation anymore. [Session lifecycle]

What’s most likely happening in your case is:

Your updated_settings dict only contains instructions, but the session.update payload that reaches the server is probably built from fuller session settings (for example, your dict merged with defaults or with the settings the session was created with).
Those settings likely still contain the original voice (or nested audio.output.voice).
When the update is sent, the server sees a voice field in the payload and rejects it with: “Cannot update a conversation's voice if assistant audio is present.”

To “only update instructions,” you must ensure the session update payload does not include any voice fields.

In the agents SDK, the recommended way to configure a realtime session is via model_settings (including audio.output.voice) when you create the runner. After that, change settings either by calling session.update_agent(...) with an updated agent, or by sending session._model.send_event(...) with a minimal settings dict that only has the fields you want to change. [Agent and session configuration; Model settings]

Concretely, the settings dict you send should stay narrow (your original snippet already has this shape):

from agents.realtime.model_inputs import RealtimeModelSendSessionUpdate

# Only include fields you actually want to change
updated_settings = {
    "instructions": "New instructions here",
    # do NOT include "voice" or "audio.output.voice" here
}

await session._model.send_event(
    RealtimeModelSendSessionUpdate(session_settings=updated_settings)
)

The docs don’t show a full example of stripping voice, but they do state:

Most session properties can be updated at any time, except for the voice the model uses for audio output, after the model has responded with audio once during the session. [Session lifecycle]

So the fix is to:

Make sure updated_settings only contains updatable fields (e.g., instructions, prompt, tools, etc.).
Avoid passing any voice / audio.output.voice in that dict once audio has been produced.
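If you want to check a settings dict before sending it, a small helper can list and strip any `voice` keys, including the nested `audio.output.voice`. This is a hypothetical, stdlib-only sketch. It only cleans dicts you control, so if the SDK merges a voice back in when it builds the `session.update` payload, stripping your own dict will not be enough on its own.

```python
import copy
from typing import Any


def voice_paths(settings: dict, prefix: str = "") -> list:
    """Return dotted paths of every "voice" key in a (possibly nested) settings dict."""
    found: list = []
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if key == "voice":
            found.append(path)
        elif isinstance(value, dict):
            found.extend(voice_paths(value, path))
    return found


def without_voice(settings: dict) -> dict:
    """Return a deep copy of settings with every nested "voice" key removed."""
    cleaned: Any = copy.deepcopy(settings)

    def _strip(d: dict) -> None:
        d.pop("voice", None)  # remove at this level first
        for value in d.values():
            if isinstance(value, dict):
                _strip(value)  # then recurse into nested dicts

    _strip(cleaned)
    return cleaned
```

For example, `voice_paths(updated_settings)` should return an empty list before you pass the dict to `RealtimeModelSendSessionUpdate`.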

oblique mango Apr 23, 2026, 5:29 AM

#

@unique dagger If I want to update the instructions used by a RealtimeSession's agent, should I copy its current agent, edit the instructions, then use update_agent with the newly updated agent?

unique daggerBOT Apr 23, 2026, 5:29 AM

#

Yes, that’s the pattern the SDK is designed for.

1. Clone the current agent with new instructions
Both Agent and RealtimeAgent have a clone(**kwargs) helper that makes a shallow copy with selected fields changed, e.g.:
```
new_agent = session.current_agent.clone(instructions="New instructions")
```
This is exactly how the docs show updating instructions: new_agent = agent.clone(instructions="New instructions"). [Agent.clone; RealtimeAgent.clone]
2. Apply it to the live session with update_agent
RealtimeSession.update_agent(agent) sets the active agent and sends a session.update to apply its settings to the model:
```
await session.update_agent(new_agent)
```
Internally, this computes updated model settings from the agent and sends a RealtimeModelSendSessionUpdate event. [RealtimeSession.update_agent]

So: yes—clone the existing `RealtimeAgent` with new `instructions`, then call `session.update_agent(new_agent)` to update the instructions for that RealtimeSession.
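One consequence of `clone` being a shallow copy is that mutable fields such as `tools` are shared between the original agent and the clone. The sketch below illustrates that behaviour with the standard library's `dataclasses.replace` on a hypothetical `StandInAgent` (not the SDK's `RealtimeAgent`), assuming the clone behaves like a shallow field copy as described above.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class StandInAgent:
    """Hypothetical stand-in for an agent; only the fields needed for the demo."""
    name: str
    instructions: str
    tools: list = field(default_factory=list)


original = StandInAgent(name="Assistant", instructions="Old", tools=["get_random_fact"])

# Shallow copy with one field changed, analogous to agent.clone(instructions=...).
clone = dataclasses.replace(original, instructions="New instructions")

# The list object is shared, so in-place mutation leaks back to the original.
clone.tools.append("lookup_weather")

# Passing a fresh list keeps the two agents' tool lists independent.
independent = dataclasses.replace(original, tools=[*original.tools, "other_tool"])
```

If you need to change tools on the cloned agent, pass a new list (as with `independent`) rather than appending to the existing one.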

oblique mango Apr 23, 2026, 5:35 AM

#

@unique dagger what's the difference between an agent's instructions and its prompt?

unique daggerBOT Apr 23, 2026, 5:35 AM

#

In the openai-agents-python SDK, instructions and prompt are two different ways to control an agent, with different roles and mechanisms.

`instructions`

A field directly on Agent / RealtimeAgent:

instructions: (
    str
    | Callable[[RunContextWrapper[TContext], Agent[TContext]], MaybeAwaitable[str]]
    | None
) = None

It is:

The instructions for the agent. Will be used as the "system prompt" when this agent is invoked. Describes what the agent should do, and how it responds. [Agent instructions; RealtimeAgent instructions]

Can be:
- a string (static system prompt), or
- a function that dynamically returns a string based on the run context and the agent. [Agent instructions]
For RealtimeAgent, get_system_prompt resolves instructions into the actual system prompt string sent to the model. [RealtimeAgent]

So: instructions = the agent’s system prompt text, defined in code.
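Since `instructions` can also be a function of the run context and the agent, here is a minimal sketch of a dynamic-instructions function with the documented `(context wrapper, agent) -> str` shape. `UserContext` and the `SimpleNamespace` stand-ins for `RunContextWrapper` and `RealtimeAgent` are hypothetical, used only so the function can be exercised without the SDK; the assumption is that the wrapper exposes your object as `.context`.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class UserContext:
    """Hypothetical app-specific context object passed into the run."""
    user_name: str
    language: str


def dynamic_instructions(ctx: Any, agent: Any) -> str:
    # Same shape as the documented callable: (run context wrapper, agent) -> str.
    # Assumption: ctx.context holds your UserContext, as RunContextWrapper does.
    user = ctx.context
    return (
        f"You are {agent.name}. Address the user as {user.user_name} "
        f"and always respond in {user.language}."
    )


# Stand-ins for RunContextWrapper and RealtimeAgent, used only to exercise the function.
fake_ctx = SimpleNamespace(context=UserContext(user_name="Sam", language="French"))
fake_agent = SimpleNamespace(name="Assistant")
prompt_text = dynamic_instructions(fake_ctx, fake_agent)
```

With the SDK you would pass the function itself, e.g. `RealtimeAgent(name="Assistant", instructions=dynamic_instructions)`, and `get_system_prompt` resolves it into the final string.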

`prompt`

A separate field on Agent / RealtimeAgent:

prompt: Prompt | DynamicPromptFunction | None = None

For Agent:

A prompt object (or a function that returns a Prompt). Prompts allow you to dynamically configure the instructions, tools and other config for an agent outside of your code. Only usable with OpenAI models, using the Responses API. [Agent prompt]

For RealtimeAgent:

A prompt object. Prompts allow you to dynamically configure the instructions, tools and other config for an agent outside of your code. Only usable with OpenAI models. [RealtimeAgent prompt]

Prompt objects are a higher‑level configuration mechanism:
- live outside your code (e.g., managed prompt templates),
- can set instructions, tools, and other config together.

So: prompt = a structured prompt configuration object (or provider) that can define instructions and more, often managed outside code.

How they relate

instructions is the direct system prompt string (or generator) on the agent.
prompt is a Prompt object that can supply instructions (and other settings) dynamically.
When both are set, the docs don't spell out which takes precedence; they only say that prompts “dynamically configure the instructions, tools and other config.” [Agent prompt]

In short:

Use instructions for straightforward, code-defined system behavior.
Use prompt when you want a reusable, externally managed prompt configuration that can also control tools and other settings, not just the raw instruction text.

oblique mango Apr 26, 2026, 3:16 AM

#

@unique dagger how do you specify the language that a realtime voice agent should speak in? Also, how do you specify what language should be assumed for user audio transcription purposes?

unique daggerBOT Apr 26, 2026, 3:16 AM

#

There are two separate settings for this, and they work differently:

1. Language the agent speaks (output voice language)

The knowledge sources do not document a dedicated "output language" setting on RealtimeSessionModelSettings or RealtimeAgent. The voice itself (e.g., "ash") is set via audio.output.voice, but there is no explicit language selector for the output audio shown in the provided docs. [Agent and session configuration]

The typical way to control what language the agent responds in is through the agent's instructions — e.g., "Always respond in French." — rather than a dedicated language field.

2. Language assumed for user audio transcription (input)

This is explicitly configurable. The RealtimeInputAudioTranscriptionConfig TypedDict has a language field:

class RealtimeInputAudioTranscriptionConfig(TypedDict):
    language: NotRequired[str]   # The language code for transcription
    model: NotRequired[Literal["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"] | str]
    prompt: NotRequired[str]     # An optional prompt to guide transcription

[Audio configuration]

You set it in the runner config under audio.input.transcription:

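from agents.realtime import RealtimeAgent, RealtimeRunner

# The output language is steered through instructions, not a dedicated setting
agent = RealtimeAgent(name="Assistant", instructions="Always respond in French.")

runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime-1.5",
            "audio": {
                "input": {
                    "format": "pcm16",
                    "transcription": {
                        "model": "gpt-4o-mini-transcribe",
                        "language": "fr",  # ISO-639-1 language code
                    },
                },
                "output": {
                    "format": "pcm16",
                    "voice": "ash",
                },
            },
        }
    },
)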

[Key settings]

The `language` value is a language code such as `"fr"` (French) or `"es"` (Spanish). The Realtime API reference describes it as an ISO-639-1 code (for example `en`), and supplying it can improve transcription accuracy and latency.
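If you already have a `model_settings` dict (for example, with `audio.output.voice` set), a helper can add the transcription language without overwriting other keys. This is a hypothetical sketch; the two-letter check assumes ISO-639-1 codes and is a local sanity check, not something the SDK performs.

```python
import copy
from typing import Any


def with_transcription_language(
    model_settings: dict,
    language: str,
    model: str = "gpt-4o-mini-transcribe",
) -> dict:
    """Return a copy of model_settings with audio.input.transcription set,
    leaving every other key (including audio.output.voice) untouched."""
    code = language.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
    merged: Any = copy.deepcopy(model_settings)  # never mutate the caller's dict
    audio = merged.setdefault("audio", {})
    audio_input = audio.setdefault("input", {})
    audio_input["transcription"] = {"model": model, "language": code}
    return merged
```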

oblique mango Apr 26, 2026, 3:18 AM

#

@unique dagger how do you tune the turn-taking settings? I've found that in my conversations I'm receiving many transcriptions from user audio that are single words, usually when the user is speaking slowly. How do I specify the duration of silence before the user's audio is transcribed as a distinct message in the chat history?

unique daggerBOT Apr 26, 2026, 3:18 AM

#

The issue you're describing — audio being chunked too eagerly into short fragments — is controlled by the turn detection (VAD) settings, specifically silence_duration_ms and the choice of VAD mode.

Option 1: Increase `silence_duration_ms` (Server VAD)

With the default server_vad mode, the key parameter is:

silence_duration_ms: Duration of silence (in ms) to detect speech stop. Defaults to 500ms. With shorter values the model responds more quickly but may jump in on short pauses. [Server VAD]

Increasing this means the model waits longer after the user stops speaking before treating it as a completed turn. You can set it via session.update:

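{
    "type": "session.update",
    "session": {
        "type": "realtime",  # required on GA session.update events
        "audio": {
            "input": {
                # GA: turn detection is nested under audio.input
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 1000,  # increase from default 500ms
                    "create_response": True,
                    "interrupt_response": True,
                },
            },
        },
    },
}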

Or via RealtimeModelSendSessionUpdate in the agents SDK:

from agents.realtime.model_inputs import RealtimeModelSendSessionUpdate

await session._model.send_event(
    RealtimeModelSendSessionUpdate(session_settings={
        "turn_detection": {
            "type": "server_vad",
            "silence_duration_ms": 1000,
        }
    })
)

Option 2: Switch to Semantic VAD

Semantic VAD is specifically designed for this problem — it uses a model to estimate whether the user has actually finished speaking, rather than just detecting silence. For example, if the user trails off with "uhhm…", it scores a low probability of turn end and waits longer. [Semantic VAD]

You can control its eagerness with the eagerness parameter:

low — waits longer (max timeout 8s), better for slow speakers
medium / auto — default (max timeout 4s)
high — responds quickly (max timeout 2s)

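{
    "type": "session.update",
    "session": {
        "type": "realtime",  # required on GA session.update events
        "audio": {
            "input": {
                # GA: turn detection is nested under audio.input
                "turn_detection": {
                    "type": "semantic_vad",
                    "eagerness": "low",  # let the user take their time
                    "create_response": True,
                    "interrupt_response": True,
                },
            },
        },
    },
}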

[Semantic VAD]

Summary: For slow speakers getting fragmented into single-word transcriptions, the most targeted fix is either:

Increase silence_duration_ms in server_vad, or
Switch to semantic_vad with eagerness: "low", which is purpose-built for this scenario.
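To switch between the two approaches from code, a small builder can produce the `turn_detection` dict for either mode. This is a hypothetical helper: it only sets the fields discussed above (`silence_duration_ms` for `server_vad`, `eagerness` for `semantic_vad`) and leaves everything else at the server defaults.

```python
from typing import Any

_EAGERNESS = {"low", "medium", "high", "auto"}


def turn_detection(mode: str, *, silence_ms: int = 500, eagerness: str = "auto") -> dict:
    """Build a turn_detection dict for server_vad or semantic_vad."""
    result: Any
    if mode == "server_vad":
        if silence_ms <= 0:
            raise ValueError("silence_ms must be positive")
        result = {"type": "server_vad", "silence_duration_ms": silence_ms}
    elif mode == "semantic_vad":
        if eagerness not in _EAGERNESS:
            raise ValueError(f"eagerness must be one of {sorted(_EAGERNESS)}")
        result = {"type": "semantic_vad", "eagerness": eagerness}
    else:
        raise ValueError(f"unknown turn detection mode: {mode!r}")
    return result
```

The returned dict goes under `turn_detection` in the settings you send, e.g. `RealtimeModelSendSessionUpdate(session_settings={"turn_detection": turn_detection("semantic_vad", eagerness="low")})`.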

oblique mango Apr 26, 2026, 5:46 AM

#

@unique dagger

        "tools": [
          {
            "name": "get_random_fact",
            "description": "Get a random fact to tell the user. \nUse this function whenever the user asks or if the conversation just needs a little spicing up.\n\nArgs: The current topic of conversation",
            "params_json_schema": {
              "properties": {
                "current_conversation_topic": {
                  "title": "Current Conversation Topic",
                  "type": "string"
                }
              },
              "required": [
                "current_conversation_topic"
              ],
              "title": "get_random_fact_args",
              "type": "object",
              "additionalProperties": false
            },
            "on_invoke_tool": {},
            "strict_json_schema": true,
            "is_enabled": true,
            "tool_input_guardrails": null,
            "tool_output_guardrails": null,
            "needs_approval": false,
            "timeout_seconds": null,
            "timeout_behavior": "error_as_result",
            "timeout_error_function": null,
            "defer_loading": false,
            "_failure_error_function": null,
            "_use_default_failure_error_function": true,
            "_is_agent_tool": false,
            "_is_codex_tool": false,
            "_agent_instance": null,
            "_tool_namespace": null,
            "_tool_namespace_description": null,
            "_mcp_title": null,
            "_tool_origin": null,
            "_emit_tool_origin": true
          }

How do I tweak these tool parameters to ensure that the model waits for the fun fact function to finish before it continues the conversation and tells the fun fact to the user?

This is how the function is defined

unique daggerBOT Apr 26, 2026, 5:46 AM

#

Nothing in the tool definition you've shown needs to change for this. Whether the model waits for the result depends on how the SDK invokes the tool function (and whether that function is async), not on the tool's JSON schema parameters.

From the function_tool source code, the SDK handles sync vs async functions differently:

If your function is async (async def), it is awaited directly.
If your function is sync (def), it is run in a thread via asyncio.to_thread(...).

In both cases, the SDK waits for the function to complete before returning the result to the model. [function_tool ref]

So the model will not continue until the tool returns — that behaviour is built in.

If the model is continuing before the tool finishes

One thing to rule out is a timeout. Your tool config shows:

"timeout_seconds": null,
"timeout_behavior": "error_as_result"

timeout_seconds: null means no timeout is enforced, so the SDK will wait indefinitely. That's fine.

However, if you want to add a safety timeout (e.g., 10 seconds), you can set it when defining the tool:

@function_tool(timeout=10, timeout_behavior="error_as_result")
async def get_random_fact(current_conversation_topic: str) -> str:
    ...

With timeout_behavior="error_as_result", if the function times out, a model-visible error message is returned instead of raising an exception, so the model can still respond gracefully. [function_tool ref]
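To make the "error as result" behaviour concrete, the sketch below wraps a tool coroutine with `asyncio.wait_for` and returns a model-visible error string on timeout instead of raising. It is a stdlib illustration of the idea, not the SDK's implementation; `call_with_timeout`, `fast_fact` and `slow_fact` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def call_with_timeout(
    func: Callable[..., Awaitable[str]],
    timeout_s: Optional[float],
    **kwargs: Any,
) -> str:
    """Await a tool coroutine; on timeout, return an error string instead of raising."""
    if timeout_s is None:
        # No timeout configured: wait however long the tool takes.
        return await func(**kwargs)
    try:
        return await asyncio.wait_for(func(**kwargs), timeout=timeout_s)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout_s} seconds."


async def fast_fact(current_conversation_topic: str) -> str:
    await asyncio.sleep(0.01)  # simulate a quick lookup
    return f"A fact about {current_conversation_topic}."


async def slow_fact(current_conversation_topic: str) -> str:
    await asyncio.sleep(1)  # simulate a lookup that is too slow
    return "never returned"
```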

Summary

The SDK already waits for the tool to finish before passing the result back to the model — no parameter changes are needed for that.
If the model seems to be skipping the result, check whether your function is raising an exception silently (controlled by failure_error_function) or timing out.
The timeout_seconds and timeout_behavior fields in your tool config are the relevant knobs if you want to control what happens when the function takes too long. [FunctionTool ref]

oblique mango Apr 28, 2026, 1:22 AM

#

@unique dagger during a realtime voice agent session with tool use, what events should I use to maintain a record of what was said, in sequence, as well as which tools were used, when they completed, and with what return values?

unique daggerBOT Apr 28, 2026, 1:22 AM

#

Based on the knowledge sources, here are the key events to listen for on a RealtimeSession to maintain a full sequential record:

Conversation Content

history_added — emitted when a new RealtimeItem is appended to the session's local history (e.g., a new user message after transcription completes, or a new assistant message). [Events, history, interruptions]
history_updated — emitted when an existing item in the history is updated (e.g., a message item gets its content filled in). [Events, history, interruptions]

These two events expose the session's local history as RealtimeItem objects, including user messages and assistant messages.

For user speech specifically, the underlying model event input_audio_transcription_completed (type RealtimeModelInputAudioTranscriptionCompletedEvent) carries the item_id and transcript — the session uses this to update history and then emits history_updated or history_added. [Model Events]

Tool Use

tool_start (RealtimeToolStart) — emitted when the agent begins a tool call. Contains:
- agent — the agent making the call
- tool — the tool being called
- arguments — the arguments as a JSON string [Tool Events]

tool_end (RealtimeToolEnd) — emitted after the tool function completes and its output has been sent back to the model. Contains:
- tool — the tool that was called
- output — the return value of the tool
- agent — the agent that called it
- arguments — the arguments that were passed [RealtimeSession]

Turn Boundaries

agent_start / agent_end — mark the beginning and end of each agent turn, useful for sequencing. [Events, history, interruptions]

Interruptions

audio_interrupted — emitted when the user interrupts the assistant mid-speech. The session also updates history at this point to reflect what was actually heard. [Events, history, interruptions]

Recommended minimal event set for your record

| What you want to track | Event(s) to listen for |
|---|---|
| User messages (speech → text) | `history_added`, `history_updated` |
| Assistant messages | `history_added`, `history_updated` |
| Tool called (name, args, timing) | `tool_start` |
| Tool completed (return value) | `tool_end` |
| Turn boundaries | `agent_start`, `agent_end` |
| Interrupted audio | `audio_interrupted` |

The knowledge sources don't document a single "replay log" API, but iterating the session's event stream and recording these events in order will give you a complete sequential record of the conversation and all tool activity.

oblique mango Apr 28, 2026, 1:28 AM

#

@unique dagger is there a tool to turn OpenAI Realtime Agents events in the Python SDK into dictionaries?

unique daggerBOT Apr 28, 2026, 1:28 AM

#

The knowledge sources don't document a dedicated utility function specifically for converting Realtime session events into dictionaries.

What the sources do show is that the Realtime events (e.g., RealtimeAgentStartEvent, RealtimeAudio, RealtimeAudioEnd, etc.) are Python @dataclass objects [Realtime events], and the underlying model layer uses Pydantic models (e.g., model_dump_json(exclude_unset=True) is called when sending raw messages to the WebSocket). [OpenAIRealtimeWebSocketModel]

So your options with standard Python tooling are:

For the session-level events (dataclasses like RealtimeAgentStartEvent, RealtimeAudio, etc.), use Python's built-in dataclasses.asdict(event):
```
import dataclasses
event_dict = dataclasses.asdict(event)
```
For Pydantic model events at the model layer (e.g., RealtimeModelAudioEvent), use .model_dump():
```
event_dict = event.model_dump()
```

The knowledge sources don't mention a dedicated SDK helper beyond these standard approaches, so I can't confirm one exists in the SDK itself.

oblique mango Apr 28, 2026, 4:29 AM

#

@unique dagger when are RealtimeAgentStartEvent and RealtimeAgentEndEvent emitted?

unique daggerBOT Apr 28, 2026, 4:29 AM

#

Based on the knowledge sources:

RealtimeAgentStartEvent is emitted by RealtimeSession when it receives a model event with event.type == "turn_started" — i.e., when the model starts generating a response for a turn. [RealtimeSession]
RealtimeAgentEndEvent is emitted when the session receives a model event with event.type == "turn_ended" — i.e., when the model finishes generating for a turn. At this point the session also clears per-turn guardrail state. [RealtimeSession]

The underlying turn_started model event is itself triggered when the Realtime API sends a response.created server event, and turn_ended corresponds to response.done. [OpenAIRealtimeWebSocketModel]

Both events carry:

agent — the current RealtimeAgent at the time of the event
info — common event info (e.g., context) [Realtime events]

oblique mango Apr 28, 2026, 5:19 AM

#

@unique dagger when I run RealtimeSession.send_audio() I don't get any reply. How can I check what's happening?

unique daggerBOT Apr 28, 2026, 5:19 AM

#

A few things to check, based on what the knowledge sources document:

1. Check for error events

Iterate the session's event stream and watch for error events:

async for event in session:
    if event.type == "error":
        print(f"Error: {event.error}")

[Quickstart]

2. Check your turn detection configuration

send_audio() does not automatically trigger a response in every configuration. The docs explicitly state:

Raw audio buffering does not automatically do the same in every configuration.

If server-side turn detection (server_vad or semantic_vad) is disabled, the model won't know when the user has finished speaking, so it won't generate a reply. [Inputs and outputs]

Fix option A: Enable turn detection in your runner config:

"audio": {
    "input": {
        "turn_detection": {
            "type": "semantic_vad",
            "interrupt_response": True,
        }
    }
}

Fix option B: If you're managing turns manually, you need to commit the audio buffer and trigger a response yourself:

from agents.realtime.model_inputs import RealtimeModelSendRawMessage

# Commit the audio buffer (marks end of user turn)
await session.send_audio(audio_bytes, commit=True)

# Then manually trigger a response
await session.model.send_event(
    RealtimeModelSendRawMessage(
        message={"type": "response.create"}
    )
)

[Inputs and outputs]

3. Listen for `raw_model_event` to see what the server is actually sending

You can listen for raw_model_event events to see the low-level server events coming back, which may reveal whether the server is receiving audio and processing it:

async for event in session:
    if event.type == "raw_model_event":
        print(event.data)

[Events, history, interruptions]

Most likely cause: turn detection is not configured, so the model never receives a signal to respond. Adding `turn_detection` to your config or manually committing + calling `response.create` should resolve it.

oblique mango Apr 29, 2026, 3:51 AM

#

@unique dagger what events should I listen for from the Realtime Agents SDK during a voice conversation to maintain a chat transcript that updates live as the agent and user speak?

unique daggerBOT Apr 29, 2026, 3:51 AM

#

For a live-updating transcript, you need to handle both the user's speech and the assistant's speech separately, as they come through different events.

User Speech (Input Audio Transcription)

At the model layer, listen for:

RealtimeModelInputAudioTranscriptionCompletedEvent — fires when the user's full turn transcript is ready. Contains item_id and transcript. [Model Events]

For incremental/live user transcription (streaming as the user speaks), the raw Realtime API emits conversation.item.input_audio_transcription.delta events with partial text, and conversation.item.input_audio_transcription.completed for the final transcript. [Handling transcriptions]

Note: for gpt-4o-transcribe and gpt-4o-mini-transcribe, the delta event contains incremental transcripts streamed in real time. For whisper-1, the delta contains the full transcript (same as completed). [Handling transcriptions]

Assistant Speech (Output Transcript)

At the model layer, listen for:

RealtimeModelTranscriptDeltaEvent — partial transcript update as the assistant speaks. Contains item_id, delta (the new text chunk), and response_id. This is what you use to update the transcript live as the assistant generates audio. [Model Events]

The RealtimeSession internally accumulates these deltas per item_id into self._item_transcripts[item_id] and emits history_updated each time. [RealtimeSession]

Higher-Level: History Events

If you prefer to work at the session level rather than the model event level, use:

history_added — a new item (user message, assistant message) was appended to the session history.
history_updated — an existing item was updated (e.g., transcript filled in, or assistant transcript delta accumulated).

Both expose RealtimeItem objects. [Events, history, interruptions]

Interruptions

audio_interrupted — the user interrupted the assistant. The session updates history to reflect what was actually heard, so your transcript should also truncate accordingly. [Events, history, interruptions]

Summary Table

| What to show | Event to listen for | Level |
|---|---|---|
| User transcript (live, streaming) | `conversation.item.input_audio_transcription.delta` (raw) | Raw model event |
| User transcript (final) | `RealtimeModelInputAudioTranscriptionCompletedEvent` | Model layer |
| Assistant transcript (live, streaming) | `RealtimeModelTranscriptDeltaEvent` | Model layer |
| Any history change (simpler approach) | `history_added`, `history_updated` | Session layer |
| Interrupted assistant speech | `audio_interrupted` | Session layer |

For the simplest implementation, listen to `history_added` and `history_updated` at the session level; together they cover both user and assistant turns. If you need truly word-by-word updates, the model-layer transcript deltas may arrive slightly ahead of the corresponding history events.
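For the rendering side, a small buffer keyed by `item_id` can absorb streaming deltas and then replace them with the final transcript. This is a hypothetical sketch: ordering lines by the first time each `item_id` is seen is an assumption (you may prefer the order of the session history), and mapping SDK events onto `apply_delta` and `set_final` is left to your handler.

```python
from dataclasses import dataclass, field


@dataclass
class LiveTranscript:
    """Ordered transcript lines keyed by item_id, updated by deltas and finals."""
    _lines: dict = field(default_factory=dict)  # item_id -> [role, text]

    def apply_delta(self, item_id: str, role: str, delta: str) -> None:
        # The first delta for an item creates its line; later deltas append text.
        line = self._lines.setdefault(item_id, [role, ""])
        line[1] += delta

    def set_final(self, item_id: str, role: str, text: str) -> None:
        # A completed transcript replaces any partial text but keeps the line's position.
        self._lines[item_id] = [role, text]

    def render(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self._lines.values())
```

Since `whisper-1` deltas already contain the full transcript, route those through `set_final` rather than `apply_delta` to avoid duplicated text.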

#Contradiction in the OpenAI Realtime API Cookbook

If you’re building against the GA Realtime API (without the OpenAI-Beta: realtime=v1 header), you should follow the GA event names from the API reference, not the older ones shown in that cookbook.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

You can compare the returned list of items to your internal records to verify they match.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

The knowledge sources do not show any other Realtime-API-specific “fetch history” command beyond these patterns, so if you need more direct or lower-level history retrieval, it isn’t documented in the provided material.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

RealtimeSession

RealtimeRunner

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

If you specifically need “write into the stored RealtimeSession history but not as the most recent item,” the knowledge sources do not describe a supported mechanism for that.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

The sources do not show a way to “silently append to stored history” in the exact way you describe.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

1. Raw Realtime API (WebSocket)

2. Using RealtimeSession from openai-agents-python

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

This sends the equivalent of a session.update client event to the Realtime API, updating the prompt / information for that live RealtimeSession.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

So, whenever the model finishes generating for a turn (turn_ended), the session emits RealtimeAgentEndEvent for that agent.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

So: yes—clone the existing RealtimeAgent with new instructions, then call session.update_agent(new_agent) to update the instructions for that RealtimeSession.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

instructions

prompt

How they relate

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

1. Language the agent speaks (output voice language)

2. Language assumed for user audio transcription (input)

The language value should be a language code (e.g., "fr" for French, "es" for Spanish). The docs don't specify the exact format, but BCP-47 codes are standard for Whisper-based transcription.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Option 1: Increase silence_duration_ms (Server VAD)

Option 2: Switch to Semantic VAD

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

If the model is continuing before the tool finishes

Summary

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Conversation Content

Tool Use

Turn Boundaries

Interruptions

Recommended minimal event set for your record

The knowledge sources don't document a single "replay log" API, but iterating the session's event stream and recording these events in order will give you a complete sequential record of the conversation and all tool activity.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

The knowledge sources don't mention a dedicated SDK helper beyond these standard approaches, so I can't confirm one exists in the SDK itself.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

1. Check for error events

2. Check your turn detection configuration

3. Listen for raw_model_event to see what the server is actually sending

Most likely cause: turn detection is not configured, so the model never receives a signal to respond. Adding `turn_detection` to your config, or manually committing the audio buffer (`input_audio_buffer.commit`) and then sending `response.create`, should resolve it.
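For the manual route, a sketch of the client events for one user turn when turn detection is off, at the raw Realtime API level. The helper name `manual_turn_events` and the placeholder audio are hypothetical; the audio is assumed to be PCM16, base64-encoded as the `input_audio_buffer.append` event expects.

```python
import base64
import json


def manual_turn_events(pcm16_audio: bytes) -> list[dict]:
    """Client events for one user turn when turn_detection is disabled.

    Without VAD the server neither commits the buffer nor starts a
    response on its own, so the client has to do both explicitly.
    """
    return [
        # Stream audio into the input buffer (base64-encoded PCM16).
        {"type": "input_audio_buffer.append",
         "audio": base64.b64encode(pcm16_audio).decode("ascii")},
        # End the user's turn: the buffer becomes a user message.
        {"type": "input_audio_buffer.commit"},
        # Ask the model to respond to the committed input.
        {"type": "response.create"},
    ]


events = manual_turn_events(b"\x00\x00" * 160)  # placeholder silence
wire = [json.dumps(e) for e in events]  # one JSON message per WebSocket send
```

If no `response.create` is ever sent, the symptom matches the one described: audio goes in and nothing comes back.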

User Speech (Input Audio Transcription)

Assistant Speech (Output Transcript)

Higher-Level: History Events

Interruptions

Summary Table

For the simplest implementation, listening to `history_added` and `history_updated` at the session level will cover both user and assistant turns. Assistant updates will lag slightly behind the raw `transcript_delta` stream, so use that stream if you want truly live word-by-word updates.
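A sketch of combining the two: show partial assistant text from deltas, then let the history event settle the final wording. `LiveTranscript` is a hypothetical helper, and it assumes each raw `transcript_delta` event carries an `item_id` and a `delta` string you can pass in.

```python
from collections import defaultdict


class LiveTranscript:
    """Assemble word-by-word assistant text per conversation item."""

    def __init__(self) -> None:
        self._parts: dict[str, list[str]] = defaultdict(list)

    def on_delta(self, item_id: str, delta: str) -> str:
        # Append the fragment and return the partial text for live display.
        self._parts[item_id].append(delta)
        return self.text(item_id)

    def on_final(self, item_id: str, transcript: str) -> None:
        # Replace the partial text with the settled transcript from history.
        self._parts[item_id] = [transcript]

    def text(self, item_id: str) -> str:
        return "".join(self._parts.get(item_id, []))


live = LiveTranscript()
live.on_delta("item_1", "Hello")
partial = live.on_delta("item_1", ", how")
live.on_final("item_1", "Hello, how are you?")
```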

2. Using `RealtimeSession` from `openai-agents-python`

This sends the equivalent of a `session.update` client event to the Realtime API, updating the prompt and related configuration for that live `RealtimeSession`.
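At the raw protocol level, a sketch of an instructions-only `session.update`. It assumes the GA shape, where `session.type` is `"realtime"`, and that fields left out of the event keep their current values; the helper name `instructions_update` is hypothetical.

```python
import json


def instructions_update(instructions: str) -> str:
    """Serialize a GA-style session.update that only changes instructions."""
    event = {
        "type": "session.update",
        "session": {"type": "realtime", "instructions": instructions},
    }
    return json.dumps(event)


message = instructions_update("You are a patient support agent.")
```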

So, whenever the model finishes generating for a turn (`turn_ended`), the session emits `RealtimeAgentEndEvent` for that agent.
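Since each turn ends with `agent_end`, you can split a flat stream of session event types into turns. A minimal sketch, assuming you have already collected the event `type` strings in order; `split_turns` is a hypothetical helper.

```python
from typing import Iterable


def split_turns(event_types: Iterable[str]) -> list[list[str]]:
    """Group a flat stream of event types into turns closed by agent_end."""
    turns: list[list[str]] = []
    current: list[str] = []
    for event_type in event_types:
        current.append(event_type)
        if event_type == "agent_end":
            turns.append(current)
            current = []
    if current:
        turns.append(current)  # unfinished turn still in progress
    return turns


turns = split_turns(["agent_start", "audio", "agent_end",
                     "agent_start", "tool_start", "tool_end", "agent_end",
                     "agent_start", "audio"])
```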