#Test

1 messages · Page 1 of 1 (latest)

crimson knoll
#

Im using OpenAI agent sdk and when using multiple tool like 6 web search es, the combined token output is highest than my main agent’s max input token, how to avoir that

shell mesaBOT
#

Hi @crimson knoll. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

You’re running into the model’s context window limit: the total tokens from your agent’s prompt + tools’ outputs + the model’s reply cannot exceed that model’s max context. [Conversation state; Token limits]

From the provided docs, there isn’t an agent‑SDK–specific “auto cap” for tool outputs, so you need to control total tokens yourself. Here are options that are documented:

  1. Use a smaller effective context window (manual truncation)
    Before calling the model, trim older or less important messages so that the sum of input tokens (conversation + tool outputs) stays below the model’s context limit. This is the same principle as truncation in Realtime, where you drop older messages to stay under a token window. [Realtime truncation; Conversation state]

  2. Limit tool usage / web calls
    The web tool can be limited in number of calls (e.g., via max_tool_calls in general tool usage guidance), which directly caps how much tool output can be added to context. This is explicitly recommended to control cost and behavior for tools like web search. [Web tool cost]

#
  1. Constrain tool outputs themselves

    • Ask tools (or the model when calling tools) to return summaries instead of full pages, or to cap length (e.g., “return at most N tokens / short bullet summary”).
    • This reduces how many tokens from each of the 6 web searches are injected back into the model’s context, helping you stay under the main agent’s input limit. This is consistent with the general guidance to shorten or summarize inputs when you hit token limits. [Token limits]
  2. Set a lower max output for the model
    Use the model’s max_tokens / max_completion_tokens / max_output_tokens setting (depending on which API path you’re using) to keep output tokens small, leaving more room for input tokens within the same context window. [Length control; ModelSettings.max_tokens]

  3. Monitor usage per run and adjust
    The Agents SDK exposes per‑run token usage (input_tokens, output_tokens, total_tokens) so you can detect when you’re close to the limit and then:

    • stop issuing more web searches, or
    • aggressively summarize / drop older context. [Agents usage]
#

If you share which exact model you’re using (e.g., gpt-4.1-mini, gpt-4o-2024-08-06), you can then look up its context window and design a concrete budget (e.g., “max X tokens for tools, Y for history, Z for output”) based on the documented limits. [Fine-tuning token limits; Conversation state]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: