#Why do `system` messages behave differently across different providers?

11 messages · Page 1 of 1 (latest)

loud escarp
#

I was inserting system messages throughout a conversation, but I noticed that sometimes the prompt caching breaks, or that the system messages end up at the beginning of the conversation even though I had placed them where I wanted.

Does anyone know why this is happening? I'd like to know if this is standard behavior for a specific type of chat template or something similar that certain providers have implemented.

vague belfryBOT
#

I'm an AI assistant - my answers may be incorrect and I cannot see screenshots. Please verify information with the OpenRouter docs.

What you're seeing is due to how different providers and models implement system message handling and prompt caching—especially with Gemini and Anthropic models via OpenRouter. Here are core reasons:

  1. System Message Caching & Placement:

    • For Gemini models, there is only a single systemInstruction field. When you use cache_control inside the first system or developer message, "it cannot preserve an uncached dynamic tail inside that same message. If you need part of your prompt to stay dynamic, move that dynamic content into a later user message instead of appending it after a cached block in the first system message." This means all system instructions are normalized and often moved to the conversation start, regardless of where you put them in the array. This is standard Gemini behavior via OpenRouter.
    • Anthropic models support both automatic and explicit cache breakpoints. If you add cache_control at the top-level or as explicit per-block control, you get more predictable placement, but still, "the system automatically applies the cache breakpoint to the last cacheable block and advances it forward as conversations grow." System messages may be reordered or merged for caching efficiency, especially with large or repeated content blocks Prompt Caching Guide.
  2. Prompt Caching Breaks with Dynamic Content:

    • If you append new, dynamic content after a cached system block, the cache can't be reused and may "break" or reset. The docs state: "If you need part of your prompt to stay dynamic, place it in a l...
#

...ater user message rather than as uncached trailing content in the first system message" Prompt Caching Guide.

  1. Provider Differences:
    • "Automatic caching (top-level cache_control) is only supported when requests are routed to the Anthropic provider directly. Amazon Bedrock and Google Vertex AI currently do not support top-level cache_control — when it is present, OpenRouter will only route to the Anthropic provider and exclude Bedrock and Vertex endpoints. Explicit per-block cache_control breakpoints work across all Anthropic-compatible providers including Bedrock and Vertex" Prompt Caching Guide.

Summary:

  • Gemini system messages are normalized and always placed at the start. Dynamic parts must be in user messages.
  • Caching may reset if you append dynamic content after a cached block.
  • Anthropic allows more flexibility, but message placement may still be adjusted for caching logic.

For exact behaviors, see the detailed examples and explanations in the Prompt Caching Guide and LLMs Full Reference.

-# This is an automated response powered by Answer HQ. Use !mute to stop the bot from responding, or !close when your question is resolved.

placid mica
#

System messages always get yeeted to the top

loud escarp
placid mica
loud escarp
# placid mica What do you mean?

A message that is added to the context window and serves as a reminder so that the model doesn't get confused during long sessions

I think Anthropic used to do this or currently does this with Claude

placid mica
#

Ah

#

Hmm

#

You can say

#

"remember not to do X and Y, as stated in the system prompt"