Hi all,
I'm currently building a chatbot using gpt-4.1-nano, and I'm trying to leverage prompt caching since a large portion of the prompt context remains identical across multiple user queries.
To do this, I’ve added the prompt_cache_key parameter to my requests and console-logged the constructed prompt to confirm that most of the context is unchanged between requests. The total prompt length is over 10,000 tokens, well above the 1,024-token minimum required for caching, as noted in the documentation: https://platform.openai.com/docs/guides/prompt-caching
Despite this, every API response reports "cached_tokens": 0.
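One check that may be more telling than eyeballing the logged prompt: as far as I understand the linked guide, cache hits depend on an exact match of the beginning of the prompt, so what matters is how long the shared prefix of two consecutive prompts is, not how much text they share overall. Below is a rough sketch of that comparison in Python. The prompts are made up, the names shared_prefix_length, static_context, prompt_1 and prompt_2 are hypothetical (not part of any OpenAI SDK), and it counts characters rather than tokens, so treat it only as an approximation.

```python
def shared_prefix_length(a: str, b: str) -> int:
    """Return how many leading characters a and b have in common."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


# Hypothetical prompts: a large static context followed by the user query.
static_context = "You are a support bot. " * 50
prompt_1 = static_context + "User: How do I reset my password?"
prompt_2 = static_context + "User: What are your opening hours?"

prefix = shared_prefix_length(prompt_1, prompt_2)
print(f"shared prefix: {prefix} of {len(prompt_1)} characters")

# If anything dynamic (timestamp, session id, user name) comes first,
# the shared prefix collapses even though most of the text is identical.
bad_1 = "ts=1 " + static_context
bad_2 = "ts=2 " + static_context
print(f"shared prefix with a leading timestamp: {shared_prefix_length(bad_1, bad_2)}")
```

If the shared prefix turns out to be very short, the static context is probably not at the very start of the prompt, and moving the variable parts to the end would be the first thing to try.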
Is there something I'm missing about how prompt caching works or is triggered? Any guidance or clarification would be greatly appreciated!