The cost difference likely comes from how caching affects token charges differently for input and completion tokens across providers and models. For example, cache reads may be charged at a discounted rate (like 0.25x for input tokens), but cache writes might have different pricing or no discount. Also, token counts used for billing are based on native tokenizers, not the normalized counts you calculated, which can cause discrepancies.
To pinpoint the exact cause, check the cache usage details in your API response or query the generation stats via /api/v1/generation?id=$GENERATION_ID. This will show native token counts and actual costs, clarifying why one request costs more despite similar caching Prompt Caching and Generation API.
Are you using caching with a specific provider or model for these requests?