# Major discrepancies between calculated costs for GPT-4.1 Mini

shadow crescentBOT

The cost difference likely comes from caching, which affects charges differently for input and completion tokens across providers and models. For example, cache reads may be billed at a discounted rate (such as 0.25x the normal input-token price), while cache writes may be priced differently or not discounted at all. Also, the token counts used for billing come from each model's native tokenizer, not the normalized counts you calculated, which can cause discrepancies on their own.
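To see how much of the gap caching alone could explain, the arithmetic can be sketched in Python. This is a minimal sketch, assuming per-million-token prices, a cache-read multiplier of 0.25 (the example rate above), and that cache writes are billed like normal input; the function name and the placeholder prices in the example are hypothetical, not the provider's actual billing code.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    cached_prompt_tokens: int,
    completion_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    cache_read_multiplier: float = 0.25,
) -> float:
    """Estimate one request's cost when part of the prompt is a cache read.

    Assumptions: cached_prompt_tokens is a subset of prompt_tokens, cache reads
    cost cache_read_multiplier x the input price, and cache-write surcharges
    are not modeled.
    """
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached_prompt_tokens must be between 0 and prompt_tokens")

    uncached = prompt_tokens - cached_prompt_tokens
    # Uncached input at full price, cached input at the discounted rate.
    input_cost = (uncached + cached_prompt_tokens * cache_read_multiplier) * input_price_per_m
    output_cost = completion_tokens * output_price_per_m
    # Prices are per million tokens, so scale down to dollars.
    return (input_cost + output_cost) / 1_000_000
```

For example, with placeholder prices of $0.40/M input and $1.60/M output (check the model page for current ones), `estimate_cost_usd(10_000, 8_000, 500, 0.40, 1.60)` gives $0.0024, while the same request with no cache hit, `estimate_cost_usd(10_000, 0, 500, 0.40, 1.60)`, gives $0.0048. If your observed gap doesn't match this kind of estimate, the native-tokenizer counts are the next thing to check.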

To pinpoint the exact cause, check the cache usage details in your API response, or query the generation stats via /api/v1/generation?id=$GENERATION_ID. This returns the native token counts and the actual cost, which should show why one request costs more despite similar cache usage. See the Prompt Caching and Generation API docs for details.
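A minimal Python sketch of that lookup, using only the standard library. It assumes the path is served from https://openrouter.ai with a Bearer API key, that the generation ID is the `id` returned with your completion, and that the response wraps its fields in a `data` object with names like `native_tokens_prompt` and `total_cost`. Treat those field names as assumptions and check them against the Generation API docs; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the /api/v1/generation path is served from this host.
BASE_URL = "https://openrouter.ai"

# Assumed field names; anything missing from the response comes back as None.
COST_FIELDS = (
    "tokens_prompt",
    "tokens_completion",
    "native_tokens_prompt",
    "native_tokens_completion",
    "native_tokens_cached",
    "total_cost",
)


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """Fetch the stats for one generation (network call, needs a real key)."""
    query = urllib.parse.urlencode({"id": generation_id})
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/generation?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize_generation(payload: dict) -> dict:
    """Pick out the cost-relevant fields, with or without a "data" wrapper."""
    data = payload.get("data", payload)
    return {key: data.get(key) for key in COST_FIELDS}
```

Calling `summarize_generation(fetch_generation(gen_id, api_key))` for each of the two requests and comparing the native counts and `total_cost` side by side should show whether the difference comes from cached tokens or from tokenization.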

Are you using caching with a specific provider or model for these requests?

worn hearth
shadow crescentBOT

I can't help with that.

-# This is an automated response. Use !mute to stop the bot from responding.

worn hearth
shadow crescentBOT

I can't help with that.

worn hearth

oh but you can!

tsk! tsk!

shadow crescentBOT

I can't do that, goodbye.

I can help with model or cost questions if you want. What specifically would you like to know?

worn hearth

!mute