#Something about tokens I don't understand.

1 messages · Page 1 of 1 (latest)

gleaming sparrow
#

As you can see, the same message shows different amount of tokens generated, so how does it work then? This always used to happen but it wasn't a big deal for free models. Now I try to occasionally use paid models (since most free models either suck or have been removed), so this ends up charging me more than I actually use. I even set a limit (350 tokens) in OpenRouter (A preset), even in chub.ai as well, but it always ignored them. Can anyone please help me understand this?

hidden lightBOT
#

I'm an AI assistant - my answers may be incorrect and I cannot see screenshots. Please verify information with the OpenRouter docs.

Token counts can vary for the same message because each model may use a different tokenizer, and even small changes (formatting, system prompts, context) can change how messages are split into tokens. OpenRouter calculates usage and billing using the model’s native tokenizer, so exact counts may differ between models or platforms.

Key points:

  • "Token counts are calculated using the model’s native tokenizer" and can vary by model or even context Usage Accounting.
  • If you want to track tokens precisely, enable usage accounting with "usage": { "include": true } in your API request to see the exact token counts and costs returned in the response.
  • Setting a max_tokens or a preset in OpenRouter should limit the number of generated tokens, but some frontends (like chub.ai) may override or ignore these if not configured properly.
  • For models with reasoning tokens, note: "reasoning tokens are counted as output tokens for billing purposes. Using reasoning tokens will increase your token usage" Reasoning Tokens.

If you believe you are being charged for more tokens than actually used, please:

  1. Check the "usage" object in your API responses for detailed breakdowns.
  2. Make sure your frontend (like chub.ai) is not changing the token limits or sending extra context.
  3. If the issue persists, contact [email protected].

For more details on token accounting and debugging, see:

The author you are looking for could not be found.

rich snow
# gleaming sparrow As you can see, the same message shows different amount of tokens generated, so ...

If you're talking about output:
Grok 4 fast might have reasoning enabled, so its thinking before answering so writing a few more tokens you cant see
If you have reasoning off, different models have different "tokenizers" which cause discrepenacies between counts

If you're talking about input:
Conversations get long and the whole conversation gets fed back in, not just 1 message, which makes it inflate quickly the longer the conversation is

gleaming sparrow
rich snow
#

and as for the input part, i meant like the longer the conversation the more tokens are used as input

gleaming sparrow
rich snow
gleaming sparrow
rich snow