As you can see, the same message shows different amount of tokens generated, so how does it work then? This always used to happen but it wasn't a big deal for free models. Now I try to occasionally use paid models (since most free models either suck or have been removed), so this ends up charging me more than I actually use. I even set a limit (350 tokens) in OpenRouter (A preset), even in chub.ai as well, but it always ignored them. Can anyone please help me understand this?
#Something about tokens I don't understand.
1 messages · Page 1 of 1 (latest)
I'm an AI assistant - my answers may be incorrect and I cannot see screenshots. Please verify information with the OpenRouter docs.
Token counts can vary for the same message because each model may use a different tokenizer, and even small changes (formatting, system prompts, context) can change how messages are split into tokens. OpenRouter calculates usage and billing using the model’s native tokenizer, so exact counts may differ between models or platforms.
Key points:
- "Token counts are calculated using the model’s native tokenizer" and can vary by model or even context Usage Accounting.
- If you want to track tokens precisely, enable usage accounting with
"usage": { "include": true }in your API request to see the exact token counts and costs returned in the response. - Setting a
max_tokensor a preset in OpenRouter should limit the number of generated tokens, but some frontends (like chub.ai) may override or ignore these if not configured properly. - For models with reasoning tokens, note: "reasoning tokens are counted as output tokens for billing purposes. Using reasoning tokens will increase your token usage" Reasoning Tokens.
If you believe you are being charged for more tokens than actually used, please:
- Check the "usage" object in your API responses for detailed breakdowns.
- Make sure your frontend (like chub.ai) is not changing the token limits or sending extra context.
- If the issue persists, contact [email protected].
For more details on token accounting and debugging, see:
- Usage Accounting
- [Reasoning ...
...Tokens](https://openrouter.ai/docs/use-cases/reasoning-tokens)
-# This is an automated response powered by Answer HQ. Use !mute to stop the bot from responding.
If you're talking about output:
Grok 4 fast might have reasoning enabled, so its thinking before answering so writing a few more tokens you cant see
If you have reasoning off, different models have different "tokenizers" which cause discrepenacies between counts
If you're talking about input:
Conversations get long and the whole conversation gets fed back in, not just 1 message, which makes it inflate quickly the longer the conversation is
Hi. How do I check to see if reasoning is enabled or not? And I'm not sure I understand the latter part of your text.
you can click on the arrow next to the activity log and it will show how many output tokens were "reasoning" ones
and as for the input part, i meant like the longer the conversation the more tokens are used as input
Aye.. Thanks, I checked and didn't realize it took separate tokens for that. But is there a way to disable reasoning?
As for the input, I change to a free model, so how does it affect it?
im not sure how to disable reasoning in chub
as for the input it should be the same as the free model, im just saying the longer the conversation the more it will cost
I didn't mean chub, I meant on Openrouter. But thanks for at least clearing up my confusion.
in the or chatroom you should be able to do it by clicking on the three dots next to the model at the top and then turning off reasoning