hey, i've been trying to calculate the total tokens used by prompts with the OpenAI api, and i can never get it to be accurate. i think some extra metadata gets added to the prompt when i send the messages, and i can't count those tokens with tiktoken. if there's any way for me to know what gets added, that would be neat, since i could finally know the token usage accurately while using the streaming api
i forgot to mention that i can count the tokens used in the response, and that count matches the non-streaming results 1 to 1
#Issues getting the token usage with streaming chats
for context, here are the results
here is the code:
use std::io::{stdout, Write};

use openai_macros::{ai_agent, message};
use openai_utils::{api_key, calculate_tokens, FunctionCall, Message};
use tiktoken_rs::{ChatCompletionRequestMessage, get_chat_completion_max_tokens};

macro_rules! print_chat {
    ($l:ident) => {
        while let Some(content) = $l.receive_content(0).await? {
            print!("{content}");
            stdout().flush()?;
        }
    };
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    dotenv::dotenv().unwrap();
    api_key(std::env::var("OPENAI_API_KEY").unwrap());

    let agent = ai_agent! {
        model: "gpt-3.5-turbo",
        temperature: 0.0,
        system_message: "give the same sentence back in 5 other languages",
        messages: message!(user, content: "hello my name is robertas")
    };

    let mut receiver = agent.create_stream().unwrap();
    print_chat!(receiver);
    println!();
    println!();
    println!("stream: {:#?}", receiver.construct_chat().await?.usage);

    let res = agent.create().await?.usage;
    println!("normal: {:#?}", res);

    Ok(())
}
if you don't understand the language but know how to help, i can explain what this does
each message has tags and a role that are also counted
i actually looked over it and i couldn't work out how it summed up to 27, maybe you could explain how it would be represented?
from what i see, the closest way it could be represented is without including the {} and "" in the response and just using plaintext?
println!("{}", calculate_tokens("<|im_start|>"));
println!("{}", calculate_tokens("<|im_end|>"));
println!("{}", calculate_tokens("assistant"));
println!("{}", calculate_tokens("user"));
println!("{}", calculate_tokens("system"));
println!("{}", calculate_tokens("function"));
i don't have any code near me at the moment, but this will probably help https://github.com/Cainier/gpt-tokens/blob/main/index.js#L166
wait so if i see this right then the whole message is turned into json and then encoded?
each message takes a fixed number of tokens based on chatml version
that of course doesn't match up with the chatml
openai has stealth-changed it a few times, but it's easy to match up if you remember to check it against the usage page when you start using a new model
ah, so i guess the easiest way is to just take the non-streaming response's prompt usage minus the streaming one, for each message type?
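A rough sketch of that subtraction idea in Rust, assuming the extra tokens grow linearly with the number of messages (a fixed part plus a constant amount per message). Two non-streaming calibration requests with different message counts are then enough to recover both values. `Sample`, `Calibration` and `calibrate` are hypothetical names, and the numbers in `main` are made up for illustration, not measured against the API.

```rust
/// One calibration request: how many messages were sent, how many tokens a
/// local tokenizer counted for their text, and the `prompt_tokens` value the
/// non-streaming response reported.
struct Sample {
    messages: i64,
    counted: i64,
    reported: i64,
}

/// Overhead recovered from two samples, under the linear assumption
/// `reported - counted = per_message * messages + fixed`.
#[derive(Debug, PartialEq)]
struct Calibration {
    per_message: i64,
    fixed: i64,
}

/// Solves the two-point linear system. Returns `None` when both samples have
/// the same message count or when the numbers do not fit the linear model.
fn calibrate(a: &Sample, b: &Sample) -> Option<Calibration> {
    let message_delta = b.messages - a.messages;
    if message_delta == 0 {
        return None;
    }
    let extra_a = a.reported - a.counted;
    let extra_b = b.reported - b.counted;
    let extra_delta = extra_b - extra_a;
    if extra_delta % message_delta != 0 {
        return None;
    }
    let per_message = extra_delta / message_delta;
    Some(Calibration {
        per_message,
        fixed: extra_a - per_message * a.messages,
    })
}

fn main() {
    // Made-up numbers, only to show the arithmetic.
    let one_message = Sample { messages: 1, counted: 6, reported: 12 };
    let two_messages = Sample { messages: 2, counted: 16, reported: 25 };
    println!("{:?}", calibrate(&one_message, &two_messages));
}
```

Whether `counted` includes the role strings does not matter as long as it is done the same way in both samples and in the later streaming estimate.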
and the tokens for the user i would guess
oh and the function call... maybe imma finish this tomorrow because i will have to redo the whole thing lol
i would be happy to be able to reconstruct it into chatml so its easier to change later on but whatever.
just count all text tokens normally, then multiply the message count by the proper chatml overhead for the model and add that on top
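A minimal Rust sketch of that counting rule. The overhead values used in `main` (3 tokens per message, 1 extra when a `name` is set, 3 for priming the reply) are assumptions taken from what OpenAI's cookbook listed for gpt-3.5-turbo-0613; they may not hold for other model versions, so check them against the reported usage. `ChatMessage`, `Overhead` and `estimate_prompt_tokens` are hypothetical names, and the word-splitting counter is only a stand-in for a real tokenizer such as tiktoken.

```rust
/// One chat message, reduced to the fields that matter for counting.
struct ChatMessage<'a> {
    role: &'a str,
    content: &'a str,
    name: Option<&'a str>,
}

/// Per-model framing costs. The concrete values are assumptions and should be
/// verified against the usage the API reports for the model in question.
struct Overhead {
    per_message: usize,
    per_name: usize,
    reply_priming: usize,
}

/// Counts the text tokens of every message with `count`, then adds the fixed
/// framing cost per message and the one-off cost for priming the reply.
fn estimate_prompt_tokens<F>(
    messages: &[ChatMessage<'_>],
    overhead: &Overhead,
    count: F,
) -> usize
where
    F: Fn(&str) -> usize,
{
    let mut total = overhead.reply_priming;
    for message in messages {
        total += overhead.per_message + count(message.role) + count(message.content);
        if let Some(name) = message.name {
            total += overhead.per_name + count(name);
        }
    }
    total
}

fn main() {
    // Stand-in counter: one "token" per whitespace-separated word.
    // Swap in a real tokenizer to get meaningful numbers.
    let word_count = |s: &str| s.split_whitespace().count();

    // Assumed values, see the note above.
    let overhead = Overhead { per_message: 3, per_name: 1, reply_priming: 3 };

    let messages = [
        ChatMessage {
            role: "system",
            content: "give the same sentence back in 5 other languages",
            name: None,
        },
        ChatMessage { role: "user", content: "hello my name is robertas", name: None },
    ];

    println!("{}", estimate_prompt_tokens(&messages, &overhead, word_count));
}
```

Passing the counter in as a closure keeps the framing arithmetic separate from the tokenizer, so only the `Overhead` values need to change when a model's framing changes.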
ok, i will make sure to check the user string token usage as well later
i guess this is going to be another friday afternoon spent finding all this out and implementing it
well i think the biggest thing i could hope for is for openai to return a usage object as the last response within a stream
but that's probably gonna happen after i get this 4 hour job done (in the worst case)
streaming has been available for ages, but they never added the usage to responses even tho thousands of users have asked for it 🤷‍♂️
yeah, well i might later make this into an api for others to use, but i bet there is one already which i just dont know about
btw thanks for the help!
np
ok well thanks for the help a ton, i got it to match up perfectly now!