#Assistant API pricing


quiet dove

Hi! I’ve been experimenting with the new Assistants API. It has worked really well so far. The docs say pricing is based on the token usage of the underlying model. However, none of the responses seem to include the usage info the way normal Chat Completions responses do.

Anyone know where to find this info, or some approximation of what is actually submitted behind the scenes?

open crow

@quiet dove Until it's clarified, I'd assume tokens are priced the same as gpt-4-turbo. Behind the scenes it likely uses some sort of embedding-based retrieval, so there shouldn't be any extra LLM calls and no hidden tokens, apart from the tokens of the retrieved chunks it uses to form the answer. How many of those there are is, I agree, ambiguous.

vague pier

Yesterday it seemed to fill the context with retrieved content on every call, resulting in millions of context tokens used across just a few calls. I'd like more clarification on this too.
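
Until usage is reported, a back-of-the-envelope estimate may help bound the bill. The sketch below is not an official formula: it assumes, as guessed in this thread, that a run's input is roughly the instructions plus the thread history plus the retrieved chunks, and that the worst case is retrieval filling the whole context window on every run. The function names are hypothetical, and the prices are the gpt-4-turbo launch list prices ($0.01 per 1K input tokens, $0.03 per 1K output tokens), so check the current pricing page before relying on them.

```python
# Rough estimate only. The billing model here is an assumption from this
# thread, not documented Assistants API behavior.

CONTEXT_WINDOW = 128_000      # gpt-4-turbo context size in tokens
INPUT_PRICE_PER_1K = 0.01     # USD, launch list price (assumed still valid)
OUTPUT_PRICE_PER_1K = 0.03    # USD, launch list price (assumed still valid)


def estimate_run_input_tokens(instruction_tokens, thread_tokens, retrieved_tokens):
    """Input tokens for one run, assuming the model sees all three parts."""
    return instruction_tokens + thread_tokens + retrieved_tokens


def worst_case_input_tokens(runs, context_window=CONTEXT_WINDOW):
    """Upper bound if retrieval fills the entire context on every run."""
    return runs * context_window


def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_1k=INPUT_PRICE_PER_1K,
                      output_price_per_1k=OUTPUT_PRICE_PER_1K):
    """Convert token counts to dollars at per-1K-token prices."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


if __name__ == "__main__":
    # Placeholder numbers: 10 runs, each producing about 500 output tokens.
    runs = 10
    output_tokens = runs * 500

    modest = runs * estimate_run_input_tokens(200, 1_000, 4_000)
    worst = worst_case_input_tokens(runs)

    print(f"modest input tokens: {modest}, cost ~${estimate_cost_usd(modest, output_tokens):.2f}")
    print(f"worst-case input tokens: {worst}, cost ~${estimate_cost_usd(worst, output_tokens):.2f}")
```

With these placeholder numbers, ten runs that each fill the context already exceed a million input tokens, which is in line with the "millions of context tokens for few calls" observation above.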