To get a number as a response to the prompts, you can specify your instruction in a way that guides the model to generate a numerical output. This can be done either by providing the instruction with each API call or by using an assistant, depending on your specific use case.
For example, if you're using the API directly, your instruction might look something like this: "Given the following data, calculate the average..." or "Predict the next number in this sequence...". The model will then generate a response based on your instruction.
As for token usage, both the input and output tokens count towards the total tokens used in an API call. This includes the tokens in your instruction. For instance, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens[^1^][^2^].
The token usage also affects how long your API call takes, as writing more tokens takes more time[^1^][^2^]. Moreover, the total tokens must be below the model’s maximum limit (4097 tokens for gpt-3.5-turbo)[^1^][^2^].
You can check the token usage by looking at the usage field in the API response (e.g., response**'usage']['total_tokens'])[^1^][^2^]. To see how many tokens are in a text string without making an API call, you can use OpenAI’s [tiktoken** (https://github.com/openai/tiktoken) Python library[^1^][^2^].
Remember, managing costs is an important aspect of using the API. You can reduce costs by using shorter prompts, fine-tuning models, or caching common user queries[^3^][^4^].