A possible answer to your question:
Summary of text below (maybe also see the next message concerning costs): Given the quote below, It seems the entire conversation counts in the token usage
I think the documentation for the APIs on the openAI website answers the question. Specifically, the URL that ends with platform.openai.com/docs/guides/chat/introduction which is under:
Documentation>Guides>Chat completion>Introduction>Manage tokens (you can also just visit the documentation page and use the keyboard shortcut CTRL+F to search for a part of the quote below).
I highlighted particularly relevant parts.
...if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens.
To see how many tokens are used by an API call, check the usage field in the API response (e.g., response[‘usage’][‘total_tokens’]).
To see how many tokens are in a text string without making an API call, use OpenAI’s tiktoken Python library. Example code can be found in the OpenAI Cookbook’s guide on how to count tokens with tiktoken.
Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting. This may change slightly in the future.
If a conversation has too many tokens to fit within a model’s maximum limit (e.g., more than 4096 tokens for gpt-3.5-turbo), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.
Note too that very long conversations are more likely to receive incomplete replies. For example,** a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.**