#Token bias for assistant runs


knotty zodiac

I don't see logit_bias as a parameter in the API reference for assistants or assistant runs. Is there any way to discourage the assistant from outputting particular tokens? My assistant is sending messages that are too long and overly enthusiastic, so I would like to reduce the likelihood that it uses ! and \n, and maybe some other tokens. I'm using the API with Python.

worn violet BOT

Hi @knotty zodiac. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:


The OpenAI API does provide a way to modify the likelihood of specific tokens appearing in a model-generated output using the logit_bias parameter. However, this parameter is not directly mentioned in the context of the Assistants API in the provided knowledge sources.

The logit_bias parameter accepts a JSON object that maps token IDs (as assigned by the tokenizer) to a bias value from -100 to 100. The bias is added to the model's logits before sampling. Values like -100 will in most cases block the token from being generated, and values like 100 will in most cases force the token to be selected. Moderate values like -1 and 1 change the probability of the token being selected to a lesser degree.
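To make the effect of the bias concrete, here is a minimal sketch of a bias being added to logits before a softmax. The three logit values are made up for illustration, and the helper names softmax and apply_bias are hypothetical; the real model works over its full vocabulary and the exact effect varies per model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_bias(logits, bias):
    """Add a per-token bias; tokens without a bias entry are left unchanged."""
    return {tok: value + bias.get(tok, 0.0) for tok, value in logits.items()}


# Made-up logits for three candidate next tokens (illustration only).
logits = {"!": 2.0, ".": 1.5, "\n": 0.5}

baseline = softmax(logits)
nudged = softmax(apply_bias(logits, {"!": -1.0}))    # mild discouragement
banned = softmax(apply_bias(logits, {"!": -100.0}))  # effectively blocked

for name, probs in [("baseline", baseline), ("bias -1", nudged), ("bias -100", banned)]:
    print(f"{name:>9}: P('!') = {probs['!']:.6f}")
```

A bias of -1 only shifts probability away from the token, while -100 pushes its probability so close to zero that it is effectively never sampled.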

To use this parameter, you'll need to convert your text to token IDs using a tokenizer tool. For example, if you want to reduce the likelihood of the ! and \n tokens, you would first convert these to their corresponding token IDs, and then set a negative bias value for these IDs in the logit_bias parameter.
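As a rough sketch of that conversion step, the helpers below build a logit_bias mapping from token IDs. The function names build_logit_bias and token_ids_for are hypothetical, the model name is only an example, and the tiktoken lookup assumes the tiktoken package is installed. Whether the result can be used with the Assistants API is not confirmed here; the mapping is in the shape the Chat Completions logit_bias parameter expects.

```python
def build_logit_bias(token_ids, bias):
    """Map every token ID to the same bias value.

    Keys are stringified because logit_bias is a JSON object,
    and JSON object keys are strings.
    """
    if not -100 <= bias <= 100:
        raise ValueError("bias must be between -100 and 100")
    return {str(token_id): bias for token_id in token_ids}


def token_ids_for(texts, model="gpt-3.5-turbo"):
    """Look up token IDs with tiktoken (assumes `pip install tiktoken`)."""
    import tiktoken  # imported lazily so build_logit_bias works without it

    encoding = tiktoken.encoding_for_model(model)
    ids = []
    for text in texts:
        # A string may encode to more than one token; all of them are collected.
        ids.extend(encoding.encode(text))
    return ids
```

For example, build_logit_bias(token_ids_for(["!", "\n"]), -5) would produce a mapping that could be passed as logit_bias in a Chat Completions request. Keep in mind that the same character can appear in several tokens (for instance with a leading space), so discouraging one token ID does not necessarily cover every variant.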

However, please note that the logit_bias parameter might not be available or function in the same way for the Assistants API. You should refer to the official API documentation (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias) or contact OpenAI support for more specific guidance.

If you're finding that the assistant is generating overly long responses, you might want to consider setting a limit on the number of tokens used in a run by using the max_prompt_tokens and max_completion_tokens parameters when creating a run. This can help control the length of the generated responses.

Here's an example of how you might set these parameters in Python:

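from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Start a thread with the user's message.
thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
)

# "asst_abc123" is a placeholder; use your own assistant's ID.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_abc123",
    max_prompt_tokens=500,       # prompt-token budget for the whole run
    max_completion_tokens=1000,  # completion-token budget for the whole run
)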

In this example, the first completion will truncate the thread to 500 tokens and cap the output at 1000 tokens. If only 200 prompt tokens and 300 completion tokens are used in the first completion, the second completion will have available limits of 300 prompt tokens and 700 completion tokens.
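The budget arithmetic in that example can be sketched as a small helper. The function name remaining_limits is hypothetical and only mirrors the subtraction described above; it is not part of the OpenAI SDK.

```python
def remaining_limits(max_prompt, max_completion, used_prompt, used_completion):
    """Return the (prompt, completion) token budgets left for the next completion."""
    prompt_left = max(max_prompt - used_prompt, 0)
    completion_left = max(max_completion - used_completion, 0)
    return prompt_left, completion_left


# Limits from the example: 500 prompt tokens and 1000 completion tokens,
# with 200 and 300 used by the first completion.
prompt_left, completion_left = remaining_limits(500, 1000, 200, 300)
print(prompt_left, completion_left)  # 300 700
```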

Again, please refer to the official API documentation (https://platform.openai.com/docs/assistants/deep-dive/creating-assistants) or contact OpenAI support for more specific guidance.

Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon: