#Evals for OpenAI Assistant

1 messages · Page 1 of 1 (latest)

south bison
#

Does openai support evals for Openai assistants? if so, how to do it?

charred iceBOT
#

Hi @south bison. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Yes, OpenAI supports evals for OpenAI assistants through the OpenAI Evals framework. This framework allows you to evaluate large language models (LLMs) or systems built on top of LLMs. Here’s how you can get started:

  1. Setup: First, clone the OpenAI Evals repository and follow the setup instructions. You will need to set up your OpenAI API key as an environment variable.

    git clone git@github.com:openai/evals.git
    

    Set your API key:

    export OPENAI_API_KEY="your-api-key"
    

    Source

  2. Running an Eval: You can run an eval using the oaieval CLI. Install the necessary library and run the eval with a command like:

    pip install oaieval
    oaieval gpt-3.5-turbo spider-sql
    

    This command requires a model name and an eval set name. The valid eval names are specified in YAML files under evals/registry/evals.

    Source

  3. Building an Eval: To create a custom eval, you need a dataset in JSONL format and an eval class defined in a YAML file. You can use existing templates or create your own.

    Source

  4. Types of Evals: There are different types of evals, such as writing logic for answer checking or using model grading, where the model evaluates its own output.

    Source

  5. Custom Evals: While OpenAI is not currently accepting evals with custom code, you can still submit model-graded evals with custom YAML files.

#

Source

For more detailed instructions, you can refer to the OpenAI Evals documentation. If you have specific questions or need further assistance, feel free to ask!

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: