#Evals for OpenAI Assistant
1 messages · Page 1 of 1 (latest)
Hi @south bison. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Yes, OpenAI supports evals for OpenAI assistants through the OpenAI Evals framework. This framework allows you to evaluate large language models (LLMs) or systems built on top of LLMs. Here’s how you can get started:
-
Setup: First, clone the OpenAI Evals repository and follow the setup instructions. You will need to set up your OpenAI API key as an environment variable.
git clone git@github.com:openai/evals.gitSet your API key:
export OPENAI_API_KEY="your-api-key" -
Running an Eval: You can run an eval using the
oaievalCLI. Install the necessary library and run the eval with a command like:pip install oaieval oaieval gpt-3.5-turbo spider-sqlThis command requires a model name and an eval set name. The valid eval names are specified in YAML files under
evals/registry/evals. -
Building an Eval: To create a custom eval, you need a dataset in JSONL format and an eval class defined in a YAML file. You can use existing templates or create your own.
-
Types of Evals: There are different types of evals, such as writing logic for answer checking or using model grading, where the model evaluates its own output.
-
Custom Evals: While OpenAI is not currently accepting evals with custom code, you can still submit model-graded evals with custom YAML files.