#Step-by-step guide to use openai evals

1 messages · Page 1 of 1 (latest)

severe shoal
#

Can i have a Step-by-step guide to use openai evals?

hidden zephyrBOT
#

Hi @severe shoal. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Sure, here's a step-by-step guide to use OpenAI evals based on the information provided:

  1. Building your dataset: The first step in the process is to build your dataset. This dataset will be used in the evaluation process.

  2. Registering a new eval with your dataset: Once you have your dataset ready, you need to register a new eval with your dataset.

  3. Running your eval: After registering your eval, you can run it using the following command:

oaieval gpt-3.5-turbo <eval_name>

Replace <eval_name> with the name of your eval.

  1. Creating a new YAML: If you can't use an existing model-graded eval, create a new YAML or create a new entry to an existing YAML in evals/registry/modelgraded to specify the parameters of your eval.

  2. Adding a meta-eval for the model-graded eval: It is recommended to add a meta-eval for the model-graded eval. Each model-graded eval comes with a few knobs to tune, mainly prompt but also eval_type. In order to make sure the eval is of high quality, we recommend each model-graded eval contribution come with "choice labels", which are basically human-provided labels for which evaluation choice the model should have made.

Please note that OpenAI is currently not accepting Evals with custom code. You can still submit modelgraded evals with custom modelgraded YAML files.

For more detailed information, you can refer to the OpenAI evals documentation (https://github.com/openai/evals/blob/main/docs/build-eval.md).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@severe shoal kapa.ai is still learning and improving, please let us know how it did by reacting below

severe shoal
#

@hidden zephyr Can i have a simple python source code example boilerplate for this ?

hidden zephyrBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Sure, here's a simple Python source code example for creating a custom eval based on the information provided:

import random
import textwrap

import evals
import evals.metrics

class Arithmetic(evals.Eval):
    def __init__(self, train_jsonl, test_jsonl, train_samples_per_prompt=2, **kwargs):
        super().__init__(**kwargs)
        self.train_jsonl = train_jsonl
        self.test_jsonl = test_jsonl
        self.train_samples_per_prompt = train_samples_per_prompt

    def run(self, recorder):
        """
        Called by the `oaieval` CLI to run the eval. The `eval_all_samples` method calls `eval_sample`.
        """
        self.train_samples = evals.get_jsonl(self.train_jsonl)
        test_samples = evals.get_jsonl(self.test_jsonl)
        self.eval_all_samples(recorder, test_samples)

This is a basic example of a custom eval class. The __init__ method initializes the class with references to the train and test sets, and the run method is used to run the eval. The eval_all_samples method, which is not shown here, would be used to evaluate each sample in the test set.

Please note that OpenAI is currently not accepting Evals with custom code. You can still submit modelgraded evals with custom modelgraded YAML files.

For more detailed information, you can refer to the OpenAI evals documentation (https://github.com/openai/evals/blob/main/docs/custom-eval.md).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: