Your file does not appear to be in valid JSON format.

edgy oracle

Hey there! I'm trying to fine-tune a GPT-3 model using a small dataset with around 100 rows in the format:

{
"question": "answer",
"question": "answer",
"question": "answer"
}

But when I try to run it through fine_tunes.prepare_data, I get this error:

ERROR in read_any_format validator: Your file `data.json` does not appear to be in valid JSON format. Please ensure your file is formatted as a valid JSON file.

I've got the latest OpenAI pip package, generated the file using json.dump(ensure_ascii=False), and even ran it through a few online JSON validators. The file includes special Unicode characters like emojis and åäö. Any idea what's going on?
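One thing worth ruling out with emojis and åäö: Python's `open()` uses the platform's default encoding unless one is passed explicitly. The minimal sketch below pins UTF-8 on both write and read. The `qa` dictionary and the `data.json` file name are placeholders, not the poster's actual data.

```python
import json

# Placeholder data standing in for the real ~100 question/answer rows.
qa = {"Hur mår du? 😊": "Bra, tack!", "Vad är åäö?": "Svenska bokstäver."}

# Write with an explicit encoding so non-ASCII characters are stored as UTF-8
# regardless of the operating system's default.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(qa, f, ensure_ascii=False)

# Read it back the same way to confirm the file parses as JSON.
with open("data.json", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == qa)  # True
```

If this round trip succeeds on the real file, the file is valid UTF-8 JSON. That suggests the error is about the file's structure rather than its encoding.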

quaint trellis

Looking at the fine-tuning docs, I think you need to format your data as JSONL. So in your case it would look like this:

{ "prompt": "question1", "completion": "answer1" }
{ "prompt": "question2", "completion": "answer2" } 
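The following is a minimal sketch of that conversion in Python. It assumes the original file is a single object that maps each question to its answer. `dict_to_jsonl` and `data.jsonl` are hypothetical names. Each record is written as its own line, with no commas between records and no enclosing brackets.

```python
import json

def dict_to_jsonl(qa: dict, path: str) -> None:
    """Write one {"prompt": ..., "completion": ...} object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in qa.items():
            record = {"prompt": question, "completion": answer}
            # Each line is a complete JSON object on its own.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

qa = {"question1": "answer1", "question2": "answer2"}
dict_to_jsonl(qa, "data.jsonl")

with open("data.jsonl", encoding="utf-8") as f:
    print(f.read(), end="")
# {"prompt": "question1", "completion": "answer1"}
# {"prompt": "question2", "completion": "answer2"}
```

If the original file literally repeats the same key, Python's json.load keeps only the last value for it, so each question needs to be unique for this approach to keep all rows.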

Ah, I might be wrong: fine_tunes.prepare_data should indeed be able to handle your format as well and convert it to JSONL. 🤔

quaint trellis

I checked the docs again, and I think your JSON is still problematic. The docs say:

This tool accepts different formats, with the only requirement that they contain a prompt and a completion column/key.

So I think you would need a JSON file like this instead:

{
    "data": [
        {
            "prompt": "question1",
            "completion": "answer1"
        },
        {
            "prompt": "question2",
            "completion": "answer2"
        }
    ]
}

edgy oracle

It loads all my pairs but says they're incorrect.

The pairs look like this:

        {
            "prompt": "nice",
            "completion": " nice\n"
        },

quaint trellis

Maybe not related, but the JSONL samples don't have commas after the closing }. Can you try removing them?

It might also be necessary to escape the \, so it would become \\. Not sure about that, though.
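When the file is written with Python's json module, escaping is handled automatically. A newline inside a string is serialized as the two characters \ and n, and parsing turns it back into a real newline. The quick check below uses the pair from above.

```python
import json

pair = {"prompt": "nice", "completion": " nice\n"}

# json.dumps escapes the newline: the serialized text contains a backslash followed by "n".
line = json.dumps(pair)
print(line)  # {"prompt": "nice", "completion": " nice\n"}

# Parsing restores the original newline character.
print(json.loads(line)["completion"] == " nice\n")  # True
```

Doubling the backslash by hand (\\n in the file) would instead produce a literal backslash followed by n in the parsed completion.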

edgy oracle

I decided to write a quick script to convert it into JSONL. That worked.
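The original script wasn't posted. The following is a minimal sketch of that kind of conversion. It assumes the input uses the {"data": [...]} layout suggested earlier in the thread. `wrapped_json_to_jsonl`, `data.json` and `data.jsonl` are hypothetical names.

```python
import json

def wrapped_json_to_jsonl(src: str, dst: str) -> int:
    """Convert {"data": [{"prompt": ..., "completion": ...}, ...]} to JSONL.

    Returns the number of records written.
    """
    with open(src, encoding="utf-8") as f:
        records = json.load(f)["data"]
    with open(dst, "w", encoding="utf-8") as out:
        for rec in records:
            # Keep only the two keys the fine-tuning format needs.
            line = {"prompt": rec["prompt"], "completion": rec["completion"]}
            out.write(json.dumps(line, ensure_ascii=False) + "\n")
    return len(records)

# Usage: build a small sample file, then convert it.
sample = {"data": [{"prompt": "nice", "completion": " nice\n"},
                   {"prompt": "hej", "completion": " hallå\n"}]}
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

print(wrapped_json_to_jsonl("data.json", "data.jsonl"))  # 2
```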

sterile lark

Yeah, I've been restructuring my JSON files as JSONL while keeping the .json extension. The API seems to complain if a .json file isn't formatted as JSONL.

granite cloud

JSONL has one object per line with no trailing comma.

{ "prompt": "why?","completion": "why not?"}
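One rough way to check a file against that rule before uploading it is to parse each non-empty line on its own. The sketch below uses a hypothetical `check_jsonl` helper, which is not part of the OpenAI tooling. It reports lines that fail to parse or that lack the prompt/completion keys. A trailing comma, as in the pairs posted above, makes a line fail to parse.

```python
import json

def check_jsonl(text: str) -> list[str]:
    """Return a list of problems found; an empty list means every line looks fine."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict) or "prompt" not in obj or "completion" not in obj:
            problems.append(f"line {lineno}: missing prompt/completion")
    return problems

good = '{ "prompt": "why?","completion": "why not?"}\n'
bad = '{ "prompt": "why?","completion": "why not?"},\n'
print(check_jsonl(good))  # []
print(check_jsonl(bad))   # ['line 1: not valid JSON (Extra data)']
```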

edgy oracle

I'm aware. But this thread was about fine_tunes.prepare_data, which takes .csv, .json, etc. and converts them into JSONL.
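For comparison, the following is a rough standard-library sketch of the CSV case. It converts text with prompt and completion columns into JSONL. This is not the tool's actual implementation, and `csv_to_jsonl` is a hypothetical name.

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Turn CSV text with prompt/completion columns into JSONL text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        json.dumps({"prompt": row["prompt"], "completion": row["completion"]},
                   ensure_ascii=False)
        for row in reader
    ]
    # One object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)

csv_text = 'prompt,completion\nwhy?,why not?\n"hej, hej",hallå\n'
print(csv_to_jsonl(csv_text), end="")
# {"prompt": "why?", "completion": "why not?"}
# {"prompt": "hej, hej", "completion": "hallå"}
```

The quoted "hej, hej" cell shows why using the csv module is safer than splitting lines on commas by hand.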