#File classifier - poor results with the API compared to ChatGPT

1 messages · Page 1 of 1 (latest)

topaz jacinth
#

Hello,
I am tried to build a file classification feature. The goal is to get a file as input (image or pdf) and a type of document which is expected. The output sould contain a score how well the input file fits the expected type. For example, if you provide invoice as expected input type, but you upload an image of a cat, you will get a low score.

I implemented this in ChatGPT by creating a simple GPT. This works fine and it takes abount 3 sec to classify a pdf document (longer for images).

BUT:
I tried to achieve the same using the assistants api (code is below) and i am facing multiple issues here:

At first, the response takes very long. To analyse a pdf document, remember abount 3s in ChatGPT). Via the assistants API, the best result i had was abount 10s. Most of the time, it takes abount 17s (with peaks up to over 60s). This is not usable for production.

My second issue is, that the model does not follow the instructions like it does in ChatGPT. In my GPT, i NEVER got output outside the expected JSON. Via the API, I am often getting extra text before and after the JSON.

And last but not least: I didn't understand how i can let the model analyse images using the API like it is possible by posting the image inside the ChatGPT chat. I only found a section abount this in the chat completion api section of the docs. But i cannot provide pdf documents to the completion api as far as i could figure out? How can i provide both types of files?

For both, my GPT and the API, i use the exact same promt und the gpt4-0 model.

Does anyone know how to fix the above issues?

Thanks in advance

Here is my code i am using to call the assistants api:

#
def classify_file(file_obj, expected_class):
client = OpenAI(
        api_key=OPENAI_API_KEY,
    )

    # uplaod file
    file = client.files.create(
        file=file_obj,
        purpose='assistants'
    )

    assistant = client.beta.assistants.create(
        name="File Classifier",
        instructions="""
            You are a file classifier. You are given a file (pdf or image) and a classification. Your task is to check if the file matches the expected classification. 
            Please output the result as a JSON object with the following keys (do not output something else):
            
            matchScore: a value between 0 and 100 that indicates how well the file matches the expected classification. 0 means no match, 100 means perfect match.
            validFrom: the creation date or valid from date of the file if available. Format: YYYY-MM-DD, null if not available.
            validTo: the expiration date or valid to date of the file if available. Format: YYYY-MM-DD, null if not available.
            classification: the classification of the file. Also if the file does not match the expected classification, you should provide the correct classification here.
            
            Its really important that you only output the JSON object with the keys mentioned above.
        """,
        tools=[
            {"type": "code_interpreter"}
        ],
        model="gpt-4o",

    )

    thread = client.beta.threads.create(
        messages=[{
            'role': 'user',
            'content': f'classification: {expected_class}'
        }],
        tool_resources={
            "code_interpreter": {
                "file_ids": [file.id]
            }
        },
    )



#
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
    status = run.status
    timeout = 60 # seconds
    started = time()
    while time() - started < timeout:
        run = client.beta.threads.runs.retrieve(run.id, thread_id=thread.id)
        status = run.status
        if status == 'completed':
            break
        sleep(1)

    # cleanup
    client.files.delete(file.id)
    client.beta.assistants.delete(assistant.id)

    if status != 'completed':
        print('Timeout, status is', status)
        return None

    messages = client.beta.threads.messages.list(thread_id=thread.id)
    response = messages.data[0].content[0].text.value
    response = response.replace('```json\n', '').replace('```', '')
    response = '{' + response.split('{')[-1]
    response = response.split('}')[0] + '}'
    try:
        return json.loads(response)
    except:
        return None
slender nacelle
#

have you tried an assistant and playgrounds uploading the file as a vector?