Hello,
I am tried to build a file classification feature. The goal is to get a file as input (image or pdf) and a type of document which is expected. The output sould contain a score how well the input file fits the expected type. For example, if you provide invoice as expected input type, but you upload an image of a cat, you will get a low score.
I implemented this in ChatGPT by creating a simple GPT. This works fine and it takes abount 3 sec to classify a pdf document (longer for images).
BUT:
I tried to achieve the same using the assistants api (code is below) and i am facing multiple issues here:
At first, the response takes very long. To analyse a pdf document, remember abount 3s in ChatGPT). Via the assistants API, the best result i had was abount 10s. Most of the time, it takes abount 17s (with peaks up to over 60s). This is not usable for production.
My second issue is, that the model does not follow the instructions like it does in ChatGPT. In my GPT, i NEVER got output outside the expected JSON. Via the API, I am often getting extra text before and after the JSON.
And last but not least: I didn't understand how i can let the model analyse images using the API like it is possible by posting the image inside the ChatGPT chat. I only found a section abount this in the chat completion api section of the docs. But i cannot provide pdf documents to the completion api as far as i could figure out? How can i provide both types of files?
For both, my GPT and the API, i use the exact same promt und the gpt4-0 model.
Does anyone know how to fix the above issues?
Thanks in advance
Here is my code i am using to call the assistants api: