#Assistant API: Assistant doesn't always output full JSON data as requested


narrow swift
#

Hi guys, I have a question about prompting the AI (specifically through the Assistants API, but any answer should be applicable). I am working on a project where I pass some JSON data to the assistant, tell it to populate some values in that data (without touching the other values), and then have it output the entire JSON file with the modifications. My current setup uses 2 different assistants: the first gets the JSON data uploaded through file search and produces the output, and the second is in JSON mode. I pipe the output of the first assistant into the second to ensure that the JSON data is in the correct format, because without that the first assistant likes to get creative and output malformed JSON 😆

Anyway, my actual problem with this setup is that when very large JSON data is passed in through file upload, the AI likes to delete a bunch of data: it only does its prompted work on about half the data and completely omits the rest from the output. Is there any way I can get the assistant to comply and populate the required data for the entire JSON file rather than only some of the entries? I was thinking maybe this has something to do with a token limit, but I really have no ideas right now. One solution I thought could work is to chunk the JSON data and pass it to the assistant piece by piece, but I would like to avoid this route as it can cause problems later down the line.

TL;DR: I want the AI to populate large JSON data files without omitting any data or skipping any parts.

zenith grove
#

Are you using the response_format parameter to ensure you get a complete JSON response? https://platform.openai.com/docs/api-reference/assistants/createAssistant#assistants-createassistant-response_format
I would suggest passing a JSON schema to tell the assistant to adhere to it so it doesn’t omit data.
If your schema changes per message, use the json_object value for response_format when creating the assistant and pass your JSON schema in each message as part of the prompt. If it’s always the same schema, use the json_schema value for response_format and pass your JSON schema when you create your assistant.
In a JSON schema you can specify which fields are required to further ensure it doesn’t delete data.
Also try lowering your temperature to 0 to cut down on creativity if you are manipulating JSON data.
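
To illustrate the json_schema suggestion above, here is a minimal sketch of what such a response_format value could look like. The field names (`id`, `title`, `summary`), the wrapper key `items`, and the schema name `populated_items` are hypothetical, made up for the example; only the outer `type` / `json_schema` / `name` / `strict` / `schema` layout follows the documented format.

```python
import json

# Hypothetical per-object schema: every key is listed under "required",
# so an object that drops a key would not match the schema.
item_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["id", "title", "summary"],
    "additionalProperties": False,
}

# Value that could be passed as response_format when creating the assistant.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "populated_items",  # hypothetical name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "items": {"type": "array", "items": item_schema},
            },
            "required": ["items"],
            "additionalProperties": False,
        },
    },
}

print(json.dumps(response_format, indent=2))
```

Note that `required` constrains the keys of each object; it does not by itself guarantee that every object from the input shows up in the output array.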

narrow swift
# zenith grove Are you using the response_format parameter to ensure you get a complete JSON re...

The JSON format isn't really my problem. As I mentioned, I solved that by crafting the response_format on the second assistant so it always outputs correct JSON data. My problem is that it often fails to modify all parts of the input JSON data when that data is very large. I prompt it to populate some keys in the JSON objects, but it usually only does that for the first few objects and outputs just those.

#

My best assumption is that I'm hitting some sort of token limit so it stops. I don't know how to get around this.
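
A quick way to confirm how much is actually being dropped is to compare the output against the input after each run. This is only a sketch: it assumes the data is a list of objects with a unique `id` key and that `summary` is the key being populated, both of which are hypothetical names.

```python
def find_gaps(input_items, output_items, id_key="id", filled_keys=("summary",)):
    """Compare model output to input.

    Returns (missing_ids, unfilled_ids):
      missing_ids  - ids present in the input but absent from the output
      unfilled_ids - ids present in the output whose target keys are still empty
    """
    out_by_id = {item[id_key]: item for item in output_items if id_key in item}
    missing = [item[id_key] for item in input_items if item[id_key] not in out_by_id]
    unfilled = [
        item_id
        for item_id, item in out_by_id.items()
        if any(item.get(key) in (None, "") for key in filled_keys)
    ]
    return missing, unfilled


original = [{"id": "a", "summary": ""}, {"id": "b", "summary": ""}, {"id": "c", "summary": ""}]
returned = [{"id": "a", "summary": "done"}, {"id": "b", "summary": ""}]

missing, unfilled = find_gaps(original, returned)
print(missing)   # ['c']
print(unfilled)  # ['b']
```

If the missing ids are always the tail of the file, that would be consistent with the output being cut off at a length limit rather than the model ignoring instructions.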

zenith grove
#

So is the problem with the first assistant or the second one? Which one fails to follow your instructions?

#

I’m also unclear on why you need two assistants.

#

As I understand it, the 2nd assistant is only used to ensure you have valid JSON output, but does the data population happen in the first or the 2nd assistant?

lethal panther
#

@narrow swift, did you find a solution for this?

narrow swift
#

You would just have to chunk the input data yourself and split it across separate assistant runs. That's what I did.
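
A rough sketch of that chunking approach, assuming the data is a list of JSON objects. `run_assistant` is a hypothetical callable standing in for one assistant run (send a batch, get the populated batch back), and the character budget is only a crude stand-in for a token limit.

```python
import json


def chunk_items(items, max_chars=8000):
    """Split a list of JSON objects into batches whose serialized size stays near max_chars."""
    batches, current, size = [], [], 0
    for item in items:
        item_len = len(json.dumps(item))
        # Start a new batch if adding this item would exceed the budget.
        if current and size + item_len > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(item)
        size += item_len
    if current:
        batches.append(current)
    return batches


def populate_all(items, run_assistant, max_chars=8000):
    """Run each batch separately and stitch the results back together in order."""
    result = []
    for batch in chunk_items(items, max_chars):
        populated = run_assistant(batch)  # hypothetical: one assistant run per batch
        if len(populated) != len(batch):
            raise ValueError(f"expected {len(batch)} objects, got {len(populated)}")
        result.extend(populated)
    return result


# Usage with a fake assistant that fills in the "summary" key (hypothetical field).
def fake_assistant(batch):
    return [dict(obj, summary="filled") for obj in batch]


data = [{"id": i, "summary": ""} for i in range(10)]
populated = populate_all(data, fake_assistant, max_chars=60)
print(len(populated))
```

Checking the object count per batch means a run that silently drops entries fails loudly and can be retried, instead of producing a shorter file.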

lethal panther