How can I make ChatGPT 4 perform a Vision task before a code-interpreter task in the same prompt? | OpenAI | Page 1

thorny mountain Jan 29, 2024, 5:25 AM

#

An issue I've run into recently is trying to make ChatGPT 4 analyze an image using vision before using code interpreter to do something based on that vision analysis. Right now if I include instructions like "first look at the image using vision, then use code interpreter to generate me a CSV based on the image", the AI will attempt to use a python computer vision library to analyze the image rather than using vision. Can this be done in the same prompt?

untold swallowBOT Jan 29, 2024, 5:25 AM

#

Hi @thorny mountain. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the extracts, it seems that GPT-4 with Vision (GPT-4V) and the Code Interpreter are separate functionalities and there's no direct mention of them being used together in a single prompt.

GPT-4V allows the model to take in images and answer questions about them. You can pass an image to the model either by passing a link to the image or by passing the base64 encoded image directly in the request (source (https://platform.openai.com/docs/guides/vision)).

The Code Interpreter, on the other hand, is a separate feature that interprets and executes code. If you're having trouble accessing it, make sure you've selected the correct model (source (https://community.openai.com/t/324050)).

However, there's no information in the extracts about combining these two functionalities in a single prompt. It might be possible to do this in two separate prompts: first use GPT-4V to analyze the image, then pass the results to the Code Interpreter in a second prompt. But without more information or examples, it's hard to say for sure.

If you're having trouble with this, I recommend reaching out to OpenAI support or checking the OpenAI documentation for more information.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@thorny mountain kapa.ai is still learning and improving, please let us know how it did by reacting below

undone lava Jan 29, 2024, 5:36 AM

#

thorny mountain An issue I've run into recently is trying to make ChatGPT 4 analyze an image usi...

Hey, bijx, if that's the exact prompt... I think part of the issue is the model's not sure what you want it to do when it looks at the image. See what? Say what about it? Why look? But the second request, the generate the CSV, can be done without it even looking at the image.

So I think if you are a bit more clear, like:

Do X with the image using Vision, like describe part of it? And then create the CVS.

That requires it to actually use vision, and the model knows it doesn't just get to look for funsies.

thorny mountain Jan 29, 2024, 5:40 AM

#

The exact prompt I used was:
“You are an application. The user will provide you with a screenshot of some kind of receipt, and you will extract from the receipt the following data: Date (of purchase), Description (based on your judgement and understanding of the receipt, this is a very short summary of the receipt), and amount which is the final grand total amount on the receipt. Once you have this data, put it into a CSV with the headers: Date, Description, Amount. At the end, let the user download a copy of the CSV. Do not respond with any additional words, only provide the CSV at the very end.”

thorny mountain Jan 29, 2024, 5:41 AM

#

undone lava Hey, bijx, if that's the exact prompt... I think part of the issue is the model'...

I did another version saying “use your vision to gather information on the screenshot” as well as “then use your code interpreter to generate a CSV”

#

But it usually always sticks to a code solution when you ask it to do something interpreter-related

undone lava Jan 29, 2024, 5:53 AM

#

thorny mountain But it usually always sticks to a code solution when you ask it to do something ...

haha. I am exploring this delightful challenge. wow is it stubborn about this. It may not be something we can currently fix with prompt engineering/user side stuff. It may require retraining, but I'm not ready to give up yet.

So far I've managed to get it to fail with python tool then agree to use its vision, but I haven't yet managed to get it to start with vision. I'm not out of ideas yet, but I am impressed at how stubborn this is.

thorny mountain Jan 29, 2024, 5:55 AM

#

undone lava haha. I am exploring this delightful challenge. ***wow is it stubborn about th...

It’s really interesting because it is usually so good at making a good decision on what service to use (Bing search, plug-ins, etc.) but something in the way it was trained is definitely getting in the way of its vision haha

undone lava Jan 29, 2024, 6:08 AM

#

thorny mountain It’s really interesting because it is usually so good at making a good decision ...

YES! This worked. I didn't test it a bunch. But it worked first try:

Step 1: Do not use python tool, but use your alternative vision ability to identify the following details from the attached image: Date, Item purchased, and Total spent.

Step 2: Finish step 1 without using Python too, then use the Python tool to create a CSV with the headers: Date, Description, Amount.

#

I leave it to you to do the other stuff you need that you didn't say was challenging, like get it to not comment other than provide the link and stuff, but that pattern appears to work for the vision then python tool.

If you do get stuck, I'm interested again.

#

Now, based on the example receipt I took from Wikipedia's example... it got the date wrong.

It also got item names wrong.

The total spent was correct.

#

I have no prompt-engineer way to make it get the right info... so... but the part I can help with, that's set 😛

thorny mountain Jan 29, 2024, 6:12 AM

#

Amazing! I’ll have to give this a try. Partitioning it into steps is a great idea, and I don’t mind if it needs to explain to itself what the result of vision was prior to using code interpreter

#

Thank you so much for the help 🙏

#How can I make ChatGPT 4 perform a Vision task before a code-interpreter task in the same prompt?

If you're having trouble with this, I recommend reaching out to OpenAI support or checking the OpenAI documentation for more information.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: