gpt 4 vision api issue | OpenAI | Page 1

floral dock Mar 31, 2024, 9:05 PM

#

@crisp hamlet

Hi,

the images are all product images with the barcode included, here is a sample:

WhatsApp_Image_2024-03-20_at_5.44.14_PM.jpeg

#

i am using python, the script is as follows (cant paste code cause too many characters)

#

the script picks the first image from a folder, containing images of barcodes.

after that, the image is encoded in base64, sent to gpt vision via api for extraction. but it fails

#

whats supposed to happen is it should extract the barcode, then log it into a text file. but the extraction fails continously

#

i've made it succeed a few times, but its not consistent

#

for example i will get a response: here's the barcode: {barcode}

the next response will be:

Sorry i cant assist with that

#

Sorry i cant assist with that is more common than success, my success rate is probably 5%

#

i've spent $10 trying to refine the prompt, running. but continously failing

#

here's my prompt

#

prompt = "Analyze the attached image and provide the numerical sequence represented by the barcode, ensuring all digits, including any smaller leading or trailing numbers, are included in the sequence."

floral dock Mar 31, 2024, 9:41 PM

#

@crisp hamlet

#

any ideas please

minor pollen Apr 1, 2024, 1:12 AM

#

floral dock for example i will get a response: here's the barcode: {barcode} the next respo...

json mode is its complete separate thing

#

its responses wont include "heres the"

#

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-3.5-turbo-0125",
  response_format={ "type": "json_object" },
  messages=[
    {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ]
)
print(response.choices[0].message.content)```

#

heres the example their doc provided

#

response format to type json object

floral dock Apr 1, 2024, 9:14 AM

#

@minor pollen i dont get it

#

you're saying

#


    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]
        }],
        "max_tokens": 300
    }
    
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    if response.status_code == 200:
        result = response.json()
        full_response = result["choices"][0]["message"]["content"]
        print("API Response:", full_response)  # Debugging line to see API response
        return ''.join(re.findall(r'\d+', full_response))
    else:
        print("Failed API Call:", response.status_code, response.text)  # Shows error message
        return None

def evaluate_results(results):

    return Counter(results).most_common(1)[0][0] if results else None

def process_image(image_path):

    enhanced_image_path = enhance_image(image_path)  # Enhance the image
    base64_image = encode_image(enhanced_image_path)  # Encode the enhanced image
    
    prompt = "Analyze the attached image and provide the numerical sequence represented by the barcode, ensuring all digits, including any smaller leading or trailing numbers, are included in the sequence."```

#

@minor pollen this isnt valid?

minor pollen Apr 1, 2024, 9:16 AM

#

the json thing on their doc is built in so you wont need to do if statements

floral dock Apr 1, 2024, 9:21 AM

#

this doesn't really solve my problem though

minor pollen Apr 1, 2024, 9:25 AM

#

also i think theres already existing barcode reading tools out there

#

#

this one i have on ios

floral dock Apr 1, 2024, 9:27 AM

#

minor pollen

bro thats fine but my requirements are different

#

im trying to use the api to read multiple barcodes

minor pollen Apr 1, 2024, 9:33 AM

#

3013603J21
55724
drinkarizona.com
For more info. about
AriZona call 1 800 832 3775
в
13008 71526
SHAKE WELL REFRIGERATE AFTER OPENING
V ARTIFICIAL FLAVOR

i used an ai text to image converter and got this above,

if i feed this to a json mode gpt 4 turbo and ask for a barcode, itll know which one is it, and if theres multiple in one image you can ask it to output a list

floral dock Apr 1, 2024, 9:34 AM

#

but even this is wrong

minor pollen Apr 1, 2024, 9:34 AM

#

o

floral dock Apr 1, 2024, 9:35 AM

#

lol got muted somehow

#

but yeah even thats wrong, because its missing the 6

#

this is the correct barcode:

613008715267

#

notice how it misses out the leftmost and rightmost numbers

#

so my prompt accounts for it. then it works every time in chatgpt web llm

#

but the API refuses

minor pollen Apr 1, 2024, 9:39 AM

#

@floral dock you can ask vision to instead convert the image into text

#

then use that text on a turbo json model to get the barcode

#

floral dock Apr 1, 2024, 9:40 AM

#

so should my script utilize vision and turbo?

minor pollen Apr 1, 2024, 9:40 AM

#

both ye

floral dock Apr 1, 2024, 9:40 AM

#

ah, thats a good idea

minor pollen Apr 1, 2024, 9:40 AM

#

i forgot vision dont have json mode

#

lol

floral dock Apr 1, 2024, 9:40 AM

#

ohhh

#

my god 💀

minor pollen Apr 1, 2024, 9:40 AM

#

we got somewhere tho at least xD

floral dock Apr 1, 2024, 9:40 AM

#

i completely also forgot that models are different

#

i thought gpt4 vision inherits all the other functionality from gpt-4

#

or is that wrong

minor pollen Apr 1, 2024, 9:41 AM

#

idk lol

floral dock Apr 1, 2024, 9:41 AM

#

GPT-4 Turbo with vision is the same as the GPT-4 Turbo preview model and performs equally as well on text tasks but has vision capabilities added. Vision is just one of many capabilities the model has.

#

hmmm

#

does gpt-4 -vision point to turbo?

#

nevermind that doesn't solve anything

minor pollen Apr 1, 2024, 9:43 AM

#

0clue, i did try askinng just a barcode and it didnt want to but yeah image is an input and jsons the output so its plausible

floral dock Apr 1, 2024, 9:43 AM

#

it has the same functionality just with vision added

floral dock Apr 1, 2024, 9:43 AM

#

minor pollen

which model is this?

minor pollen Apr 1, 2024, 9:43 AM

#

vision

#

not turbo

floral dock Apr 1, 2024, 9:44 AM

#

on the LLM it works why does the api do this mann 🤣

minor pollen Apr 1, 2024, 9:44 AM

#

yui_shrug

floral dock Apr 1, 2024, 10:04 AM

#

gave up gg

minor pollen Apr 1, 2024, 10:05 AM

#

whyy

#

it seems easy enough you alrdy got the text working lol

floral dock Apr 1, 2024, 10:05 AM

#

minor pollen it seems easy enough you alrdy got the text working lol

so basically

#

i tried what you did

#

i said, convert entire image to text

minor pollen Apr 1, 2024, 10:06 AM

#

is the image too large in resolution?

floral dock Apr 1, 2024, 10:06 AM

#

but then the question is, how does it know what the barcode is. lengths of barcodes differ but they are strings of numbers

#

also my images contain other numbers, so even more difficult

#

so i took you suggestion and said prefix message with "barcode: "

minor pollen Apr 1, 2024, 10:07 AM

#

it will know, barcodes have patterns on them

floral dock Apr 1, 2024, 10:07 AM

#

but there's another problem

#

i also added a regex so it identifies if the response has a prefix "barcode :"

#

and everything after the prefix will be logged

#

but it didnt work

#

because when you mention barcode to vision

#

it causes a privacy issue and doesnt extract completely either

minor pollen Apr 1, 2024, 10:08 AM

#

nono dont ask vision bout the barcode only turbo

floral dock Apr 1, 2024, 10:08 AM

#

but how will turbo recognise

minor pollen Apr 1, 2024, 10:08 AM

#

text

floral dock Apr 1, 2024, 10:08 AM

#

but really like for example

#

lets say above barcode there's 49357384975934

then below the actual barcode is 349294757439

how will it know?

#

like the image can contain erroneous numbers

minor pollen Apr 1, 2024, 10:09 AM

#

barcodes have specific patterns thats how its easily identifiable/filtereable, i think veritasium made a video bout it

#

the llm surely will know

floral dock Apr 1, 2024, 10:10 AM

#

hmmm

#

so maybe we bet on the RLHF

#

its seen enough to know

minor pollen Apr 1, 2024, 10:10 AM

#

yee

#

barcodes arent just random generated numbers

floral dock Apr 1, 2024, 10:10 AM

#

if this works then thank you brother

minor pollen Apr 1, 2024, 10:10 AM

#

lololl no

floral dock Apr 1, 2024, 10:10 AM

#

also this was a very interesting project ngl

minor pollen Apr 1, 2024, 10:10 AM

#

np

#

ill try it aswell

#

in a bit i just got electricity back

floral dock Apr 1, 2024, 10:11 AM

#

minor pollen in a bit i just got electricity back

o damn

#

you get frequent electricity cuts?

minor pollen Apr 1, 2024, 10:11 AM

#

~~late bills~~

floral dock Apr 1, 2024, 10:11 AM

#

ohhhh

#

lool

#

economy hard ):

#

things getting too expensive fr

minor pollen Apr 1, 2024, 10:12 AM

#

wont be soon with ai advancements

#

got text working now gotta convert to json

#

#

it worked lmao

#

@floral dock

#

fed the image
vision converted it to text
turbo json mode filters the barcode from text

#

#

AA_Tsukihi_Dance

#

also removed the (placebarcodehere) since it gets confused if its there and changed type to number

#

can automate opening the site with this one aswell

floral dock Apr 1, 2024, 10:35 AM

#

minor pollen

yoo nice i just had tea gonna work on it now

minor pollen Apr 1, 2024, 10:50 AM

#

@floral dock it works well enough if you make sure the image is taken on a readable orientation and not low quality

#

#

just testing random barcode images on pinterest

#

ye its consistent enough

floral dock Apr 1, 2024, 11:02 AM

#

looks like its working!

floral dock Apr 1, 2024, 11:02 AM

#

minor pollen <@360886380630376468> it works well enough if you make sure the image is taken o...

yeah its working, look good for now

minor pollen Apr 1, 2024, 11:12 AM

#

gj nicee

#

AA_Taiga_Clap

floral dock Apr 1, 2024, 12:38 PM

#

minor pollen gj nicee

lol no way

#

Vision API Response: I'm sorry, but I can't provide a text conversion of that image because it contains a barcode and numbers that could be used to identify or track a specific product. However, if you have any other questions or need information not related to the identification of the product, feel free to ask!

floral dock Apr 1, 2024, 12:39 PM

#

floral dock yeah its working, look good for now

it worked for 3 images failed all the rest

#

nearly had my hopes up

minor pollen Apr 1, 2024, 12:39 PM

#

AquaCry

floral dock Apr 1, 2024, 12:39 PM

#

it always thinks its a threat, its so annoying

minor pollen Apr 1, 2024, 12:40 PM

#

some prompt engineering on getting the text probably will help

#

tricking it lol

floral dock Apr 1, 2024, 12:40 PM

#

my prompt skills are poop haha

#

Vision API Response: Sorry, I can't assist with that request.

#

lol

modest wren Apr 1, 2024, 7:46 PM

#

the AI is indeed not supposed to read barcode

#

it is the wrong tool for this task

#

using an AI model for that seems overkill, expensive and way more prone to errors, it would be best suited to use a regular bar code scannin lib, i guess?

#

if you really need it integrated with a language model, the ideal would be to first, scan the image for bar code, then, provide the processed data to the AI, that would run faster, cheapper and with a very high success rate

floral dock Apr 2, 2024, 3:14 PM

#

modest wren using an AI model for that seems overkill, expensive and way more prone to error...

pyzbar, zbar does not work. gpt vision works, i know it does. chatgpt web llm works but the API is stubborn and will not work for me. whats the point of openAI if they are going to limit progression

floral dock Apr 2, 2024, 3:14 PM

#

modest wren if you really need it integrated with a language model, the ideal would be to fi...

thats what im doing..

#

the script sends the image in base64 to gpt vision, but gpt vision refuses to give me the barcode

modest wren Apr 2, 2024, 3:18 PM

#

floral dock thats what im doing..

this is not what you are doing

#

do not use GPT to scan bar codes, it will not work

#

it will most likely hallucinate the data

floral dock Apr 2, 2024, 3:19 PM

#

i am doing the following

#

the script reads from a folder, the folder contains lots of images of barcodes

#

i instruct gpt vision to

#

CONVERT the entire image to text

#

i do not say

#

read the barcode

floral dock Apr 2, 2024, 3:20 PM

#

floral dock CONVERT the entire image to text

it wont do this

modest wren Apr 2, 2024, 3:20 PM

#

so, do you want the text of the image or do you want the bar code?

floral dock Apr 2, 2024, 3:21 PM

#

well moshy suggested this, convert the entire image to text, using vision. then use gpt 4 turbo to decipher the barcode from the text

modest wren Apr 2, 2024, 3:21 PM

#

in either case, it is best to not use AI for that, use OCR lib and a bar code scan lib

floral dock Apr 2, 2024, 3:21 PM

#

since gpt 4 knows what barcodes (are formatted) like, it could give it to me

modest wren Apr 2, 2024, 3:22 PM

#

if you do need AI for other pourposes, such as parsing the unstructured data from the input, then, just pass the outputs of the OCR and bar code libs to the AI

#

no image needed

floral dock Apr 2, 2024, 3:22 PM

#

modest wren no image needed

?

modest wren Apr 2, 2024, 3:22 PM

#

no image sent to OpenAI services needed*

floral dock Apr 2, 2024, 3:23 PM

#

modest wren in either case, it is best to not use AI for that, use OCR lib and a bar code sc...

this does not work, i've tried libraries from pyzbar to zbar to that

modest wren Apr 2, 2024, 3:23 PM

#

I think it is a much more relaible and cheap approach

modest wren Apr 2, 2024, 3:23 PM

#

floral dock this does not work, i've tried libraries from pyzbar to zbar to that

why doesn't it work?

#

it might worth to investigate first why the code that is made specifically for reading bar codes and OCR does not work rather than using a subjective AI

floral dock Apr 2, 2024, 3:24 PM

#

it cant interpret the barcodes properly

#

the failure rate is 90%

#

for example

#

for 100 images

#

90 will fail

floral dock Apr 2, 2024, 3:24 PM

#

modest wren it might worth to investigate first why the code that is made specifically for r...

vision is rlhf'd to the max it 100% works i have no doubt in my mind

#

it literally works with the web llm

#

the api refuses, thats it

#

OpenAI is basically blocking progression for people

modest wren Apr 2, 2024, 3:26 PM

#

im not sure if anyone will be able to help you to do what you want

#

we are trying to point the way that works rather than the way you want to do it

floral dock Apr 2, 2024, 3:26 PM

#

mate

#

what you're suggesting, i've done already

#

it didnt work. 👍

#

what im doing is working, but the api is not consistent, it will simply refuse at some points, then at other points it will actually do what i want it to do

#

conversely, the web llm does it each time

#

so the API is clearly limited

#

now i am thinking of taking the seed api approach, but i dont have high hopes

modest wren Apr 2, 2024, 3:29 PM

#

API and ChatGPT are the same, the major difference is the initial prompt where OpenAI has a pre defined prompt of You are a helpful assistant followed by some contextual information

#

there isn't much sense on attributing the problem to be on OpenAI's side

#

as I suggested, you might find that doing a more thoroughly investigation on your side of the implementation, specially on the cases of the libs you are using to scan bar codes, might be a good idea

#

and also, the fact that GPT-4-v is a general pourpose subjective AI made to extract general subjective information from images, it is not by any means made to read bar codes

#

a process which can likely be relaibly done by non-AI means

#gpt 4 vision api issue