#gpt 4 vision api issue
1 messages · Page 1 of 1 (latest)
i am using python, the script is as follows (cant paste code cause too many characters)
the script picks the first image from a folder, containing images of barcodes.
after that, the image is encoded in base64, sent to gpt vision via api for extraction. but it fails
whats supposed to happen is it should extract the barcode, then log it into a text file. but the extraction fails continously
i've made it succeed a few times, but its not consistent
for example i will get a response: here's the barcode: {barcode}
the next response will be:
Sorry i cant assist with that
Sorry i cant assist with that is more common than success, my success rate is probably 5%
i've spent $10 trying to refine the prompt, running. but continously failing
here's my prompt
prompt = "Analyze the attached image and provide the numerical sequence represented by the barcode, ensuring all digits, including any smaller leading or trailing numbers, are included in the sequence."
json mode is its complete separate thing
its responses wont include "heres the"
client = OpenAI()
response = client.chat.completions.create(
model="gpt-3.5-turbo-0125",
response_format={ "type": "json_object" },
messages=[
{"role": "system", "content": "You are a helpful assistant designed to output JSON."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
)
print(response.choices[0].message.content)```
heres the example their doc provided
response format to type json object
@minor pollen i dont get it
you're saying
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
payload = {
"model": "gpt-4-vision-preview",
"messages": [{
"role": "user",
"content": [{"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]
}],
"max_tokens": 300
}
response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
if response.status_code == 200:
result = response.json()
full_response = result["choices"][0]["message"]["content"]
print("API Response:", full_response) # Debugging line to see API response
return ''.join(re.findall(r'\d+', full_response))
else:
print("Failed API Call:", response.status_code, response.text) # Shows error message
return None
def evaluate_results(results):
return Counter(results).most_common(1)[0][0] if results else None
def process_image(image_path):
enhanced_image_path = enhance_image(image_path) # Enhance the image
base64_image = encode_image(enhanced_image_path) # Encode the enhanced image
prompt = "Analyze the attached image and provide the numerical sequence represented by the barcode, ensuring all digits, including any smaller leading or trailing numbers, are included in the sequence."```
@minor pollen this isnt valid?
the json thing on their doc is built in so you wont need to do if statements
this doesn't really solve my problem though
also i think theres already existing barcode reading tools out there
this one i have on ios
bro thats fine but my requirements are different
im trying to use the api to read multiple barcodes
3013603J21
55724
drinkarizona.com
For more info. about
AriZona call 1 800 832 3775
в
13008 71526
SHAKE WELL REFRIGERATE AFTER OPENING
V ARTIFICIAL FLAVOR
i used an ai text to image converter and got this above,
if i feed this to a json mode gpt 4 turbo and ask for a barcode, itll know which one is it, and if theres multiple in one image you can ask it to output a list
but even this is wrong
o
lol got muted somehow
but yeah even thats wrong, because its missing the 6
this is the correct barcode:
613008715267
notice how it misses out the leftmost and rightmost numbers
so my prompt accounts for it. then it works every time in chatgpt web llm
but the API refuses
@floral dock you can ask vision to instead convert the image into text
then use that text on a turbo json model to get the barcode
so should my script utilize vision and turbo?
both ye
ah, thats a good idea
we got somewhere tho at least xD
i completely also forgot that models are different
i thought gpt4 vision inherits all the other functionality from gpt-4
or is that wrong
idk lol
GPT-4 Turbo with vision is the same as the GPT-4 Turbo preview model and performs equally as well on text tasks but has vision capabilities added. Vision is just one of many capabilities the model has.
hmmm
does gpt-4 -vision point to turbo?
nevermind that doesn't solve anything
0clue, i did try askinng just a barcode and it didnt want to but yeah image is an input and jsons the output so its plausible
it has the same functionality just with vision added
which model is this?
on the LLM it works why does the api do this mann 🤣

gave up gg
so basically
i tried what you did
i said, convert entire image to text
is the image too large in resolution?
but then the question is, how does it know what the barcode is. lengths of barcodes differ but they are strings of numbers
also my images contain other numbers, so even more difficult
so i took you suggestion and said prefix message with "barcode: "
it will know, barcodes have patterns on them
but there's another problem
i also added a regex so it identifies if the response has a prefix "barcode :"
and everything after the prefix will be logged
but it didnt work
because when you mention barcode to vision
it causes a privacy issue and doesnt extract completely either
nono dont ask vision bout the barcode only turbo
but how will turbo recognise
text
but really like for example
lets say above barcode there's 49357384975934
then below the actual barcode is 349294757439
how will it know?
like the image can contain erroneous numbers
barcodes have specific patterns thats how its easily identifiable/filtereable, i think veritasium made a video bout it
the llm surely will know
if this works then thank you brother
lololl no
also this was a very interesting project ngl
o damn
you get frequent electricity cuts?
late bills
wont be soon with ai advancements
got text working now gotta convert to json
it worked lmao
@floral dock
fed the image
vision converted it to text
turbo json mode filters the barcode from text

also removed the (placebarcodehere) since it gets confused if its there and changed type to number
can automate opening the site with this one aswell
yoo nice i just had tea gonna work on it now
@floral dock it works well enough if you make sure the image is taken on a readable orientation and not low quality
just testing random barcode images on pinterest
ye its consistent enough
looks like its working!
yeah its working, look good for now
lol no way
Vision API Response: I'm sorry, but I can't provide a text conversion of that image because it contains a barcode and numbers that could be used to identify or track a specific product. However, if you have any other questions or need information not related to the identification of the product, feel free to ask!
it worked for 3 images failed all the rest
nearly had my hopes up

it always thinks its a threat, its so annoying
my prompt skills are poop haha
Vision API Response: Sorry, I can't assist with that request.
lol
the AI is indeed not supposed to read barcode
it is the wrong tool for this task
using an AI model for that seems overkill, expensive and way more prone to errors, it would be best suited to use a regular bar code scannin lib, i guess?
if you really need it integrated with a language model, the ideal would be to first, scan the image for bar code, then, provide the processed data to the AI, that would run faster, cheapper and with a very high success rate
pyzbar, zbar does not work. gpt vision works, i know it does. chatgpt web llm works but the API is stubborn and will not work for me. whats the point of openAI if they are going to limit progression
thats what im doing..
the script sends the image in base64 to gpt vision, but gpt vision refuses to give me the barcode
this is not what you are doing
do not use GPT to scan bar codes, it will not work
it will most likely hallucinate the data
i am doing the following
the script reads from a folder, the folder contains lots of images of barcodes
i instruct gpt vision to
CONVERT the entire image to text
i do not say
read the barcode
it wont do this
so, do you want the text of the image or do you want the bar code?
well moshy suggested this, convert the entire image to text, using vision. then use gpt 4 turbo to decipher the barcode from the text
in either case, it is best to not use AI for that, use OCR lib and a bar code scan lib
since gpt 4 knows what barcodes (are formatted) like, it could give it to me
if you do need AI for other pourposes, such as parsing the unstructured data from the input, then, just pass the outputs of the OCR and bar code libs to the AI
no image needed
?
no image sent to OpenAI services needed*
this does not work, i've tried libraries from pyzbar to zbar to that
I think it is a much more relaible and cheap approach
why doesn't it work?
it might worth to investigate first why the code that is made specifically for reading bar codes and OCR does not work rather than using a subjective AI
- it cant interpret the barcodes properly
the failure rate is 90%
for example
for 100 images
90 will fail
vision is rlhf'd to the max it 100% works i have no doubt in my mind
it literally works with the web llm
the api refuses, thats it
OpenAI is basically blocking progression for people
im not sure if anyone will be able to help you to do what you want
we are trying to point the way that works rather than the way you want to do it
mate
what you're suggesting, i've done already
it didnt work. 👍
what im doing is working, but the api is not consistent, it will simply refuse at some points, then at other points it will actually do what i want it to do
conversely, the web llm does it each time
so the API is clearly limited
now i am thinking of taking the seed api approach, but i dont have high hopes
API and ChatGPT are the same, the major difference is the initial prompt where OpenAI has a pre defined prompt of You are a helpful assistant followed by some contextual information
there isn't much sense on attributing the problem to be on OpenAI's side
as I suggested, you might find that doing a more thoroughly investigation on your side of the implementation, specially on the cases of the libs you are using to scan bar codes, might be a good idea
and also, the fact that GPT-4-v is a general pourpose subjective AI made to extract general subjective information from images, it is not by any means made to read bar codes
a process which can likely be relaibly done by non-AI means
