#GPT 4 Vision availability
1 messages · Page 1 of 1 (latest)
can you post a screenshot of where you saw this?
platform.openai.com support bot
Vision docs say and I quote, "Note that the Assistants API does not currently support image inputs." What's the ETA on this feature?
Oh right the assistants API doesn't support image inputs, no known ETA currently
but you can always just program a call to gpt-4-vision-preview if you need image understanding for the assistant as a custom tool
This chat bot data must be outdated
image access is on ChatGPT for Plus users and on the API through gpt-4-vision-preview for paying API customers
My use case involves understanding an image and extracting out specific text based on that, like know what text is a name. I've found most of the time gpt 4 vision gives me "I can't respond to the request" something like that. And it can't read handwritten text even if neatly written. Had high hopes for that
It sounds like you wanted OCR from GPT vision.
Although you get I can not response to the request sometimes, you can try to improve that by updating prompt.
You should just use an OCR library
doesn't really make sense to use GPT-4 in that case
OCR libraries will work better too
GPT is able to do OCR you just need to play with the prompt a bit but it's not great at oct
ocr*
I already have other ways and it works fine, just wanted to experiment with what gpt can do
The docs has a list of limitations so it's already not viable, i do already use it to distinguish between text values using base gpt for things like name that can't be parsed with simple logic
Again it's about reducing the moving parts in my flow, currently i use a list of tools to get it done, hope it improves in the future and i can increase the usage
I guess but I don't see any reason why you'd prefer using GPT for OCR instead of just OCR? can you elaborate a bit? @surreal vector
For context awareness
I feel like in terms of moving parts adding another GPT layer is more moving parts, you can achieve context awareness by OCRing the text and then putting the text into the prompt for GPT afterwards
Right now i do ocr then based on specific areas where i know the value is extract data i need
It's always roughly there not exactly so coordinates don't always align
Ah i see
So i use things like regex to make sure it's right
so u need it to extract out a specific part I understand
What if you OCR all the text including irrelevant text and then get GPT to keep only the relevant parts?
that would increase accuracy
In a perfect world gpt would extract what I'm asking and based on training data extract exactly what's needed without the coordinate shenanigans
Yes that's roughly what i do using gpt 3.5 right now
It would've been cool if i could throw away everything and give vision an input and it gives me output lol
Waiting for assistants to have vision support, maybe it improves things