#Gpt 4 vision

1 messages · Page 1 of 1 (latest)

rare wedge
#

Would GPT4V be able to read all the text from an image like this?

blazing pecanBOT
#

Hi @rare wedge. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the knowledge sources, GPT-4V, or GPT-4 Turbo with Vision, is a model developed by OpenAI that combines natural language processing with image recognition. It can understand and generate responses based on both text and visual inputs. However, it's important to note that using GPT-4-vision for OCR (Optical Character Recognition) might not be the best use of the AI, especially for a nearly-solved problem like text recognition from images.

The model can take in images and answer questions about them, but it's not explicitly stated that it can read all the text from an image. In fact, a forum post on OpenAI's community suggests that the image is split into tiles if over 512 pixels in any dimension, and then a read of the main tile plus processing of the subtiles is performed. This might affect the model's ability to accurately read all the text from an image, especially if the text is spread across multiple tiles.

Therefore, while GPT-4V might be able to recognize some text in an image, it might not be able to accurately and completely read all the text from an image, especially if the image is large or the text is spread out. For OCR tasks, you might want to consider using specialized OCR tools or APIs.

Sources:


Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: