I’m exploring ways to enable ChatGPT to interpret and analyze images that are part of uploaded documents. This would allow the AI to integrate visual information with text, enriching its responses with a more comprehensive understanding of the content.
Currently, ChatGPT seems to ignore images that are embedded in documents. Even when I prompt it with custom commands through the Code Interpreter, it still doesn’t recognize what the images contain.
Interestingly, if I extract the document pages as separate image files and upload them, ChatGPT can analyze both text and images effectively. However, this method isn’t very practical, especially considering the upload limits in the GPT builder.
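A lighter variant of this workaround might be to upload only the embedded images instead of every page. Here is a rough sketch, assuming the document is a .docx file: a .docx is a ZIP archive, and embedded pictures are normally stored under `word/media/` inside it. The function name `extract_docx_images` and the file paths are placeholders I made up for illustration, and this does not help with PDFs or scanned pages.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy every file stored under word/media/ in a .docx into out_dir.

    Hypothetical helper for illustration; assumes a standard .docx layout.
    Returns the list of written paths, sorted by archive name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in sorted(archive.namelist()):
            # Embedded pictures are ordinary files inside word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved


if __name__ == "__main__":
    # "report.docx" and "extracted_images" are placeholder names.
    if Path("report.docx").exists():
        for path in extract_docx_images("report.docx", "extracted_images"):
            print(path)
```

The extracted files would still need to be uploaded as images, so this only reduces the number of uploads; it does not make ChatGPT read the images in place.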
Has anyone found a more efficient solution to this challenge?