Iterative process over scanned image (API only) | Mistral AI | Page 1

subtle oak Mar 7, 2025, 6:33 PM

#

Hello,

I have two issues that might be correlated, when the PDF is a scan (and therefore an image).
When sending a pdf to the OCR model, it provides me with a image, then I run the OCR on it, and it returns me a bit of text and other image, on top of which I can rerun OCR again, and so on and so forth. Not very confortable to work with.

Now with this same PDF, I query the chat, I get a correct reading of the whole doc, but when I do the same query with the API, i only have the first layer of information that is being read, and the answer is incomplete.

It seems there is a divergence of treatment between chat and API which lead to poor API performances.

#

Iterative process over scanned image (API only)

undone flame Mar 7, 2025, 7:15 PM

#

Hi, could you share the image?

subtle oak Mar 7, 2025, 9:15 PM

#

in private yes

subtle oak Mar 8, 2025, 7:29 PM

#

@oblique sparrow

oblique sparrow Mar 8, 2025, 7:40 PM

#

actually i got the same problem , heres an image that cannot be OCRd correctly , it only returns the markdown as the cropped image to the content . if i iterate again throught the cropped result output then it shows the correct markdown . the problem is that we cannot iterating over and over again the same image .

#

here's the code to reproduce :

#

here's the code to reproduce :

📎 testing_code.py

subtle oak Mar 9, 2025, 2:35 PM

#

good usecase, I should do the same with my syndic 😄

#

Note that if you send this image to pixtral through API, and ask things about it, it must probably go through an initial OCR process and miss a lot of content due to this issue.

subtle oak Mar 9, 2025, 2:59 PM

#

btw I'm not sure you need to crop, and when calling the OCR the second time you can simply do this
document=ImageURLChunk(image_url=subimg.image_base64),

torpid fox Mar 9, 2025, 8:28 PM

#

Recognition doesn't work at all with a page of an accounting document 😕

📎 GL1.pdf

oblique sparrow Mar 10, 2025, 1:57 AM

#

subtle oak btw I'm not sure you need to crop, and when calling the OCR the second time you ...

Yep agree , but I need to explicitly mention in the first iteration to return the base64 data of the images clipped

subtle oak Mar 14, 2025, 8:26 PM

#

so are we going to have a ticket ID or anything to track this issue ?

subtle oak Mar 17, 2025, 2:07 PM

#

@undone flame for the moment the OCR is quite unusable on scanned documents, do you have clearly identified the issue and have a target date for the fix ?

undone flame Mar 17, 2025, 2:18 PM

#

We are working on it! In the meantime, note that removing the white borders improves it, and the OCR model is also designed to be coupled with a multimodal retrieval system and models, so its not unexpected for it to output images, if you desire text output only or a structured output, I would use Mistral OCR+Pixtral 12B together as one

elder schooner Mar 17, 2025, 5:12 PM

#

@undone flame can you please elaborate how to use "Mistral OCR+Pixtral 12B together as one" ? Maybe there are code examples for that?

undone flame Mar 17, 2025, 5:24 PM

#

elder schooner <@1091626750829678602> can you please elaborate how to use "Mistral OCR+Pixtral ...

You can visit the structured ocr output cookbook on the docs (bottom)

#

As an example

subtle oak Mar 17, 2025, 8:30 PM

#

elder schooner <@1091626750829678602> can you please elaborate how to use "Mistral OCR+Pixtral ...

https://github.com/mistralai/cookbook/blob/main/mistral/ocr/structured_ocr.ipynb

#

All Together - Mistral OCR + Custom Structured Output

#Iterative process over scanned image (API only)