#Iterative process over scanned image (API only)

21 messages ยท Page 1 of 1 (latest)

subtle oak
#

Hello,

I have two issues that might be correlated, when the PDF is a scan (and therefore an image).
When sending a pdf to the OCR model, it provides me with a image, then I run the OCR on it, and it returns me a bit of text and other image, on top of which I can rerun OCR again, and so on and so forth. Not very confortable to work with.

Now with this same PDF, I query the chat, I get a correct reading of the whole doc, but when I do the same query with the API, i only have the first layer of information that is being read, and the answer is incomplete.

It seems there is a divergence of treatment between chat and API which lead to poor API performances.

#

Iterative process over scanned image (API only)

undone flame
#

Hi, could you share the image?

subtle oak
#

in private yes

subtle oak
#

@oblique sparrow

oblique sparrow
#

actually i got the same problem , heres an image that cannot be OCRd correctly , it only returns the markdown as the cropped image to the content . if i iterate again throught the cropped result output then it shows the correct markdown . the problem is that we cannot iterating over and over again the same image .

#

here's the code to reproduce :

subtle oak
#

good usecase, I should do the same with my syndic ๐Ÿ˜„

#

Note that if you send this image to pixtral through API, and ask things about it, it must probably go through an initial OCR process and miss a lot of content due to this issue.

subtle oak
#

btw I'm not sure you need to crop, and when calling the OCR the second time you can simply do this
document=ImageURLChunk(image_url=subimg.image_base64),

torpid fox
#

Recognition doesn't work at all with a page of an accounting document ๐Ÿ˜•

oblique sparrow
subtle oak
#

so are we going to have a ticket ID or anything to track this issue ?

subtle oak
#

@undone flame for the moment the OCR is quite unusable on scanned documents, do you have clearly identified the issue and have a target date for the fix ?

undone flame
#

We are working on it! In the meantime, note that removing the white borders improves it, and the OCR model is also designed to be coupled with a multimodal retrieval system and models, so its not unexpected for it to output images, if you desire text output only or a structured output, I would use Mistral OCR+Pixtral 12B together as one

elder schooner
#

@undone flame can you please elaborate how to use "Mistral OCR+Pixtral 12B together as one" ? Maybe there are code examples for that?

undone flame
#

As an example