Dear Mistral Team,
I am a teacher from Germany and I am currently exploring Mistral OCR for digitizing student assignments. While the performance is impressive, I have encountered a challenge that is specific to the educational sector.
The Problem: "Auto-Correction" of Errors
Mistral OCR often performs so well that it "cleans up" the text. For teachers, it is essential to have an absolutely faithful transcription. If a student makes a spelling or grammatical mistake, the OCR must reflect that exactly in the Markdown output. Currently, the model sometimes "fixes" these errors, which makes it impossible to use the digital version for grading or diagnostic purposes.
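To make the problem concrete, here is a minimal Python sketch of how such silent corrections could be detected: it compares a manually typed reference transcription with the OCR output word by word, using the standard-library `difflib` module. The function name `find_silent_corrections` and the two sample sentences are illustrative assumptions, not real Mistral OCR output, and the sketch does not call any Mistral API.

```python
import difflib


def find_silent_corrections(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """Return (reference_words, ocr_words) pairs wherever the two texts differ.

    `reference` is a hand-typed, faithful transcription of the student's text;
    `ocr_output` is whatever the OCR system returned for the same page.
    """
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    # autojunk=False disables difflib's heuristic that ignores frequent items.
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return changes


# Hypothetical example: the student's errors on the left, a "cleaned up" OCR result on the right.
reference = "Yesterday I goed to the libary with my freind."
ocr_output = "Yesterday I went to the library with my friend."

for original, transcribed in find_silent_corrections(reference, ocr_output):
    print(f"student wrote {original!r}, OCR returned {transcribed!r}")
```

A check like this only works on pages for which a manual transcription already exists, so it can measure the problem on a sample but cannot replace a faithful OCR mode.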
My Feature Requests:
- High-Fidelity Mode: A setting or parameter that forces the model to prioritize literal character recognition over linguistic probability (avoiding "autocorrect").
- Handling of Non-linear Layouts: Better support for handwriting and margin notes, as students often use arrows or inserts that disrupt the logical flow.
- Contextual Awareness without Modification: Using models like Pixtral to understand the layout, but without altering the original text's orthography.
Is a "Fidelity Mode" or a specialized HTR (Handwritten Text Recognition) focus on your roadmap? Providing a GDPR-compliant, high-fidelity OCR solution for schools in the EU would be a massive benefit for the educational landscape.
I look forward to your thoughts on this.
Best regards,
Felix