We're excited to share Typress, a (possibly the first) open-source Typst formula OCR model based on TrOCR.
Typress is lightweight and runs smoothly on consumer-grade computers, even without a GPU.
How to Use Typress
- Try it now: Experience Typress with the online demo on our Hugging Face Space: https://huggingface.co/spaces/paran3xus/typress_ocr_space
- Deploy locally: Follow the instructions in our GitHub repository's README for local deployment, which includes a web-based frontend: https://github.com/ParaN3xus/typress
Performance
Typress performs well on medium-length formulas, but there is still room for improvement on multi-line formulas, long complex formulas, and matrices.
The current model achieves a Character Error Rate (CER) of 0.047 on our test set, i.e. roughly 4.7 character-level edits per 100 reference characters.
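For readers unfamiliar with the metric, here is a minimal sketch of how a CER figure like this is commonly computed: the Levenshtein (edit) distance between predicted and reference strings, divided by the number of reference characters. This is an illustration, not necessarily our exact evaluation script; the function names and the choice to pool all samples before dividing (rather than averaging per-sample rates) are assumptions made for the example.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Pooled CER: total edits over total reference characters (an assumption)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One wrong character in a 10-character Typst formula.
print(cer(["frac(a, b)"], ["frac(a, c)"]))
```

Under this definition, a single wrong character in a 10-character formula gives a CER of 0.1, so long formulas can contain a few errors even when the overall CER is low.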
Datasets
We converted a LaTeX dataset to Typst using our tool, tex2typ. We also used eq_query_rec, built by @true osprey, to extract and normalize mathematical formulas from some users' Typst projects. Both tools can be found in our GitHub repo.