#Typress: The Typst Formula OCR Model

robust osprey

We're excited to share Typress, a (possibly the first) open-source Typst formula OCR model based on TrOCR.
Typress is lightweight and runs smoothly on consumer-grade computers, even without a GPU.

How to Use Typress

Performance

Typress performs well on medium-length formulas, but there's still room for improvement with multi-line or complex long formulas (or matrices).
The current model achieved a Character Error Rate (CER) of 0.047 on our test set.
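For anyone unfamiliar with the metric, here is a minimal sketch of how a corpus-level CER is commonly computed: total character edit distance divided by total reference length. This is an assumption about the usual definition, not necessarily the exact evaluation script used for Typress; the function names `levenshtein` and `cer` and the sample formulas are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length.
    Assumed definition; the Typress evaluation may differ in detail."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


# Hypothetical Typst formula: one wrong character in a 16-character reference.
reference = "frac(a, b) + x^2"
prediction = "frac(a, b) + x^3"
print(cer([prediction], [reference]))  # 1 edit / 16 characters = 0.0625
```

Under this definition, a CER of 0.047 corresponds to roughly one character-level error per 21 reference characters, averaged over the test set.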

Datasets

We used a LaTeX dataset converted to a Typst dataset using our tool, tex2typ. We also used eq_query_rec built by @true osprey to extract and normalize mathematical formulas from some users' Typst projects for our dataset. All these tools can be found in our GitHub repo.

GitHub

ParaN3xus/typress: Typst Mathematical Expression OCR.

mild kayak

impressive work!

true osprey

💀

sour abyss

Very impressive!

ember shuttle

That's sweet!

slow wing

This is awesome!