Excited to test this our on our side as well. We recently built an OCR benchmark...

themanmaran · 2025-03-06T23:11:23 1741302683

Update: Just ran our benchmark on the Mistral model and results are.. surprisingly bad?

Mistral OCR:

- 72.2% accuracy

- $1/1000 pages

- 5.42s / page

Which is pretty far cry from the 95% accuracy they were advertising from their private benchmark. The biggest thing I noticed is how it skips anything it classifies as an image/figure. So charts, infographics, some tables, etc. all get lifted out and returned as [image](image_002). Compared to the other VLMs that are able to interpret those images into a text representation.

https://github.com/getomni-ai/benchmark

https://huggingface.co/datasets/getomni-ai/ocr-benchmark

https://getomni.ai/ocr-benchmark

jaggs · 2025-03-06T21:40:47 1741297247

By optimistic, do you mean 'tweaked'? :)