Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Just tested with a multilingual (bidi) English/Hebrew document.

The Hebrew output had no correspondence to the text whatsoever (in context, there was an English translation, and the Hebrew produced was a back-translation of that).

Their benchmark results are impressive, don't get me wrong. But I'm a little disappointed. I often read multilingual document scans in the humanities. Multilingual (and esp. bidi) OCR is challenging, and I'm always looking for a better solution for a side-project I'm working on (fixpdfs.com).

Also, I thought OCR implied that you could get bounding boxes for text (and reconstruct a text layer on a scan, for example). Am I wrong, or is this term just overloaded, now?




You can get bounding boxes from our pdf api at Mathpix.com

Disclaimer, I’m the founder


Mathpix is ace. That’s the best results I got so far for scientific papers and reports. It understands the layout of complex documents very well, it’s quite impressive. Equations are perfect, figures extraction works well.

There are a few annoying issues, but overall I am very happy with it.


Thanks for the kind words. What are some of the annoying issues?


I had a billing issue at the beginning. It was resolved very nicely but I try to be careful and I monitor the bill a bit more than I would like.

Actually my main remaining technical issue is conversion to standard Markdown for use in a data processing pipeline that has issues with the Mathpix dialect. Ideally I’d do it on a computer that is airgaped for security reasons. But I haven’t found a very good way of doing it because the Python library wanted to check my API key.

A problem I have and that is not really Mathpix’s fault is that I don’t really know how to store the figures pictures to keep them with the text in a convenient way. I haven’t found a very satisfying strategy.

Anyway, keep up the good work!




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: