Just tested with a multilingual (bidi) English/Hebrew document.
The Hebrew output had no correspondence to the text whatsoever (in context, there was an English translation, and the Hebrew produced was a back-translation of that).
Their benchmark results are impressive, don't get me wrong. But I'm a little disappointed. I often read multilingual document scans in the humanities. Multilingual (and esp. bidi) OCR is challenging, and I'm always looking for a better solution for a side-project I'm working on (fixpdfs.com).
Also, I thought OCR implied that you could get bounding boxes for text (and reconstruct a text layer on a scan, for example). Am I wrong, or is this term just overloaded, now?
Mathpix is ace. Those are the best results I’ve gotten so far for scientific papers and reports. It understands the layout of complex documents very well; it’s quite impressive. Equations are perfect, and figure extraction works well.
There are a few annoying issues, but overall I am very happy with it.
I had a billing issue at the beginning. It was resolved very nicely, but I try to be careful and I monitor the bill a bit more than I would like.
Actually, my main remaining technical issue is converting to standard Markdown for use in a data processing pipeline that has issues with the Mathpix dialect. Ideally I’d do it on a computer that is air-gapped for security reasons, but I haven’t found a very good way of doing it because the Python library wanted to check my API key.
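For the math delimiters at least, a stdlib-only pass might be enough, and it runs offline with no API key. This is a minimal sketch, assuming the pipeline's problem is the `\( ... \)` and `\[ ... \]` delimiters and that it accepts `$ ... $` and `$$ ... $$` instead. The function name `mmd_math_to_dollar` is made up for illustration, and it does not touch other dialect features (tables, code blocks, and so on).

```python
import re

# Assumption: math is delimited by \( ... \) and \[ ... \].
# The lookbehind skips LaTeX line breaks such as "\\[2pt]".
DISPLAY_MATH = re.compile(r"(?<!\\)\\\[(.+?)(?<!\\)\\\]", re.DOTALL)
INLINE_MATH = re.compile(r"(?<!\\)\\\((.+?)(?<!\\)\\\)", re.DOTALL)


def mmd_math_to_dollar(text: str) -> str:
    """Rewrite backslash-style math delimiters as dollar-style ones."""
    # Display math first; keep the inner text (including newlines) unchanged.
    text = DISPLAY_MATH.sub(lambda m: "$$" + m.group(1) + "$$", text)
    # Inline math; trim padding so "$ x $" becomes "$x$".
    text = INLINE_MATH.sub(lambda m: "$" + m.group(1).strip() + "$", text)
    return text


if __name__ == "__main__":
    sample = "Energy \\( E = mc^2 \\) and\n\\[\na^2 + b^2 = c^2\n\\]"
    print(mmd_math_to_dollar(sample))
```

It is naive about math-like text inside fenced code, so a real pipeline would probably want to skip code blocks first.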
A problem I have, and one that is not really Mathpix’s fault, is that I don’t really know how to store the figure images so that they stay with the text in a convenient way. I haven’t found a very satisfying strategy.
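One option would be to keep each document as a folder with the Markdown file next to a `figures/` directory, and rewrite the image links to relative paths. This is only a sketch of that idea, assuming the figures appear in the output as ordinary Markdown image links pointing at remote URLs. `localize_figures` and the `figures` folder name are hypothetical, and the actual downloading is left out.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: figures appear as ![alt](http(s)://...) links.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_figures(markdown: str, figure_dir: str = "figures"):
    """Return (rewritten_markdown, {remote_url: relative_local_path})."""
    downloads = {}

    def repl(match):
        alt, url = match.group(1), match.group(2)
        # Use the last path segment as the file name; drop any query string.
        name = PurePosixPath(urlparse(url).path).name or f"figure_{len(downloads)}"
        local = f"{figure_dir}/{name}"
        downloads[url] = local
        return f"![{alt}]({local})"

    return IMAGE_LINK.sub(repl, markdown), downloads


if __name__ == "__main__":
    md = "See ![fig 1](https://example.com/img/abc.jpg?height=10) here."
    new_md, todo = localize_figures(md)
    print(new_md)
    print(todo)
```

The returned mapping says which files to fetch and where to put them. Two URLs that end in the same file name would collide, so that case needs handling before relying on it.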