
Yeah, Tesseract is barely production quality.


Yeah, it was SOTA in 2006, 18 years ago.


Other than proprietary models, what is better than it today? Just asking in case I ever need OCR and don't want to pay the cloud providers for it :D


Check out https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical.

A multimodal LLM would of course blow it all out of the water, so some llama3-like model is probably SOTA in terms of what you can run yourself. Something like https://huggingface.co/blog/idefics2
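
For the docTR route, here is a rough sketch of what running it yourself might look like. It assumes docTR is installed with a backend and that the default pretrained weights are good enough for your documents. `export_to_lines`, `ocr_image` and `min_confidence` are names made up for this example, not part of docTR. The nested pages/blocks/lines/words layout is what I'd expect from docTR's `export()`, so check it against the version you install.

```python
from typing import Any


def export_to_lines(export: dict[str, Any], min_confidence: float = 0.0) -> list[str]:
    """Flatten a docTR-style export dict into one string per text line.

    Hypothetical helper: drops words below min_confidence and skips empty lines.
    """
    lines = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    w["value"]
                    for w in line.get("words", [])
                    if w.get("confidence", 1.0) >= min_confidence
                ]
                if words:
                    lines.append(" ".join(words))
    return lines


def ocr_image(path: str) -> list[str]:
    """Run docTR on a single image file and return the recognized text lines."""
    # Imported lazily so export_to_lines works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches default weights on first use
    doc = DocumentFile.from_images(path)
    result = model(doc)
    return export_to_lines(result.export())
```

Something like `ocr_image("scan.png")` (the file name is a placeholder) would then give you a list of text lines, and raising `min_confidence` is a cheap way to drop the words the model is least sure about.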



