Hacker News

Not really. Proper OCR in the broadest sense (extracting text from arbitrary PDFs that intermingle tables, images, etc., or from handwritten artistic posters) requires a full understanding of semantic intent.

You are perhaps imagining more constrained scenarios: straight lines of consistent text on a page, with well-known kinds of "noise" (smudges, print imperfections, and so on).



