Hacker News

Jokes aside, isn't it wrong that OCR software still always produces text output from image regions which are not text? More than a decade ago I OCRed an old book, and I remember how annoying it was to deal with all the garbage text produced from small pictures, smudges, and dirt. It looks like there's not much progress done since in the field.



That seems to be the same kind of question as the one in the OP. Isn't something wrong when a random scribble creates a valid execution in Perl?


> It looks like there's not much progress done since in the field

LLMs help here. In my own experiments, ChatGPT is a pretty good "smart, context-aware" OCR agent.


Using image embeddings and evaluating an LLM with hundreds of billions of parameters for OCR is like hunting rabbits with Yamato’s 18-inch naval guns.


Well, using a human is like bringing an interstellar railgun to hunt rabbits, so I guess it's still better?


Not really. Proper OCR in the broadest sense (extracting text from arbitrary PDFs that intermingle tables, images, etc., or from handwritten artistic posters) requires a full understanding of semantic intent.

You are perhaps imagining more constrained scenarios of straight lines of consistent text on a page with well-known artifacts of "noise" (smudges, print imperfections, and so on).


Yes, there has been progress. But the featured article is meant to be fun!



