Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Using image embedding and evaluating 100s billion parameter LLM for OCR is like hunting rabbits using Yamato’s 18in naval gun.



Well using a human is bring an interstellar rail gun to hunt rabbit so i guess it still better ?


Not really. Proper OCR in the broadest sense (extracting text from arbitrary pdfs that intermingle tables, images, etc, or from hand written artistic posters) requires a full understanding of semantic intent.

You are perhaps imagining more constrained scenarios of straight lines of consistent text on a page with well-known artifacts of "noise" (smudges, print imperfections, and so on).




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: