Using image embeddings and evaluating an LLM with hundreds of billions of parameters for OCR is like ...

manquer · on April 30, 2024

Well, using a human is like bringing an interstellar railgun to hunt rabbits, so I guess it's still better?

jonahx · on April 30, 2024

Not really. Proper OCR in the broadest sense (extracting text from arbitrary PDFs that intermingle tables, images, etc., or from handwritten artistic posters) requires a full understanding of semantic intent.

You are perhaps imagining more constrained scenarios of straight lines of consistent text on a page with well-known artifacts of "noise" (smudges, print imperfections, and so on).