Hacker News

I would like to see how it performs with massively warped and skewed scanned text images: basically a scanned image where the text lines are wavy rather than straight and horizontal, where the letters are elongated, and where the line widths differ depending on the position in the scan. I once had to deal with such a task that somebody gave me. OCR software, Acrobat, and other tools could not decode the mess, so I had to recreate the 30 pages myself, manually. Not a fun thing to do, but it is a real use case.


I use Gemini to solve textual CAPTCHAs with those kinds of distortions and more: 60% of the time it works every time.


Are you trying to build a CAPTCHA solver?


No, not a CAPTCHA solver. When I worked in education, I was given a paper document from the 90s that a teacher needed OCR'd, but it was completely warped. It was my job to remediate those types of documents for accessibility reasons. I scanned and OCR'd it, but the result was garbage. Mind you, I had access to Windows, Linux, and macOS tools, and it was still difficult. Guessing what the text said was not impossible, but it was too time-consuming for the time frame I was given, so I had no option but to manually retype all the information into a new document and convert it that way. Document remediation and accessibility should be a good use case for AI in education.


Garbage in, garbage out?


"Yes", but if a human could do it, "AI" should be able to do it too.



