
It seems to completely ignore punctuation on the corpus of English text I tried it on; punctuation either didn't come through at all (e.g. "Id" for "I'd") or came through as letters (e.g. "P" for "?").



Welcome to OCR. It's often possible to overlay the raw results with a language model to improve them, but ultimately it's a probabilistic process.
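For a rough idea of what that overlay can look like, here is a toy sketch in Python. It is not how any particular OCR engine does it: the `VOCAB` counts are made up, the helper names are hypothetical, and a real system would use a trained language model with context rather than a word-count lookup. It only tries to put back a dropped apostrophe.

```python
# Toy vocabulary with made-up counts (an assumption for illustration only);
# a real system would score candidates with a trained language model.
VOCAB = {"I'd": 50, "don't": 80, "like": 70, "that": 100}


def candidates(token):
    """Yield the token itself plus every variant with one apostrophe inserted."""
    yield token
    for i in range(1, len(token)):
        yield token[:i] + "'" + token[i:]


def correct(token):
    """Return the highest-count candidate found in VOCAB, or the token unchanged."""
    known = [c for c in candidates(token) if c in VOCAB]
    return max(known, key=VOCAB.get) if known else token


def correct_line(line):
    """Apply the per-token correction to a whitespace-separated line."""
    return " ".join(correct(t) for t in line.split())
```

With this vocabulary, `correct_line("Id like that")` gives back `"I'd like that"`, and tokens with no known candidate pass through unchanged. It is still a guess based on counts, so it can be wrong whenever both spellings are real words.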


I've done OCR before; it seems like they must not have had any punctuation in their ground-truth set for English here though...



