Hacker News
aidenn0
on Sept 9, 2021
on: PaddleOCR: Lightweight, 80 Language OCR
It seems to completely ignore punctuation for the corpus of English text I tried on it; punctuation came through either not at all (e.g. "Id" for "I'd") or as letters (e.g. "P" for "?").
timClicks
on Sept 9, 2021
Welcome to OCR. It's often possible to overlay the raw results with a language model to improve them, but ultimately it's a probabilistic process.
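To make the "overlay a language model" idea concrete, here is a rough sketch in Python. It is not anything PaddleOCR ships: the word counts are a made-up toy unigram model, and `candidates`, `correct_token`, and `correct_line` are hypothetical helper names. It only tries one kind of repair (re-inserting a dropped apostrophe) and keeps whichever variant the toy model has seen most often.

```python
from collections import Counter

# Toy unigram "language model": word counts from a hypothetical reference
# corpus. These numbers are invented purely for illustration.
WORD_COUNTS = Counter({"i'd": 50, "id": 2, "don't": 40, "the": 500, "cat": 30})


def candidates(token):
    """Return the token itself plus variants with one apostrophe inserted."""
    variants = [token]
    for i in range(1, len(token)):
        variants.append(token[:i] + "'" + token[i:])
    return variants


def correct_token(token, counts=WORD_COUNTS):
    """Pick the most frequent candidate; keep the raw token if none is known."""
    best = max(candidates(token), key=lambda c: counts[c.lower()])
    return best if counts[best.lower()] > 0 else token


def correct_line(line):
    """Apply the per-token correction to a whitespace-separated OCR line."""
    return " ".join(correct_token(t) for t in line.split())


print(correct_line("Id like the cat"))
```

A real setup would score whole sequences with a proper language model rather than single words, and would weigh the OCR engine's own confidence against the model's, which is where the "probabilistic process" part comes in.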
aidenn0
on Sept 9, 2021
I've done OCR before; it seems like they must not have had any punctuation in their ground-truth set for English here though...