jfoutz on Feb 28, 2019 | on: Mozilla releases the largest to-date public domain...

You might say that if you can identify and simulate every case of real-life degradation, your problem is basically solved: just reverse the simulation on your inputs.

I’m not saying OCR isn’t hard. I’m saying that normalizing all those characters basically is the problem.

dbdjfjrjvebd on March 1, 2019

This isn't quite true if, e.g., there are degenerate cases.
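
A toy sketch of what a degenerate case could look like, assuming it means several distinct inputs degrading to the same output. The `DEGRADE` table and `ALPHABET` are made up for illustration and are not taken from any real OCR system:

```python
# Hypothetical degradation: detail that separates similar glyphs is lost.
DEGRADE = {"l": "|", "I": "|", "1": "|", "O": "o", "0": "o"}
ALPHABET = "lI1O0a"  # assumed set of possible source characters


def degrade_char(ch):
    """Map one character to its degraded form (unchanged if not in the table)."""
    return DEGRADE.get(ch, ch)


def degrade(text):
    """Apply the toy degradation to a whole string."""
    return "".join(degrade_char(ch) for ch in text)


def preimages(degraded_char):
    """List every source character that degrades to degraded_char."""
    return sorted(ch for ch in ALPHABET if degrade_char(ch) == degraded_char)


# Two different inputs collapse to the same degraded string.
print(degrade("Il1"), degrade("1lI"))
# More than one candidate source character, so there is no unique inverse.
print(preimages("|"))
```

Even with the simulation known exactly, "reversing" it here only narrows `|` down to a set of candidates; picking one needs outside context such as a language model.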