I can't speak for voice data, as I've not worked with voice, but I did my MSc on various approaches to reducing error rates in OCR. I used a mix of synthetically degraded data, ranging from applying different kinds of noise to physically degrading printed pages (crumpling, rubbing sand on them, water damage), and while it gave interesting comparative results between OCR engines, the errors it produced never closely matched the errors I found in genuinely degraded old books. I've seen that in other areas too.
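For a rough idea, a minimal sketch of the "apply noise" end of that spectrum might look like the code below (filenames and parameter values are made up for illustration, not taken from my actual experiments):

    # Toy sketch of synthetic degradation for a scanned page, using Pillow + NumPy.
    # Real physical damage (crumpling, water staining) is much harder to approximate.
    import numpy as np
    from PIL import Image, ImageFilter

    def degrade(img, blur_radius=1.5, noise_sigma=20, seed=0):
        """Apply Gaussian blur plus additive Gaussian noise to a grayscale page."""
        rng = np.random.default_rng(seed)
        blurred = img.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
        arr = np.asarray(blurred, dtype=np.float32)
        noisy = arr + rng.normal(0, noise_sigma, arr.shape)
        return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

    page = Image.open("page.png")              # hypothetical input scan
    degrade(page).save("page_degraded.png")    # feed this to the OCR engine under test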

My takeaway from that was that while synthetic degradation of inputs can be useful, and while it is "easy", the hard part is making it match real degradation closely enough to be representative. It's often really hard to replicate natural noise closely enough for those kinds of methods to be sufficient.

Doesn't mean it's not worth trying, but I'd say that unless voice is very different, it's the kind of thing that's mostly worth doing if you can't get your hands on anything better.



You might say that if you can identify and simulate all cases of real-life degradation, your problem is basically solved: just reverse the simulation on your inputs.

I'm not saying OCR isn't hard. I'm saying normalizing all those characters basically is the problem.


This isn't quite true if e.g. there are degenerate cases, where distinct clean inputs map to the same degraded output, so the simulation can't be uniquely reversed.
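A toy sketch of what I mean (numbers invented purely for illustration):

    # A many-to-one degradation cannot be reversed, no matter how well you simulate it.
    # Here, hard binarization (a crude stand-in for a badly thresholded scan) maps two
    # different glyph patches to the same output.
    import numpy as np

    def degrade(patch):
        """Binarize a grayscale patch at a fixed threshold."""
        return (patch > 128).astype(np.uint8)

    light_glyph = np.array([[200, 40], [40, 200]])    # hypothetical faint stroke
    dark_glyph  = np.array([[255, 100], [100, 255]])  # hypothetical heavy stroke

    assert np.array_equal(degrade(light_glyph), degrade(dark_glyph))
    # Both inputs collapse to the same degraded patch, so no inverse exists.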



