
I guess it is conceivable that you could take 1000 dirty input files and their cleaned outputs and use these as a training set for an ML-powered data cleaning system. But:

- It would only work in a narrow domain.

- How do you know if you can trust the results?




Sure. You could and should do that if you can. But, as you say, it'll still only work for one narrow definition of "dirty" and another, equally narrow, definition of "clean".



