I guess it is conceivable that you could take 1000 dirty input files and their c... | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit

login

hermitcrab on May 29, 2021 | parent | context | favorite | on: Automated Data Wrangling

I guess it is conceivable that you could take 1000 dirty input files and their cleaned output and use this as a training set for a ML-power data cleaning system. But:

-It would only work in a narrow domain.

-How do you know if you can trust the results?

ab111111111 on May 29, 2021 [–]

Sure. You could and should do that if you can. But, as you say, it'll still only work for one narrow definition of "dirty" and another, equally narrow, definition of "clean".

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact