
Hmm, all they'd have to do is serve a dynamic robots.txt that forbids the Wayback Machine from crawling the deleted articles, and that would close the workaround too, no?
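For example, the generated robots.txt might look something like this; a minimal sketch where ia_archiver is the user agent the Internet Archive's crawler has historically used, and the article paths are hypothetical:

    User-agent: ia_archiver
    Disallow: /articles/2009/some-deleted-story
    Disallow: /articles/2011/another-deleted-story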


Once it's stored, I imagine they don't even need to scrape the page again, so robots.txt wouldn't do anything.


The Internet Archive does rescrape periodically, and it removes archived pages based on the current robots.txt. This is documented behavior of the Archive, and it goes beyond the normal conventions of robots.txt.


I would add that the content itself is not removed. They only stop displaying it whilst the robots.txt says not to. If they cannot reach your robots.txt, the content comes back, as I have experienced multiple times.


Sure, but they'd have to create a list of every deleted article, which seems like it would be pretty long.


I assume their software is quite capable of automatically generating such a list and including it in the robots.txt.
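Something along these lines would do it; a minimal sketch in Python, assuming the CMS can be queried for the paths of deleted articles (the deleted_paths list and output location here are hypothetical):

    # Hypothetical sketch: regenerate robots.txt from the site's list of
    # deleted article paths so the Wayback Machine stops displaying them.
    deleted_paths = [
        "/articles/2009/some-deleted-story",
        "/articles/2011/another-deleted-story",
    ]  # in practice, queried from the CMS database

    lines = ["User-agent: ia_archiver"]  # the Internet Archive's crawler
    lines += [f"Disallow: {path}" for path in deleted_paths]

    with open("robots.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

This could run on a schedule or as a hook whenever an article is deleted, so the file stays in sync without anyone maintaining the list by hand.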



