
The Internet Archive does re-crawl periodically, and it removes archived pages based on the site's current robots.txt. This is documented behavior of the archive, and it goes beyond the normal conventions of robots.txt.



I would add that the content itself is not removed. They only stop displaying it whilst the robots.txt says not to. If they cannot reach your robots.txt, the content comes back, as I have experienced multiple times.
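
To make that concrete, here is a rough sketch of the display rule as I understand it, not the archive's actual code. The function name is made up, and treating `ia_archiver` as the user agent being checked is my assumption; the parsing uses Python's standard `urllib.robotparser`.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def is_displayable(robots_txt: Optional[str], url: str,
                   agent: str = "ia_archiver") -> bool:
    """Hypothetical display gate for an already-archived page.

    robots_txt is the site's *current* robots.txt body, or None if it
    could not be fetched. The agent name is an assumption.
    """
    if robots_txt is None:
        # Unreachable robots.txt: nothing says to hide the page,
        # so the stored copy is shown again.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The stored copy is only hidden, never deleted, while this is False.
    return parser.can_fetch(agent, url)
```

Under this sketch, a robots.txt with `User-agent: *` and `Disallow: /` hides every archived page for the site, and passing `None` (robots.txt unreachable) makes them visible again.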



