robots.txt is really only supposed to be used for blocking the Internet Archive's first snapshot, not for removing existing snapshots – and even that might not hold in the future as they try to preserve most snapshots. They made a few policy changes last year[1] to how they handle robots.txt files, partly to handle cases where a domain is sold and the new owner's robots.txt would otherwise wipe out old snapshots, among other things.
Hmm, that may be what it's meant for, but I'm pretty sure it can currently be used to block things retroactively too. IA may still have the pages in the archive, but it won't let viewers see them.
No? The article you linked says they've stopped paying attention to robots.txt for US government and military sites, but it looks like a robots.txt block still retroactively removes visibility for everything else.
I guess IA could change their practices. If Medium or sites like it start actively using robots.txt to retroactively hide content in the archive, perhaps that will prompt a policy change. I would welcome it.
Interesting. I wasn't aware that it no longer applies retroactively. Even so, medium.com's robots.txt still doesn't try to block new crawling by the Internet Archive's crawler.
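(For illustration only: a site that wanted to keep the Wayback Machine out would need something like the lines below in its robots.txt. ia_archiver is the user-agent token the Internet Archive's crawler has historically honored, and medium.com's file contains nothing like it.)

    # Tell the Internet Archive's crawler not to fetch anything on this site
    User-agent: ia_archiver
    Disallow: /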
It seems unlikely to me that they would deliberately go to such lengths to prevent archival while making no attempt to prevent it in the first place. Furthermore, as mentioned in your link, the Internet Archive still accepts removal requests via email.
[1]: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...