robots.txt is really only supposed to be used for blocking the Internet Archive's first snapshot, not for removing existing snapshots – and even that might not hold in the future as they try to preserve most snapshots. They made a few policy changes last year[1] to how they handle robots.txt files, partly to handle cases where a domain is sold and the new owner's robots.txt would otherwise wipe out old snapshots, among other things.
Hmm, that may be what it's meant for, but I'm pretty sure it can currently be used to block things retroactively too. IA may still have the pages in the archive, but it won't let viewers see them.
No? The article you linked says they've stopped paying attention to robots.txt for US government and military sites, but it looks like a robots.txt block still retroactively removes visibility for everything else.
I guess IA could change their practices. If Medium or sites like it start actively using robots.txt to retroactively hide content in the archive, perhaps that will prompt a policy change. I would welcome it.
Interesting. I wasn't aware that it no longer applies retroactively. Even so, medium.com's robots.txt still doesn't try to block new crawling by the Internet Archive's crawler.
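(For illustration only: a site that wanted to keep the Wayback Machine out would need something like the lines below in its robots.txt. ia_archiver is the user-agent token the Internet Archive's crawler has historically honored, and medium.com's file contains nothing like it.)

    # Tell the Internet Archive's crawler not to fetch anything on this site
    User-agent: ia_archiver
    Disallow: /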
It seems unlikely to me that they would deliberately go to such lengths to prevent archival while making no attempt to prevent it in the first place. Furthermore, as mentioned in your link, the Internet Archive still accepts removal requests via email.
[1]: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...