
Yes, but the article itself is user-provided content on Medium, which the author presumably has a right to ask to be deleted (under GDPR)? So perhaps it will simply be a matter of the Wayback Machine having to have a policy to delete things if requested?



No! GDPR is about personal data, which is well defined in the regulations and does not include blog posts. The right to delete data (or "be forgotten") is nothing to do with GDPR. If the original post contained personal data, it is a different issue but if that was put out into the public domain, it is a hard problem to solve.


What if the blog post contains personal data?


Most pages have an author section already.


So add some "personal data" to the end of anything you might want to demand someone forget later.


Or you could send a good ol' DMCA takedown request.

I'm not sure where this idea that nothing could be forced off the web before the GDPR came from.


No, if you intentionally made that data public then it's done. GDPR doesn't, say, force you to remove political views of Theresa May from newspapers, despite that being covered by personal data, because Theresa May made those views public.


So if I were subject to the GDPR, and published my nginx logs in real time, I could stop worrying about scrubbing "personal" data from them on request?


The Wayback Machine has always had a policy to delete things if requested, so there's no real change there. The most common way site owners do that is by changing robots.txt. In line with the Oakland Archive Policy [1], the Internet Archive respects robots.txt retroactively, so a site owner can get archived versions deleted just by excluding them in the robots file. Besides that, they respond to DMCA takedowns, one-off removal requests [2], etc.

[1] http://www2.sims.berkeley.edu/research/conferences/aps/remov...

[2] http://archive.org/about/faqs.php#2
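To make the retroactive robots.txt behaviour concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not the Internet Archive's actual implementation: the function `is_viewable`, the `example.com` URL, and the choice of `ia_archiver` as the crawler's user agent are assumptions made for illustration. The idea is only that a snapshot is checked against the site's current robots.txt at display time, rather than against the rules in force when the page was crawled.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the user-agent token the archive crawler identifies itself with.
ARCHIVE_AGENT = "ia_archiver"


def is_viewable(snapshot_url: str, current_robots_txt: str) -> bool:
    """Hypothetical check: is an archived page shown, given today's robots.txt?"""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(ARCHIVE_AGENT, snapshot_url)


url = "https://example.com/posts/old-article"  # hypothetical archived page
blocking = "User-agent: ia_archiver\nDisallow: /\n"

print(is_viewable(url, blocking))  # False: the current rules exclude the crawler
print(is_viewable(url, ""))        # True: no rules, so nothing is excluded
```

Under this model the exclusion is evaluated every time against whatever robots.txt the site serves now, so the result depends entirely on the current file and not on the crawl date.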


Changing robots.txt does not delete content from their archives. If you remove the robots.txt file, the content becomes viewable again.

There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content. There are only three possibilities: avoid the EU as much as possible, dump the archives and start over with an entirely different approach, or shut down. Besides that, these laws are going to get stricter and harder to comply with over time, not looser. This is merely the beginning of aggressive regulation of the Internet, which will only move in one direction from here: toward increasing burden. It's hard to imagine Archive.org's archives surviving what's coming.


> There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content.

"GDPR violations". What's that, exactly? As far as I know, you only have to remove personal data upon request, not preemptively. So I don't see how they are "violations".

Will a lot of people make these requests? Possibly, but where's the evidence of that? People have been able to use copyright takedown requests (e.g. under the DMCA) forever, yet the Archive is still around.


Actually the recommended data handling says you should specifically state the purpose for needing the data, and that it should be reasonably limited to that need; i.e. if you don't need it any more you should pro-actively delete it.[0]

[0] https://ico.org.uk/media/for-organisations/documents/1475/de... Pages 4-6


They do have a legitimate interest (in the sense of article 6(1) of the GDPR), namely providing an internet archive.


I would agree: in the case of the Wayback Machine, they have a very strong case under article 6(1).



