
Sysreq2 8 months ago | on: Ask HN: How to Resurrect a Site from Archive.org?

You could also consider the Common Crawl dataset, which is hosted on AWS as an open dataset. It is crawled independently of Archive.org, so it may hold captures that the Wayback Machine lacks. https://registry.opendata.aws/commoncrawl/
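
For anyone who wants to try this, here is a rough sketch of looking up a site's captures in the Common Crawl URL index and pulling one record back out. It is only an illustration under a few assumptions: the function names are made up for this example, the crawl ID is a placeholder (current IDs are listed at https://index.commoncrawl.org/collinfo.json), and it relies on the index returning one JSON object per line with `filename`, `offset` and `length` fields.

```python
import gzip
import json
import urllib.parse
import urllib.request

INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def build_index_query(crawl_id: str, url_pattern: str) -> str:
    """Return the index query URL for one crawl and one URL pattern."""
    query = urllib.parse.urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def parse_index_lines(body: str) -> list[dict]:
    """Parse the newline-delimited JSON returned by the index server."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def byte_range(record: dict) -> str:
    """Build the HTTP Range value that covers exactly one WARC record."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return f"bytes={start}-{end}"


def fetch_capture(record: dict) -> bytes:
    """Download and decompress one WARC record (needs network access)."""
    request = urllib.request.Request(
        f"{DATA_HOST}/{record['filename']}",
        headers={"Range": byte_range(record)},
    )
    with urllib.request.urlopen(request) as response:
        return gzip.decompress(response.read())


if __name__ == "__main__":
    # Placeholder crawl ID and domain; swap in your own.
    url = build_index_query("CC-MAIN-2024-10", "example.com/*")
    with urllib.request.urlopen(url) as response:
        records = parse_index_lines(response.read().decode("utf-8"))
    if records:
        # Each decompressed record holds WARC headers, HTTP headers, then the page body.
        print(fetch_capture(records[0])[:500].decode("utf-8", errors="replace"))
```

Each record comes back as a raw WARC entry, so you still have to strip the headers yourself, and a given crawl may have no captures of your site at all. In that case, try other crawl IDs.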