
SPN (the Wayback Machine's Save Page Now) can be driven from a script using wget, curl, or any other tool that can issue HTTP GET requests.

If you can come up with a set of URLs for your content, you can archive it by prefixing each URL with "https://web.archive.org/save/".

So if you have:

    https://www.example.com/page-1
    https://www.example.com/page-2
    https://www.example.com/page-3
You'd generate (and request):

    https://web.archive.org/save/https://www.example.com/page-1
    https://web.archive.org/save/https://www.example.com/page-2
    https://web.archive.org/save/https://www.example.com/page-3
This is trivially scripted, or there are a few existing generators.
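For example, here's a minimal POSIX sh sketch of the prefixing step. The function name make_save_urls is hypothetical. It reads one absolute URL per line on stdin, skips blank lines, and prints the matching save request. It only prints; nothing is fetched.

```shell
# Hypothetical helper: turn a list of absolute URLs (one per line on stdin)
# into Save Page Now request URLs. It only prints; nothing is fetched.
make_save_urls() {
    # "|| [ -n "$url" ]" also handles a last line with no trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines so they don't become bare ".../save/" requests.
        [ -n "$url" ] || continue
        printf 'https://web.archive.org/save/%s\n' "$url"
    done
}
```

Usage would be something like `make_save_urls < urls.txt` (file name hypothetical). The output can then be fed to curl or wget.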

I've archived roughly 12,000 posts from an old desktop Linux system over modest residential broadband in under an hour, running up to 20 parallel requests via xargs or GNU parallel.
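A sketch of the xargs approach follows. It assumes GNU or BSD xargs (the -P flag is not POSIX) and a file with one absolute URL per line. The function name save_all is hypothetical, and its default of 20 simply mirrors the parallelism above. It prints each HTTP status code next to the request URL, so failed submissions are easy to spot.

```shell
# Hypothetical helper: submit every URL in file $1 to Save Page Now,
# running up to $2 (default 20) curl processes at once.
# Assumes one absolute URL per line; -P is a GNU/BSD xargs extension.
save_all() {
    # -I {} runs one curl per input line, substituting the line for {}.
    # -w prints the status code and the URL curl requested (no -L, so no redirects followed).
    xargs -P "${2:-20}" -I {} \
        curl -s -o /dev/null -w '%{http_code} %{url_effective}\n' \
            "https://web.archive.org/save/{}" < "$1"
}
```

Usage would be `save_all urls.txt`, or `save_all urls.txt 5` to be gentler. Output lines can arrive in any order, because requests finish in parallel.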

For a basic curl-based URL archiver, save this as a script and call it once per URL on your list, passing the URL as the first argument:

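    #!/bin/sh
    # Request a capture of $1 (HEAD request) and keep the x-cache-key header;
    # sed then inserts "://" after the first "https" and drops text after the URL.
    curl -s -I -H "Accept: application/json" "https://web.archive.org/save/${1}" |
        grep -i '^x-cache-key:' |
        sed "s,https,&://,; s,\(${1}\).*$,\1,"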



https://github.com/pastpages/savepagenow is useful for this as a packaged utility.