
These days whenever I read an interesting article, I will take 2 minutes to copy and paste it into my Obsidian vault under my Articles folder. I'll take care to paste the images as images (and not links) and make sure I've got the author and source URL at the top, and have my separate notes section link to it. It's a bit silly and obsessive, but given how transient content on the Internet is, I think it's necessary to make a copy of anything you care about.


I built Obsidian Web Clipper to automate that process. It also allows you to save web pages as nicely formatted Markdown files with YAML properties even if you don't use Obsidian.

https://github.com/obsidianmd/obsidian-clipper
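For anyone unfamiliar with the format, here is a rough sketch of what "Markdown with YAML properties" looks like when produced by hand. This is not the clipper's code; the function names and the property keys (`title`, `author`, `source`, `clipped`) are assumptions picked for illustration.

```python
import json
from datetime import date
from pathlib import Path


def build_note(title: str, author: str, source_url: str,
               body_md: str, clipped: date) -> str:
    """Return Markdown text with a YAML properties block on top."""
    # Hypothetical property names; use whatever your vault expects.
    properties = {
        "title": title,
        "author": author,
        "source": source_url,
        "clipped": clipped.isoformat(),
    }
    # json.dumps emits double-quoted strings, which are also valid YAML
    # scalars, so colons or quotes in a title do not break the block.
    lines = [
        f"{key}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in properties.items()
    ]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body_md.strip() + "\n"


def save_note(folder: Path, filename: str, note_text: str) -> Path:
    """Write the note into the given folder, creating it if needed."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_text(note_text, encoding="utf-8")
    return path
```

Something like `save_note(Path("Articles"), "some-article.md", build_note(...))` gives a plain file that any Markdown tool can open, Obsidian or not.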


Wow this is awesome, really love the AI features!


I use https://github.com/gildas-lormeau/SingleFile

I set it to tolerate longer processing times, and to open the file after saving so I can sanity check that it got everything. Works great at faithfully saving a page with images as it appears in browser, and saves so much time.

You might also have a look at https://github.com/ArchiveBox/ArchiveBox


Also, I believe by default the files are saved as plain HTML (with resources base64-encoded), so search tools that can index the contents of HTML files will work.

There is also the option to have the contents compressed, and (a separate option) to keep the plaintext of the file uncompressed, which will likewise still allow indexing to work while saving space.
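To illustrate the "resources base64-encoded" part, here is a minimal sketch of inlining an image into a page as a `data:` URI. It is not SingleFile's implementation, just the general idea; `to_data_uri` and `inline_image` are made-up helper names, and the plain string replace assumes the `src` attribute appears exactly as given.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI that can sit inside an HTML attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image(html: str, src: str, data: bytes, mime_type: str) -> str:
    """Swap a src="..." reference for an embedded copy of the image bytes."""
    # Naive replace for illustration; a real tool would parse the HTML.
    uri = to_data_uri(data, mime_type)
    return html.replace(f'src="{src}"', f'src="{uri}"')
```

The surrounding text of the page is left untouched, which is why a plain-text indexer can still find it even though the images are now part of the same file.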


I noticed a web clipper was just released for Obsidian last month. Maybe that'd cut down those two minutes for you.


Yes! The Obsidian Web Clipper is pretty neat. I just published an article about it: https://www.dsebastien.net/supercharge-your-knowledge-captur...


I am using monolith to just save the whole page to disk.

https://github.com/Y2Z/monolith


I do something similar but with Discord. I made a server accessible only by me, and I have a few different channels like work, life, music, ideas, etc. I also send all screenshots I take into a separate channel, and set up a Chrome extension that sends whatever page I'm on as a link.


What if Discord goes away? I would think you'd want the data local.


Terrible idea. People get their Discord accounts banned randomly without warning.


Unfortunately it's not super easy to get data out of Discord either. Last I checked, one needs to carefully set up a bot, then script the bot to download messages to CSV, etc., but if you're not careful with the account and bot setup, the export process itself could lead to a ban.


Like recently they banned the entire country of Germany by accident.


How often do you reference your vault?


Agreed. I think you could automate some of that too, could save time if you do it often.


In my day browsers could save an archive of a page.

Is this still the case?


They can, but generally that includes any JavaScript on the page, which sometimes does funny stuff when you open it offline or after the remote server goes away.


SingleFile can make a snapshot with just the content and styling.


It's not perfect, but Edge will let one take a simple full page screenshot with Ctrl+Shift+S. It results in a hefty PNG but at least it's a visual copy of everything which might suffice for a certain set of purposes (e.g. links will be lost, so it's not good for that).

I can still right-click > Save any page as .html, but that doesn't guarantee server-streamed stuff, media, images, etc. will be preserved correctly.


Thank you for this! I pressed Ctrl+Shift+S in Firefox just to see if it would work and it has the same functionality.


For the lazy: I think the web archive that Safari exports is standardised and gives you a good website backup.



