
Post about bookmarks

Tim Van Baak 2024-05-27 21:07:01 -07:00
parent 9b2abf6d23
commit 9355856dc4
3 changed files with 49 additions and 1 deletions


@ -0,0 +1,46 @@
---
title: Archiving bookmarks
pubdate: 2024-05-27T21:06:37-07:00
feed: blog
---
A [comment on Hacker News](https://news.ycombinator.com/item?id=40397848) got me thinking.

> I realized I was overusing bookmarks. I now save webpages (perhaps as PDF) if it contains information I want to refer to later, such as an insightful article, technical information, a humorous bit, or the like.
>
> Bookmarks are good only for links to things for which only the most current version is worth accessing. That's my banking websites, a shopping site, my employer's remote desktop system, etc.

Right now I have somewhere around 5,500 bookmarks, sorted into a few top-level folders. "Ref" contains pages for reference, such as articles I might want to cite later, neat sites I might want to read again, reaction image links, etc. "Util" contains a few utilities, but I mostly open it for its bookmarklets. "Later" is something like a reading list, containing hundreds upon hundreds of "I should read this later when I have time" links, plus several subfolders by topic. "net" once held videos and downloads to fetch later, back when I had slow Internet; now it mostly contains a list of things I need to download and sort. "proj" contains project-specific folders with reference material, like pages on specific language features some program will need to use.

For most of these, that comment is very accurate. Many of the article links I saved are now dead, broken by site reorganizations or by the host going down. Some can be recovered from the Internet Archive; others can't, or were never archivable to begin with. For a few, the site has become less usable or parts of the page have been removed, so the latest version of the page is _less_ desirable than the version I bookmarked!

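
Sorting live links from dead ones by hand is slow. A minimal sketch of a first pass, assuming `curl` and `jq` are installed: check whether a URL still answers, and if it doesn't, ask the Wayback Machine availability API (`https://archive.org/wayback/available`) for the closest snapshot. The function names are hypothetical helpers, not part of any tool mentioned in this post.

```bash
# bash: source these functions, then call check_bookmark URL.
# Assumes curl and jq; all function names here are hypothetical.

# Print the final HTTP status after following redirects ("000" if unreachable).
http_status() {
  curl -s -o /dev/null -L --max-time 20 -w '%{http_code}' "$1"
}

# Print the URL of the closest Wayback Machine snapshot, or nothing if none.
# -G with --data-urlencode puts the (URL-encoded) bookmark in the query string.
wayback_closest() {
  curl -s -G --data-urlencode "url=$1" 'https://archive.org/wayback/available' |
    jq -r '.archived_snapshots.closest.url // empty'
}

# Report a bookmark as live, archived (printing the snapshot URL), or lost.
check_bookmark() {
  local url="$1" status snapshot
  status=$(http_status "$url")
  if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
    echo "live     $url"
  elif snapshot=$(wayback_closest "$url") && [ -n "$snapshot" ]; then
    echo "archived $snapshot"
  else
    echo "lost     $url"
  fi
}
```

A status code is only a rough signal: some dead domains answer 200 with a parking page, and some live sites reject `curl`. Something like `while read -r url; do check_bookmark "$url"; done < urls.txt` would run it over a list of exported bookmark URLs (`urls.txt` is a placeholder).
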
I've been going through some of the "Ref" bookmarks like boxes of old things never unpacked after a move. Some of them aren't very interesting anymore, and I can delete them. Others are worth saving, so I archive them. Pages without external resources are the easiest: I just download them. For pages with resources, I'm using the [SingleFile](https://github.com/gildas-lormeau/SingleFile) extension, which does what its name says and saves a complete page as a single HTML file. (It also operates on the page as it appears in your browser, which means I can delete useless things like comment sections before saving the page.) A few bookmarked sites have multiple pages, so I mirror them with `wget`.

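
One way to mirror a small multi-page site with GNU wget is sketched below. The flag set is an assumption (a common choice for offline copies), not necessarily the one used for these bookmarks, and `mirror_site` is a hypothetical wrapper name.

```bash
# bash: mirror a small multi-page site for offline reading.
# mirror_site is a hypothetical helper; the flags are one common choice.
mirror_site() {
  local url="$1"
  # --mirror:           recursive download with infinite depth and timestamping
  # --no-parent:        never ascend above the starting directory
  # --page-requisites:  also fetch the images, CSS, etc. each page needs
  # --convert-links:    rewrite links so the local copy works offline
  # --adjust-extension: save HTML pages with an .html suffix
  # --wait=1:           pause one second between requests
  wget --mirror --no-parent --page-requisites --convert-links \
    --adjust-extension --wait=1 "$url"
}
```

The trailing slash in a call like `mirror_site 'https://example.com/guide/'` matters: without it, wget treats `guide` as a file, so `--no-parent` would allow everything under the site root.
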
For most webpages, what you really want is a few kilobytes of text. SingleFile is very useful for preserving a page's styling, but its output files can be megabytes in size. If I just want a few paragraphs, it's much easier to use [`htmlq`](https://github.com/mgdm/htmlq) to cut out the element containing the text I want and save only that. This is the script I'm currently using:

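```bash
#!/usr/bin/env bash
# Save only the part of a page matching a CSS selector, wrapped in a
# minimal HTML document that records where the page came from.
if [ "$#" -lt 2 ]; then
  echo "usage: $0 <url> <selector>" >&2
  exit 1
fi
URL="$1"
SELECTOR="$2"
# Fetch once, following redirects; quoting keeps the URL from being
# word-split or glob-expanded (e.g. a "?" in the query string).
PAGE=$(curl -sL "$URL")
echo '<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
# Copy the original <title> element
echo "$PAGE" | htmlq title
# Record the source URL
echo "<link rel=\"alternate\" href=\"$URL\" />
</head>
<body>"
# Keep only the elements matching the selector
echo "$PAGE" | htmlq "$SELECTOR"
echo '</body>
</html>'
```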
This preserves some context by leaving the original bookmark URL in a `rel="alternate"` link.

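
Because each saved file records its source in that `<link>` element, the original URLs can be pulled back out later, for example to re-check them or look them up on the Internet Archive. A minimal sketch, assuming the archived pages sit as `.html` files in one directory; `list_sources` is a hypothetical helper built on htmlq's `--attribute` option.

```bash
# bash: print "file<TAB>original URL" for each saved page in a directory.
# Assumes htmlq is installed; list_sources is a hypothetical name.
list_sources() {
  local dir="${1:-.}" file src
  for file in "$dir"/*.html; do
    # With no matches the glob stays a literal string; skip it.
    [ -e "$file" ] || continue
    # Extract only the href of the rel="alternate" link.
    src=$(htmlq --attribute href 'link[rel="alternate"]' < "$file")
    printf '%s\t%s\n' "$file" "$src"
  done
}
```

For example, `list_sources ~/archive` would list every saved page in that (example) directory next to the URL it was cut from.
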
The paradox of the information age is that copying information is easy, even trivial, but once the information is gone you have almost no chance of finding it again. An old book may turn up in a dusty shop somewhere, but if a webpage disappears and the Internet Archive doesn't have a copy, it's probably gone forever (so [donate to the Internet Archive](https://archive.org/donate)). Storage is cheap, especially for text; save your own copy before the original is lost to the perpetual rot.

src/blog/2024/index.md Normal file

@ -0,0 +1 @@
* [Archiving bookmarks](./2024/bookmarks.md)


@ -4,6 +4,7 @@ title: Blog
[RSS](./feed.xml)
* [Archiving bookmarks](./2024/bookmarks.md)
* [SHLVL PS1](./2023/shlvl.md)
* [Backing up my ZFS NAS to an external drive](./2023/zfs-nas-backup.md)
* [The traditional first software engineer blog post](./2023/blog-start.md)