In practice, cool URLs can become inaccessible even if they don't change (utcc.utoronto.ca)
116 points by ingve on July 1, 2023 | hide | past | favorite | 53 comments



I don't know if I'm reading this right, but it sounds like the article takes "A cool URL doesn't change" to mean "Expect that URLs are forever", whereas I take it to mean "To be cool, don't change your URLs".

My point is - I expect all URLs to rot or change. But you're still cool for keeping your URLs stable while it's in your power.

Reddit permalinks disappearing is the very reason for the phrase, not proof that it's wrong


The article is using this phrase per https://www.w3.org/Provider/Style/URI , which is nothing to do with Reddit and certainly was not inspired by it. The point M. Siebenmann is clearly making is that a quarter of a century's hindsight shows a whole bunch of holes in the W3C's argument, with the recent Reddit events being simply one more counterexample in a long list of counterexamples.


The phrase is being used in a “cool kids don’t do drugs” or “only you can prevent forest fires” sort of way.

If a URL changes, it isn't "cool", it's now "uncool". It's an oxymoron to say cool URLs do change.


What "holes" does this show in W3C's argument?

I would say that this style doc would endorse the technical choices of both Twitter and Reddit, since even though the rules of access changed, the URLs themselves didn't.

From that document:

> At W3C we divide the site into "Team access", "Member access" and "Public access". It sounds good, but of course documents start off as team ideas, are discussed with members, and then go public. A shame indeed if every time some document is opened to wider discussion all the old links to it fail! We are switching to a simple date code now.

Thus the correct choice in URL design was indeed made, as (AFAIK) no links needed to be updated even though the rules for who is allowed to access the content at that URL changed.

This question of URL design is orthogonal to any debate about the reasons and justifications for restricting access to content that was previously public.


Yeah, keeping your own URLs static is cool, but what's really cool is an automated check for broken links and a policy not to link to brittle sites.


> Reddit permalinks disappearing is the very reason for the phrase, not proof that it's wrong

Cool URIs don't change was written in 1998 and Reddit was launched in 2005...?


I assume they mean not the Reddit practice specifically, but the same link-rot phenomenon that takes place across the web.


Agreed. The corollary being that URLs that do change usually change because of poor stewardship on the part of some responsible party: Reddit and Twitter are good examples of that.


> I'm using 'cool URL' in a somewhat loose sense here, because Reddit and Twitter never promised that these URLs would be eternal

Reddit uses the term 'permalink' which kinda does promise that.


Reddit has had a delete button on comments and a private subreddit option from the start. I don't think anyone would expect a permalink to show deleted or hidden content.


Isn't the permalink missing from the new interface?


There is nothing more temporary than a permanent solution.


And there is nothing more permanent than a temporary solution.



"There is nothing more temporary than a permanent solution and nothing more permanent than a temporary solution" needs to be designated as a rule or law akin to Hofstadter's law or the Ninety-ninety rule. As a placeholder until a better name is chosen, I propose we call it Foundart's Rule.


It's already named Daugherty's Law.


[citation needed]


Easily locatable under that name. The Daugherty is the late Richard D. Daugherty, who was Professor of Anthropology at Washington State University.


Googling for "Daugherty's Law" only returns law firms, and a cursory reading up on Richard Daugherty doesn't seem to turn up anything of the sort.

So, yes, would appreciate a citation so we can all learn a new trivia of the day.


Yeah, seems like something a professor said one day in class but never published.


Only the tides are eternal.


I don't think the tides are eternal


A second and more illustrative case is that within the past day or so, Twitter has stopped letting you see anything (tweets, profiles, etc.) unless you're a logged-in user of the site.

I like how this article currently is right next to the other one discussing exactly that directly: https://news.ycombinator.com/item?id=36540957

These recent examples have probably also brought to prominence a point that I've always considered a given: if you think some data you come across online is important, save a copy --- it might be the last time you're able to access that URL. As the old saying goes, "data that is not backed up is data you don't want to keep". Storage costs, relatively speaking, nothing for textual data, and very little for images. Relying on browser bookmarks, or even worse, search engine queries, and expecting others to continue providing the content you truly want, increasingly seems to be a bad idea.


Another good example of this might be the transition from HTTP to HTTPS. Granted, the scheme isn't actually part of the domain name, but most people probably wouldn't make the distinction. Although HTTP URLs still work on most sites in most browsers, most sites have transitioned to HTTPS. Further, HTTP might not always be supported by browsers, and as a site owner you don't have a lot of control over that. There are some solutions to protocol changes, such as redirecting, but that too could fail down the road.

The internet is so ephemeral and at the same time feels rather permanent.


In terms of backwards compatibility, I think that would be an easier one to resolve. Returning a 308 (Permanent Redirect) for all http requests to the https version is a single catch-all rule.
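As a rough sketch (the hostname and handler names here are made up for illustration), the catch-all rule amounts to nothing more than rebuilding the same path under the https scheme:

```python
from http.server import BaseHTTPRequestHandler

HOST = "example.com"  # placeholder for the canonical hostname

def redirect_location(host: str, path: str) -> str:
    """Build the Location header: same host, same path, https scheme."""
    return f"https://{host}{path}"

class HttpsRedirect(BaseHTTPRequestHandler):
    # One catch-all rule: every plain-http request gets a 308 pointing
    # at the same path on the https origin, so old links keep resolving.
    def do_GET(self):
        self.send_response(308)
        self.send_header("Location", redirect_location(HOST, self.path))
        self.end_headers()
```

A 301 would also work for plain links; 308 additionally guarantees the request method is preserved across the redirect.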


Isn't this what most websites are already doing? Just redirecting http://example.com/test1 to https://example.com/test1?


Correct. The parent post gave http to https as a type of link-breaking, and this is how the redirect is implemented to avoid breaking links.


Any harm in just making it default browser behavior to first try the link as clicked, then try https if it fails? It could possibly lead to unexpected behavior for resources loaded in the background, but I can't think of a concrete example.
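The fallback order being proposed can be sketched as a pure function (a real browser would also have to account for HSTS and mixed-content rules, which this ignores):

```python
from urllib.parse import urlsplit, urlunsplit

def candidate_urls(url: str) -> list[str]:
    """Fetch order: the URL as clicked first, then the https
    upgrade of the same URL if the original scheme was http."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return [url]
    upgraded = urlunsplit(("https",) + tuple(parts[1:]))
    return [url, upgraded]
```

Reversing the order (https first, http as the fallback) gives the more secure variant.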


It's more secure to first try https.


you could have bookmarked a permalink before the site moved to using https


the worst part of this is embedded http content in an https site breaking due to changing browser policies. images are upgraded to https if possible, otherwise broken. script embeds are just unilaterally broken if the embed is http, even if https is available.


In retrospect, this "cool URLs don't change" ideology might have been a net negative. Instead of trying to fight the inevitability that content moves around and occasionally gets lost, it might have been better to embrace the ephemerality and promote mirroring and storing local copies. "Cool URLs are mirrorable."

Notably, the WARC format was developed only 10 years after that famous essay, and there are a lot of other things that could have been done to make mirroring a proper first-class citizen of the web.


> In practice, cool URLs can become inaccessible

No, they can’t. By definition, if they do that they are no longer cool.

Of course, Reddit and Twitter URLs were never especially cool in the first place. They’re particularly uncool now. Tragically unhip, you might even say.


A permanent link resembles digital archive laws, where governments are mandated to keep documents readable indefinitely. Why not extend that to official publications and demand that government links never change? Not only should the link stay available, the page must also be renderable by a device still available in 500 years.

It is complicated but doable. There is a list of allowed formats, like PDF/A, MS Word, and SQL, where there is consensus it will be readable forever. Not sure how archive.org does it, but I assume they also transform a page to static, standard HTML.


If you ever end up in the distant future, go to Svalbard and look for the Arctic World Archive. They have microfilm copies of a huge amount of data. They have Wikipedia pages in microfilm format, so all you need is a magnifying glass to get started. You can then look for the Github Code Vault slides that explain how to restart technology from scratch and run the code in the git repository archives.

https://github.com/github/archive-program/blob/master/GUIDE....

https://github.com/github/archive-program/blob/master/TheTec...

https://arcticworldarchive.org/


One of the foundational ideas of the web (vs. most earlier hypertext systems) was that links could break, so there was no global scalability issue.

I've long thought we needed a replacement that doesn't have that property. NNTP had a good storage model for important data:

If you want to read articles from a site, then you create a mirror.

I wish we'd move back to that.

In addition to supporting distributed archival (the only type of archival that works), it breaks targeted ads, and eliminates the incentives that lead to clickbait.


I do wonder about all those cool custom TLDs. Currently the private registry might be charging tens or hundreds of dollars for them. But what happens once the brand becomes a household name?

If Google had been a .supercool back in the day, what would the owner of .supercool charge them today?

I know that today Google would probably buy them out, but during the transition period from nobody to Google, they might need to pay hundreds of thousands of dollars for their domain.


Archiving sites are doing the lord's work with respect to this. The unfortunate reality is that the closest thing to a fixed point is (URL, timestamp).
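A hedged sketch of that (URL, timestamp) fixed point, with made-up names: a toy archive that stores snapshots per URL and answers lookups with the latest capture at or before the requested time.

```python
class SnapshotStore:
    """Toy archive keyed by (URL, timestamp)."""

    def __init__(self):
        self._snaps = {}  # url -> list of (timestamp, content)

    def save(self, url: str, ts: int, content: str) -> None:
        self._snaps.setdefault(url, []).append((ts, content))

    def lookup(self, url: str, ts: int):
        # Latest snapshot captured at or before ts, else None.
        older = [s for s in self._snaps.get(url, []) if s[0] <= ts]
        return max(older)[1] if older else None
```

The URL alone is ambiguous; the URL plus a capture time pins down one document, which is essentially how wayback-style archives resolve requests.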


Thank God for archive.org


Domains cost money. Servers cost money. Updating software costs time and therefore money. Ergo, once the incentive to keep it alive is gone, the URLs change. Not to mention other considerations. Maybe the expectation shouldn't be that URLs stay stable, but rather that they are temporary things.


When I was thinking about this problem a couple of years ago (eek, time flies!) I came to the conclusion that the only archival approach that was viable for this was to have a dedicated url resolver service that was independent of the DNS system and could be swapped out.

Obviously you wind up with resolver resolver resolver services if this goes on for too long, but it is one of the few workarounds for the fact that time (and many other things) are not reified in the DNS system.
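One hedged sketch of the indirection being described (all names hypothetical): stable identifiers resolve through a swappable table, so the underlying location can be updated without the identifier ever changing.

```python
class UrlResolver:
    """Hypothetical DNS-independent resolver: persistent id -> current URL."""

    def __init__(self):
        self._table = {}

    def register(self, pid: str, url: str) -> None:
        self._table[pid] = url  # re-registering swaps the location

    def resolve(self, pid: str) -> str:
        return self._table[pid]
```

This is roughly the model DOI and handle systems use; the open question is who operates the table, and what resolves the resolver.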


How would such a service handle time? I’m talking about flooble.com in 2005 being entirely different from flooble.com in 2020


the internet archive handles that well


Yeah, if the service/website goes down, any of its URLs will go down with it. Can happen to anyone. By "down" I mean policy changes, moderation, or the server being down. Pretty lame/catchy/scammy title, as the problem is not at all related to the URLs themselves but to the services.


It's not the intended use case & it's explicitly bound by restrictions in accepted DNS practice that say 7 days is the max time one should trust domain authority, but I really wish & hope Signed HTTP Exchanges (SXG) can somehow become a thing. https://web.dev/signed-exchanges/

The idea is that HTTP content can be self-signed, in a way where one can safely make a bundle of these exchanges & potentially sneakernet/thumb-drive them to a friend, who can then read the content and trust that it came from the origin it claims to.

Crossed with certificate transparency systems it really creates a new possible expectation on the web, that users can take-away the content they come across. This seems like a minimum bid towards FAIR. https://www.go-fair.org/fair-principles/


Same goes for the walled-off news articles posted on HN. If you can't link to it or provide an accessible copy of it, it can't be expected to be part of public discourse.


The word semantics game played in these articles is silly.

A URL cannot change; it is not a changeable object. For instance "https://example.com/index.html" is what it is: that string.

Just like the integer 42 cannot be changed to be 73.

All the problems with change have to do with how the URL resolves, like that it may point to different content at different times or become inaccessible. Changing and becoming inaccessible are not different problems: the former is impossible, the latter is the problem.


The only thing that would be cool here is if the owner of the url would inform the people who refer to that url that it is now invalid.


Hard problem.

There are solutions for permanent storage that prevent URLs from going dark.

But even those offer ways to get rid of problematic content.


What? The URL hasn’t changed, the content has.


Like the twitter url right now.


The author just doesn't know what cool is.

If it's important, put it on public IPFS and make sure people seed it.


They’re riffing off https://www.w3.org/Provider/Style/URI.html , which was written in the 90s before IPFS was a thing.



