I don't think it mattered so much for print, but in the early days of the internet you would hear people on the TV laboriously reading out "haich tee tee pee colon slash slash double-u double-u double-u dot contoso dot com".
This is why a friend of mine has the email address "dot at dot at dot at". Yes, that's uniquely parseable back to one valid RFC 822 address that works.
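A quick brute force shows why the spoken form has only one valid reading. I'm assuming the address in question is dot@dotat.at (the dotat.at domain shows up later in this thread), so treat this as an illustration rather than gospel:

```python
from itertools import product

# Spoken: "dot at dot at dot at". Each spoken "dot" is either the
# literal string "dot" or a ".", and each spoken "at" is either the
# literal string "at" or the "@". Brute-force every interpretation and
# keep the ones forming a plausible addr-spec: exactly one "@",
# non-empty dot-separated words on both sides, at least one "." in the
# domain.
tokens = ["dot", "at", "dot", "at", "dot", "at"]
choices = {"dot": ["dot", "."], "at": ["at", "@"]}

def plausible(addr):
    if addr.count("@") != 1:
        return False
    local, domain = addr.split("@")
    def ok_words(s):
        # no empty words, i.e. no leading/trailing/double dots
        return all(s.split("."))
    return ok_words(local) and ok_words(domain) and "." in domain

parses = {
    "".join(combo)
    for combo in product(*(choices[t] for t in tokens))
    if plausible("".join(combo))
}

print(parses)  # {'dot@dotat.at'} -- exactly one valid reading
```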
I have occasionally wondered about programming language syntax that might take reading aloud into account. Punctuation ends up sounding like Victor Borge: https://www.youtube.com/watch?v=Qf_TDuhk3No and of course Python's indentation is particularly at risk of ambiguity.
After years of people mistyping "jasonmill" for "jasomill" in my (originally university-assigned) email address, I registered jasomill.at so I could receive email as "jasomilldot", because I guess turning the simple act of giving out my email address over the phone into an Abbott and Costello routine sounded like a good idea at the time.
Oh, and you don't need special syntax to write code to be read aloud, to wit,
> turning the simple act of giving out my email address over the phone into an Abbott and Costello routine sounded like a good idea at the time
Hah! I've been dealing with that for a few years now -- my email is most of my username and spelling it out is fun for neither party.
I've considered getting the shortest, most phonetic-friendly email possible specifically for phone calls; something like a3gx@gmail.com but also want to self-host. It's a hassle!
I have an address specifically for that—it's just one letter and a few digits, redirecting to the main account. Works like a charm. Especially when the native alphabet in the country is not Latin and laboriously explaining which letter is which is a meme from the phone calls era. Whereas I can just call digits by the names in the local language, and everyone is familiar with hearing them.
Meanwhile, with my main-ish addresses, I've been told that they look like a random jumble of letters or a password. And giving one of them out over the phone once easily took ten minutes, with confusion even about how many letters there were.
With my domain name, I've been using a one-letter address for humans but have found it often confuses people; it doesn't fit the pattern they expect. So, I've been thinking of switching to domain@domain.tld, which is still obviously "special" but matches the regular pattern better (and I don't have to spell out <domain> twice).
A family member has firstname@firstnamelastname.com, which is super nice, but unfortunately my name and most variants are taken.
My GP is not very tech savvy and any time we call the surgery there is a lengthy unskippable message about Covid with the receptionist reading out https://111.nhs.uk/covid-19 down to the last colon.
Victor Borge's solution for any nebulousness around punctuation for Alexa and other voice-activated devices is sheer genius! ...And, so funny and cute! Having seen this little bit reminds me that I have not seen/enjoyed nearly enough Victor Borge material! Thanks for sharing this video!!!
You elide the parentheses. They are as unnecessary when speaking Lisp code aloud as saying "comma", "period", or "question mark" is in everyday conversation.
My only hardship with this is about half the population doesn't know the difference between a slash and a backslash, which is probably not their fault because they wouldn't have encountered them outside of typesetting or computing. Which is fine until you hear the TV person above do it wrong. Luckily most browsers figure it out.
I like to tell people slash is the one you put in a date.
The only place a non-programmer user would encounter a backslash is in Windows (née DOS) file paths, right? Yet another thing Microsoft has made worse in the world ;)
Can we talk about gboard, Google's default keyboard for Android for a moment?
On my phone at least, the backslash has a clear and easily accessible shortcut (long-press 'w'). Meanwhile, to enter a forward slash, one has to not only tap into the special-symbols page, but actually tap again into the SECOND PAGE of that.
Does anyone have any insight as to just what these galaxy brains at Google are thinking?
This is just a guess, but reading RFC 819 (https://tools.ietf.org/rfc/rfc819.txt), which transitioned ARPANET to a hierarchical naming scheme and also predates DNS, the little-endian notation might be a simple artifact of ARPANET's e-mail addressing. E-mail addressing was already user@host, which is basically little-endian. It would be consistent to extend it like user@host.site.network. JANET also used @ notation, but used big-endian notation for the domain, which seems inconsistent, at least from a user's perspective.
Wikipedia says JANET's e-mail addressing notation was defined by the "Grey Book", and this Usenet thread, http://neil.franklin.ch/Usenet/alt.folklore.computers/200209..., says the Grey Book domain notation comes from the Network Independent File Transfer Protocol (NIFTP aka "Blue Book", which was a different protocol from ARPANET's RFC 354 FTP). This 1990 JANET<->ARPANET e-mail gateway document, http://dotat.at/tmp/JANET-Mail-Gateways.pdf, says that JANET e-mail was transferred using NIFTP, so it would make sense that the domain part of the e-mail address would use NIFTP rules. Both above sources say (explicitly or impliedly) that JANET generally, and NIFTP specifically, were based on X.25, and X.25 uses big-endian addressing.
So on JANET the hierarchical naming scheme predated the e-mail addressing scheme[1], whereas on ARPANET the reverse is true. Both formats make sense as path dependent outcomes.
[1] Presumably JANET still adopted user@ because the message format was based on RFC 822, according to that gateway document above, but it was still worth partially deviating from RFC 822, which explicitly defines little-endian domain syntax, because of JANET's pre-existing host addressing scheme.
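The endianness difference is just the label order; converting a Grey-Book-style big-endian address to DNS order is a matter of reversing the domain labels. The address below is illustrative, not a real historical mailbox:

```python
# JANET "Grey Book" addresses wrote the domain big-endian (most
# significant label first); DNS writes it little-endian. Converting is
# just reversing the dot-separated labels after the "@".
def janet_to_dns(addr):
    local, domain = addr.split("@")
    return local + "@" + ".".join(reversed(domain.split(".")))

print(janet_to_dns("user@uk.ac.cam.cl"))  # user@cl.cam.ac.uk
```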
I am still waiting for an apology for mathematics and base 10 numbers being written the wrong way round logically!
Same with SQL. I am giving SQL trainings this year, and I have to explain why the server will read your query in a completely different order than how you write it.
Numbers are written correctly, most significant digit on the left. SQL is mostly correct according to that logic: first `join`, then `where`, then `order by`. There are inconsistencies, though.
You are used to seeing the most significant digit on the left, but really we have to right-align a column of numbers for them to be readable, whereas everything else we read is left-aligned. And you don't know what that most significant digit corresponds to (thousands, millions, billions?) until you have read all the other digits (if you read left to right).
As for SQL, you write
`SELECT TOP 10 ColName FROM TableName WHERE X = 10 ORDER BY ColName`
and the server reads
`FROM TableName WHERE X = 10 SELECT ColName ORDER BY ColName TOP 10`
It is not "mostly" the right order.
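That logical evaluation order can be sketched over plain Python lists; the table and column names here just mirror the made-up example above:

```python
# Logical evaluation order of the query above:
# FROM -> WHERE -> SELECT -> ORDER BY -> TOP.
table_name = [
    {"ColName": "b", "X": 10},
    {"ColName": "a", "X": 10},
    {"ColName": "c", "X": 5},
]

rows = table_name                                  # FROM TableName
rows = [r for r in rows if r["X"] == 10]           # WHERE X = 10
rows = [{"ColName": r["ColName"]} for r in rows]   # SELECT ColName
rows = sorted(rows, key=lambda r: r["ColName"])    # ORDER BY ColName
rows = rows[:10]                                   # TOP 10

print(rows)  # [{'ColName': 'a'}, {'ColName': 'b'}]
```

Notice that the order in which you write the clauses matches none of the steps above end-to-end, which is exactly the teaching problem.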
And right-to-left assignment, in both mathematics and programming.
I suppose an enterprising browser hacker could implement this such that the browser could accept URLs in this format and reorient them when it makes the requests. Even paths would be tab-completable from browser history, would they not? I would love this in my browser.
That's actually how addresses are written in Chinese -- most general to most specific. Super disorienting when you're learning it as an English native speaker.
I think he can be forgiven for not anticipating the cost of a couple slashes. It's probably not as expensive as the addition of 'null' to ALGOL, and that was actually intended to be very widely adopted!
I wrote my MS dissertation under TimBL at MIT. I was into semantic web technologies in the late aughts and chatted with him a few times about the double slash, web nomenclature, etc.
I think this article either exaggerates or misstates his intent. IIRC, his thinking was the double slash was just a continuation of the path operator in *nix and would represent the "hyper" in hypertext.
I really don't think the semantic web was a "mistake" at all. And as much as Sir Tim would hate me for saying this, it was (and is) still AI, more precisely symbolic AI, applied to collaboration at web scale.
Mind you, back then AI was a dirty word, even at MIT, and we had to couch it as "cognitive computing", "distributed intelligence", etc.
It is a laudable effort and I still love me some semweb technologies.
I'm not so negative. The double slash provides a significant visual differentiator that a single colon would not. It may "waste" paper but it saves time.
Not grandparent but it saves time the same way syntax highlighting saves time. You simply recognize the difference faster.
With http://example.com it's quicker to distinguish the protocol (http) and the domain name (example.com).
Compare that to http:example.com: it might take a bit longer at first glance (for some people), because they read right past the :, then need to do a quick linear scan back before they spot the : and can distinguish http from example.com.
Given that one sees a lot of domain names, I'd say it'd save a few hours of everyone's life in the aggregate.
But humans rarely need to parse URLs into components; they often need to read them aloud or type them, and the extra characters make this slower and more error-prone.
For most user-interaction purposes the URL is just an opaque string. Only the browser needs to actually parse the URL.
If there's anything to "apologize" for with URLs, it would be the use of the ampersand character for separating query parameters, as it is interpreted as the ERO (entity reference open) character in SGML's default concrete syntax, making SGML parsers see an entity reference in links such as http://bla.bla/doc?param=value&otherparam=othervalue. XML doesn't help here, as it rejects the whole string as an attribute value. This is also IMHO an oversight in the XML and WebSGML specs (the latter allows circumventing the issue via so-called data attributes).
URLs were supposed to be universal, not limited to any particular media format. The ampersand has to be escaped as &amp; in SGML/HTML/XML; other formats will need other escapes. No characters are "safe" in all formats.
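In Python, for instance, the round trip looks like this; html.escape/unescape handle exactly the entity encoding being described:

```python
from html import escape, unescape

# A raw "&" inside markup would start an SGML/HTML entity reference,
# so it must be written as "&amp;" in an attribute value and decoded
# back on the way out.
url = "http://bla.bla/doc?param=value&otherparam=othervalue"
attr = escape(url, quote=True)

print(attr)  # ...param=value&amp;otherparam=othervalue
assert unescape(attr) == url  # decoding recovers the original URL
```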
There were attempts to use ; instead, and many query parsers in standard libraries still split parameter pairs on the semicolon. But who cares, as long as browsers still generate & in GET forms.
Given the standard RFC, the two initial slashes as in scheme:// are critical in any URL when you need to be able to distinguish between the authority and path component, as in scheme://authority/path as in http://server/file.txt. Otherwise it would be impossible to know when the server part finished and the path part began (since the authority or server part is always between the second and third slashes). Given the article, I think we’re very lucky to have ended up with this design. But I suppose we could have also arrived at a syntax where the path begins with the second slash (e.g. scheme:/server/path). In fact scheme:/path is valid syntax and is simply the contracted URL form of scheme:///path so at least by today’s RFC definition, scheme:/server/path wouldn’t work since in this contracted form, the path begins with the very first slash and that ‘server’ bit wouldn’t be a server at all but also part of the path.
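Python's urllib.parse, which follows the RFC grammar, makes this concrete: an authority is only recognized when the // is present, otherwise everything after the colon is path:

```python
from urllib.parse import urlsplit

# With "//", the parser knows an authority (host) follows.
with_host = urlsplit("http://server/file.txt")
print(with_host.netloc, with_host.path)         # server /file.txt

# Without it, there is no host at all; it's all path.
no_host = urlsplit("http:/server/path")
print(repr(no_host.netloc), no_host.path)       # '' /server/path
```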
It's often mentioned that open source has '1000 eyes' to correct and improve things.
The web has many more than that, I'm sure he can be forgiven for not anticipating every scrutiny about URIs/HTTP/HTML.
Seems like a small nitpick; sites that don't optimise their images or compress text content over the wire put the space savings from dropping the // into the shade.
Fair point, especially about the browser. It could be argued that mainstream browsers' behaviour is what shaped how people communicated URLs offline. If browsers assumed http for a URL without a protocol, then perhaps fewer people would have felt the need to include the protocol on paper.
However, scheme relative URLs (i.e. //example.com/thing.jpg) are useful in edge cases where you want to request assets using the same protocol as the document, are they not?
FWIW I had googled whether hostnames may be entirely digits before asking this, and this SO answer suggested yes. Perhaps that doesn’t apply to URLs? But to be clear I meant hostname not a FQDN, because obviously there is no ambiguity once a dot is present.
But without the double slash, how do you differentiate between:
proto:relative_to/current_dir
proto:/relative_to/root_of_current_host
proto://otherhost/some/resource
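Resolving each form against a base URL shows the three really are distinct. urljoin implements RFC 3986 reference resolution (using http here, since urljoin only resolves schemes it knows about):

```python
from urllib.parse import urljoin

base = "http://otherhost/current_dir/page"

# relative to the current directory
print(urljoin(base, "relative_to/sub"))
# -> http://otherhost/current_dir/relative_to/sub

# relative to the root of the current host
print(urljoin(base, "/relative_to/root"))
# -> http://otherhost/relative_to/root

# protocol-relative: different host, same scheme
print(urljoin(base, "//otherhost2/some/resource"))
# -> http://otherhost2/some/resource
```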
I admit it is hard to grok in the beginning, but all you need to tell non technical people is "you can leave out the http and www" and "hyphen and slash are different symbols".
A: I wanted the syntax of the URI to separate the bit which the web browser has to know about (www.example.com) from the rest (the opaque string which is blindly requested by the client from the server). Within the rest of the URI, slashes (/) were the clear choice to separate parts of a hierarchical system, and I wanted to be able to make a link without having to know the name of the service (www.example.com) which was publishing the data. The relative URI syntax is just unix pathname syntax reused without apology. Anyone who had used unix would find it quite obvious. Then I needed an extension to add the service name (hostname). In fact this was similar to the problem the Apollo domain system had had when they created a network file system. They had extended the filename syntax to allow //computername/file/path/as/usual. So I just copied Apollo. Apollo was a brand of unix workstation. (The Apollo folks, who invented domain and Apollo's Remote procedure call system later I think went largely to Microsoft, and rumor has it that much of Microsoft's RPC system was).
I have to say that now I regret that the syntax is so clumsy. I would like http://www.example.com/foo/bar/baz to be just written http:com/example/foo/bar/baz where the client would figure out that www.example.com existed and was the server to contact. But it is too late now. It turned out the shorthand "//www.example.com/foo/bar/baz" is rarely used and so we could dispense with the "//".
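As a sketch, a client could in principle have expanded the syntax he says he would have preferred; the two-label domain and the conventional "www" host here are assumptions for illustration (no client ever did this):

```python
# Hypothetical: expand "http:com/example/foo/bar/baz" into
# "http://www.example.com/foo/bar/baz". Assumes the first two path
# segments are a big-endian two-label domain and that the server is
# the conventional "www" host, as in Berners-Lee's example.
def expand_tbl_url(url):
    scheme, rest = url.split(":", 1)
    parts = rest.split("/")
    host = "www." + parts[1] + "." + parts[0]   # com/example -> www.example.com
    return scheme + "://" + host + "/" + "/".join(parts[2:])

print(expand_tbl_url("http:com/example/foo/bar/baz"))
# http://www.example.com/foo/bar/baz
```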
As I've heard the history, Microsoft tried to use a forward slash for directories in DOS 2.0, but IBM insisted it be a backslash so that it didn't conflict with the parameter switch.
Yes, the (recently released) source code for MS-DOS 2.0 is liberally sprinkled with comments that indicate Microsoft fully intended for the path separator character to be '/' and the command-line option designation character to be '-'.
Microsoft had to design a new API for DOS 2.0, as this was the first version to support hard disk drives and hence required support for subdirectories to organize the filesystem. The API was intentionally designed to mimic Unix. [1]
And it was also intended for devices (such as CON, LPT1, COM1, etc.) to be prefixed with the special directory name 'DEV', as in '/DEV/CON' and '/DEV/LPT1', just to make it feel even more like Unix. [2]
Apparently the idea was that MS-DOS would be Microsoft's single-user operating system running on cheap 8088 machines, and Xenix would be their "enterprise" multi-user operating system running on high-end 80286 systems; programs could target a single common DOS/Xenix API and run on either OS.
> Also, there are a number of factors at play, a number of different futures that could have been resulted if there were no slashes just by chaos theory[1]. So, speculating in the hindsight is not fruitful.
This sounds like an argument that one should do whatever, because whatever can result from it.