Small nitpick, XHTML does live on in one important place: epub, which is the most commonly used file type for ebooks.
I do a lot of epub work at Standard Ebooks, and at first I hated XHTML for the exact reason the article describes: A single tiny error makes the entire document invalid and it can be hard to spot it and recover. Also, namespaces were (and still are) a massive pain for little to no gain.
But over time I've come to really appreciate XHTML, again for that exact same reason. It forces you to write correct XML, and there is no ambiguity and no multiple ways of representing the same tag like there are in HTML5: `<br/>` will always be `<br/>`, not maybe `<br>` or `<br />` or even `<br></br>`. An XHTML document from someone else will be both human-readable (after pretty-printing) and easily machine-parseable. You also get XPath, which is a way of selecting elements akin to CSS but even more powerful.
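To make the "strict but unambiguous" point concrete, here's a minimal sketch in Python using lxml (purely illustrative, not our actual tooling): the XML parser refuses the HTML-style `<br>`, while the well-formed version parses cleanly and can immediately be queried with XPath.

```python
# Minimal illustration (assumes the lxml package is installed).
from lxml import etree

xhtml = '<p>one<br/>two <a href="#n1">note</a></p>'
html_style = '<p>one<br>two</p>'

root = etree.fromstring(xhtml)
print(root.xpath('//a[@href]/text()'))   # ['note'] -- XPath element selection

try:
    etree.fromstring(html_style)
except etree.XMLSyntaxError as err:
    print('rejected:', err)              # one tiny error fails the whole parse
```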
Ultimately XHTML is better suited to epub's use case, which is static documents that are written once and not dynamically generated by templating languages or libraries. Too bad there was some talk of epub switching to HTML5, though it's unlikely future revisions of epub will ever see wide adoption.
I used to write XHTML and found that if you serve it with the MIME type application/xhtml+xml, the browser will parse it strictly as XML, which can be a problem if you can't reliably output well-formed documents.
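For anyone who wants to try it, a minimal sketch (assuming Flask; any server that lets you set the Content-Type works the same way):

```python
# Serve a page as application/xhtml+xml so the browser uses its strict XML
# parser; a single well-formedness error then yields an error page instead
# of a best-effort rendering.
from flask import Flask, Response

app = Flask(__name__)

PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body><p>Served as XML, parsed strictly.</p></body>
</html>"""

@app.route("/")
def index():
    return Response(PAGE, mimetype="application/xhtml+xml")
```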
> Too bad there was some talk of epub switching to HTML5, though it's unlikely future revisions of epub will ever see wide adoption.
Why? Because of the switch to HTML5 (if that indeed happens), or because no future revision at all will see wide adoption? If the latter, what's the reason, according to you? Amazon's dominance?
Because new epub versions have the Python 3 problem. Every reading system out there supports epub 2, and epub 3, the latest revision, still has poor support years later because it's not a big enough improvement to bother with and too much software is written for epub 2. Plus epub has had a rocky past few years as the standard has been bumped around various committees.
Even XPath 1 lets you filter elements based on arbitrary sub-paths, so e.g. you can do stuff like `//div[@data-marker]/span[i]` and match spans containing an `i` that are themselves direct children of a div with a `data-marker` attribute. In CSS, matching on content is in general impossible, with a few small exceptions like `:empty`, so it's much less powerful here. XPath will also let you select based on text content, and select non-elements like text nodes too. Admittedly it's probably easier to write slow XPath selectors than slow CSS ones, but although I wrote quite a few XPath selectors back in the day, that was never a practical issue for me, perhaps because, unlike CSS (which is often used in huge quantities to match entire documents), XPath is more typically used to find just a few critical elements. I've never worked much with XPath 2, but it goes even further in the direction of a kind of SQL for tree structures. By no means was XPath without weird limitations, make no mistake; but it's definitely more practical than CSS for finding specific elements in a messy tree, especially when you don't control that tree.
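A rough illustration of that selector (again with Python and lxml; the markup is made up, just to show the shape of it):

```python
from lxml import etree

doc = etree.fromstring("""
<root>
  <div data-marker="x">
    <span><i>match me</i></span>
    <span>no i here</span>
  </div>
  <div>
    <span><i>wrong parent</i></span>
  </div>
</root>""")

# Spans that contain an <i> and are direct children of a div with data-marker:
print(doc.xpath('//div[@data-marker]/span[i]'))          # one matching <span>
print(doc.xpath('//div[@data-marker]/span[i]//text()'))  # ['match me']

# Selecting text nodes themselves, which CSS simply cannot do:
print(doc.xpath('//span/text()'))                         # ['no i here']
```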
X-anything is pretty unfashionable nowadays, but in principle XPath would still be useful today; bit of a shame it's falling by the wayside without a decent replacement. I actively try to avoid it nowadays even though I'm comfortable with it; usually the upsides XPath has aren't worth (to me) the lack of familiarity for other devs, and the potential lack of support in various platforms compared to CSS.
But on the sidetrack the parent comment mentions:
In any case, I can unequivocally, whole-heartedly, so-verbosely-you-hopefully-understand-my-conviction-on-the-matter support the notion that the HTML5 syntax was a disaster of the highest order. XHTML is a much, much better idea, and unfortunately it died. Anybody who's ever seriously tried to compose HTML safely understands all the weird gotchas that can occur (oh, that wrapper p tag got auto-closed because the tooltip span contained an icon and those were implemented as divs? yay!), and a syntax with so many exceptions that essentially 0% of web devs actually know them all is just asking for XSS issues. You think your client-side SPA can do server-side prerendering reliably? Ah, but only if it sticks to valid HTML, including the nesting rules: it will work without error client-side, because the DOM tree is really just a tree without as many weird exceptions, but lots of DOM trees simply have no HTML serialization. It doesn't exist.
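For anyone who hasn't hit that auto-close gotcha, here's a rough sketch of it using Python and html5lib as a stand-in for the browser's parser (illustrative only; the markup and class name are made up):

```python
import html5lib  # implements the HTML5 tree-construction rules

doc = html5lib.parse(
    '<p>before <div class="icon"></div> after</p>',
    namespaceHTMLElements=False,
)
body = doc.find("body")
print([child.tag for child in body])
# ['p', 'div', 'p'] -- the <div> forced the wrapper <p> shut, and the stray
# </p> produced a second, empty paragraph. The tree you get back is not the
# tree you thought you wrote.
```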
To be perfectly clear: I don't think XHTML was particularly brilliant; HTML5 is simply particularly atrocious.
If I could switch to a more comprehensible data-encoding format I would, in a heartbeat, especially since nobody writes raw HTML anyhow nowadays; it's all frameworks and libraries. And then you pray they correctly implement the HTML5 spec, because if they don't... it's not something you want to reimplement yourself.
Did XHTML ever allow styling with XPath, or is it just used for jQuery-style element selection? My uneducated impression was that for performance reasons browser devs were rejecting even more restricted backwards-lookup and nesting selectors for styling.
I've encountered some uses of XPath to validate the semantics of XML data models (this kind of thing still has mindshare in some places - network devices for instance https://tools.ietf.org/html/rfc7950#section-7.21.5 ) and I've come to the conclusion that wanting XPath is usually a danger sign. It's helpful for manipulating content you can't control, but that leads to people designing interfaces and models that are too difficult to work with downstream. In document display settings that manifests as a content vs. presentation mentality where you have to write byzantine queries to wrestle data into reasonable output. In configuration use cases I've seen it lead to interfaces that claim way more general edit operations than the implementation can actually pull off, as people struggle to specify XPath-based constraints to make it tractable. It's much better if at all possible to have some kind of scripted layer to do transformations than to try to manipulate a general presentable-cum-semantic data tree.
On the XHTML vs HTML5 side, React at least does a good job warning you if you're trying to render something that violates DOM nesting (out of necessity as much as anything, since it will need to operate on the generated tree).
> Ultimately XHTML is better suited to epub's use case, which is static documents that are written once and not dynamically generated by templating languages or libraries.
That's backwards. Relying on software to assign random meaning to broken input is never a good solution to anything, neither for static documents nor for dynamically generated ones. It's just that HTML lets you get away with it, kind of (i.e., no scary error messages, just subtle breakage everywhere), which is why people built templating systems that didn't bother with correctness either. If the web had been XHTML from the beginning, templating systems would have been built in a way that makes it hard to generate broken documents in the first place, because the error messages would have created the incentive to do so, and that would have avoided a lot of the subtle breakage as well.
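For what it's worth, the difference is easy to sketch in a few lines of Python (stdlib only, purely illustrative): string templating will happily emit unbalanced, unescaped markup, while building a tree and letting a serializer write it out can't produce broken output in the first place.

```python
from xml.etree import ElementTree as ET

user = 'Ada <script>alert(1)</script>'

# Typical string templating: nothing stops the unbalanced <b> or the injection.
broken = f"<p>Hello, <b>{user}</p>"

# Tree-based generation: the serializer balances and escapes for you.
p = ET.Element("p")
p.text = "Hello, "
b = ET.SubElement(p, "b")
b.text = user
print(ET.tostring(p, encoding="unicode"))
# <p>Hello, <b>Ada &lt;script&gt;alert(1)&lt;/script&gt;</b></p>
```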