I worked for a company that was (and probably still is) heavily invested in XSLT for XML templating. It's not good, and they would probably migrate away from it if they could.
1. Even though there are newer XSLT standards, XSLT 1.0 is still dominant. It is quite limited and weird compared to the newer standards.
2. Resolving performance problems in XSLT templates is hell. XSLT is a Turing-complete functional-style language with performance very much abstracted away. We had templates that worked fine for most documents, until one document came in with a table of ~100 rows and blew up. It turned out the template that processed the table was O(N^2) or worse, without any obvious way to optimize it (it may even have run an XPath on each row that was itself O(N) or worse). I don't remember exactly how it manifested, but as I recall that one document took XSLT more than 7 minutes to process.
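For illustration, the trap tends to look something like this (a made-up sketch, not the actual template from that codebase): every row runs an XPath that rescans the whole document, so N rows each paying an O(N) scan gives O(N^2) overall.

    <!-- hypothetical per-row lookup: scans every <customer> for every <row> -->
    <xsl:template match="row">
      <td>
        <xsl:value-of select="//customer[@id = current()/@custref]/name"/>
      </td>
    </xsl:template>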
JS might have other problems, but not being able to resolve algorithmic complexity issues is not one of them.
Features like xsl:key (indexing) are now available to greatly speed up processing (see the sketch below).
A good XSLT implementation like Saxon definitely helps on the performance front as well.
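A minimal sketch of the key-based fix, reusing the made-up lookup from the comment above: declare the index once with xsl:key, and each key() call is then roughly constant time instead of a full document scan. This already works in XSLT 1.0.

    <!-- the processor builds this index once -->
    <xsl:key name="customer-by-id" match="customer" use="@id"/>

    <xsl:template match="row">
      <td>
        <!-- indexed lookup instead of //customer[@id = ...] -->
        <xsl:value-of select="key('customer-by-id', @custref)/name"/>
      </td>
    </xsl:template>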
When it comes to transforming XML into something else, XSLT is quite handy for structuring the logic.
I never really grokked later XSLT and XPath standards though.
XSLT 1.0 had a steep learning curve, but it was elegant the way poetry is elegant, because of the extra restrictions imposed on it compared to prose. You really had to stretch your mind to do useful stuff with it. Does anyone remember Muenchian grouping? It was gorgeous.
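For those who never saw it, a sketch from memory (element names made up): XSLT 1.0 has no grouping construct, so Muenchian grouping abuses xsl:key and generate-id() to visit only the first node of each group, then pulls the rest of the group back out of the key.

    <xsl:key name="by-dept" match="employee" use="@dept"/>

    <!-- inside a template whose context holds the <employee> elements:
         visit only the first <employee> of each distinct @dept -->
    <xsl:for-each select="employee[generate-id() =
                          generate-id(key('by-dept', @dept)[1])]">
      <group dept="{@dept}">
        <!-- then recover the full group via the key -->
        <xsl:copy-of select="key('by-dept', @dept)"/>
      </group>
    </xsl:for-each>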
Newer standards lost elegance and kept the ugly syntax.
"Newer standards lost elegance and kept the ugly syntax."
My biggest problem with XSLT is that I've never encountered a problem that I wouldn't rather solve with an XPath library and literally any other general purpose programming language.
When XSLT was the only thing with XPath you could rely on, maybe it had an edge, but once everyone has an XPath library what's left is a very quirky and restrictive language that I really don't like. And I speak Haskell, so the critic reaching for the reply button can take a pass on the "Oh you must not like functional programming" routine... no, Haskell is included in that set of "literally any other general purpose programming language" above.
Serious question: would it be worth the effort to treat XSLT as a compilation target for a friendlier language, either extant or new?
There's clearly value in XSLT's near-universal support as a web-native system. It provides templating out of the box without invoking JavaScript, and there's demand for that[1]. But it still lacks the decent in-browser debugging that JS has in spades.
SLAX is great; unfortunately, it was released a bit too late.
The XML world is full of ugly standards and failed contenders. No one remembers RelaxNG, but it had richer expressive power than XMLSchema and a human-readable syntax.
It would at least be an interesting project. If someone put in the elbow grease, it is distinctly possible that an XSLT stylesheet could be converted not just to JS (which is obviously true and just a matter of effort), but to something that is at least on the edge of human-usable and editable, and some light refactoring away from being decent code.
Just to add to this, we now have XXSLT, which solves a lot of the original problems with XSLT.
Just to frame this for people: imagine a JSON-based programming language for transforming JSON files into other JSON files, where the program itself is also JSON and Turing-complete. Now imagine it's not JSON but XML! Now any program can read it! Universal code, magic!
The idea behind XXSLT is that now we actually have a program whose job it is to specify a program. So we have an XML file which specifies a second XML file, which is the program, whose job it is to transform XML to XML. As we all know, layers of abstraction are always good, and common formats such as XML are especially good, so what we have now is the ability to generate a whole family and diverse ontology of programs, all of them XML, all of them by and for XML. Imagine compiling it with your favourite XML-based compilation chain!
XSLT just needs a different, non-XML serialization.
XML (the data structure) needs a non-XML serialization.
Similar to how the Semantic Web's OWL has several different serializations, only one of them being the XML serialization. (e.g. OWL can be represented in Functional, Turtle, Manchester, JSON, and N-Triples syntaxes.)
> XML (the data structure) needs a non-XML serialization.
KDL is a very interesting attempt, but my impression is that people are already trying to shove way too much unnecessary complexity into it.
IMO, KDL's document transformation is not really a good example of a better XSLT, though. I mean, it's better, but it can probably still be improved a lot.
XQuery is pretty close to "XSLT with sane syntax", if that's what you mean.
But the fundamental problem here is the same: no matter what new things are added to the spec, the best you can hope for in browsers is XSLT 1.0, even though we've had XSLT 3.0 for 8 years now.
S-expressions only represent nested lists. You need some other convention _on top of them_ to represent other kinds of data, and that's generally the hard part.
Yeah... I posted too quickly: I want XSLT 3. The 1 & 2 specs are good first attempts, but they are very difficult to use effectively. As another poster also commented, it'd be nice if the implementation weren't tied to XML, as well!
How, where? In 2013 I was still working a lot with XSLT and 1.0 was completely dead everywhere one looked. Saxon was free for XSLT 2 and was excellent.
I used to transform both huge documents and large numbers of small documents, with zero performance problems.
Probably corps. I was working at Factset in the early 2000s when there was a big push for it, and I imagine the same thing played out across every Microsoft shop in corporate America, a market Microsoft was winning big share in at the time. (I bet there are still a ton of internal web apps that only work with IE... sigh)
Obviously, that means there are a lot of legacy processes likely still using it.
The easiest way to improve the situation seems to be to upgrade to a newer version of XSLT.
I recently had the occasion to work with a client that was heavily invested in XML processing for a set of integrations. They're migrating / modernizing, but they're so heavily invested in XSL that they don't want to migrate away from it. So I conducted some perf tests, and the performance I found for XSLT in .NET ("Core") was slightly to significantly better than that of current Java with Saxon. But both were fast.
In the early days, XSL was all interpreted, and it was slow. From ~2004 or so, all the XSLT engines came to be JIT-compiled. XSL benchmarks used to be a thing, but they rapidly declined in value from then onward because the perf differences just stopped mattering.
XSLT is not easy. It's Prolog on shrooms, so to speak, and it has a steep learning curve. Once mastered it gives sudoku-level satisfaction, but it can hardly ever be a standard approach to building or templating, as people normally need much less to achieve their goals.
Perhaps it rather needs a facelift, and support for JSON. I would imagine that one day something regex- or jq-level concise emerges, something reasonably short and descriptive, to allow transforming an arbitrary tree into another arbitrary tree.
The idea behind XSLT is brilliant, but the real essence of it is XPath, which is what makes it possible. And we've seen XPath evolve into CSS Selectors and prove useful on its own.
So in essence there are two sides of the transformation:
- selection - when you designate which parts of the tree match
- transformation - when building the new tree
And while there are established approaches to the first part, perhaps XSLT is the only one that fits the definition of 'generally accepted' when it comes to the transformation.
But one can argue the transformation is possible with jq; it's just that I definitely don't like its over-engineered syntax. IMHO the champion of transformation syntax is yet to be decided, even though in 2025 XSLT is still more or less king. Which is fascinating, as XML has long stopped being the usual choice of preference.
If XPath were the core, nobody would need XSLT, because pretty much every XML library can do XPath.
Don't get me wrong, XPath is by far the best thing to come out of the XML ecosystem, but the actual idea at the core of XSLT is the match/patch during traversal, and talking about it in terms of selection misses that boat entirely. Select/update is how you manipulate a tree with jQuery, or really with the average XML library.
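Concretely, that core idea is the identity-transform idiom: copy everything unchanged by default, and "patch" just the nodes a template matches, wherever they occur during traversal. A minimal sketch (the <price> rule is an invented example):

    <!-- default: copy every node and attribute as-is -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

    <!-- patch: rewrite every <price> wherever it appears in the tree -->
    <xsl:template match="price">
      <price currency="USD">
        <xsl:value-of select="format-number(. * 1.1, '0.00')"/>
      </price>
    </xsl:template>

There is no select-the-nodes-then-update step; the override applies wherever the traversal encounters a match.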
I recall seeing some libraries on GitHub that convert back and forth between XSLT and JSON... and another, shorthand version that was much clearer, from Juniper I believe.
It's odd, because XSLT was clearly made in an era when processing long source XML was the norm, and nested loops would obviously blow up.
Yeah, I was using Novell DirXML to do XSLT processing of inbound/outbound data in 2000 (https://support.novell.com/techcenter/articles/ana20000701.h...) for directory services stuff. It was full XML body (albeit small document sizes, as they were usually user or identity style manifests from HR systems), no streaming as we know it today.
But they worked on the xml body as a whole, in memory, which is where all the headaches started. Then we introduced WSDLs on top, and then we figured out streaming.
Are you using the commercial version of Saxon? It's not expensive, and IMHO worth it for the features it supports (including the newer standards) and the performance. If I remember correctly (it was a long time ago) it does some clever optimizations.
We didn't use Saxon; I don't work there anymore. We also supported client-side (browser) XSLT processing as well as server-side. It might have helped on the server side; maybe it could even resolve some algorithmic complexity issues with memoization (possibly trading off memory consumption).
But in the end the core problem is XSLT, the language. Despite it being a complete programming language, your options for resolving performance issues are very limited when working within the language.
O(n^2) issues can typically be solved using keyed lookups, but I agree that the base processing speed is slow and the language really is too obscure to provide good DX.
I worked with a guy who knew all about complexity analysis, but was quick to assert that "n is always small". That didn't hold - but he'd left the team by the time this became apparent.
> Even though there are newer XSLT standards, XSLT 1.0 is still dominant.
I'm pretty sure that's because implementing XSLT 2.0 needs a proprietary library (Saxon XSLT[0]). It was certainly the case in the aughts, when I was working with XSLT (I still wake up screaming).
XSLT 1.0 was pretty much worthless. I found that I needed XSLT 2.0 to get what I wanted. I think they are up to XSLT 3.0.
Are you saying it is specified such that you literally cannot implement it other than on top of, or by mimicking bug-for-bug, that library (the way it was impossible to implement WebSQL without a particular version of SQLite), or is Saxon XSLT just the only existing implementation of the spec?
Browser support relied on libxml/libxslt, which tops out at 1.0. I guess you could implement your own, as it's an open standard, but I don't think anyone ever bothered to.
I think the guy behind Saxon may be one of the XSLT authors.
The author of Saxon is on the W3C committee for XPath, XSLT, and XQuery.
That said, Saxon does (or at least did) have an open source version. It doesn't have all the features, e.g. no schema validation or query optimization, but it stays within the boundaries of the spec. The bigger problem there is that Saxon is written in Java, and browsers understandably don't want to take a dependency on that just for XSLT 2+.
It's generally speaking part of the problem with the entire "XML as a savior" mindset of that earlier era, and a big reason why we left it behind; it doesn't matter if it's XSLT or SOAP or even XHTML in a way... These were defined as machine languages meant for machines talking to machines, and invariably something goes south, and they're not really made for us to intervene in the middle. It can be done, but it's way more work than it should be, especially since they were clearly never designed around the idea that those machines would sometimes speak "wrong", or a different "dialect".
It looks great; then you design your stuff and it goes great; then you deploy to the real world and everything catches fire instantly, and every time you put one fire out another one starts.
> It's generally speaking part of the problem with the entire "XML as a savior" mindset of that earlier era, and a big reason why we left it behind
Generally speaking I feel like this is true for a lot of stuff in programming circles, XML included.
New technology appears, some people play around with it. Others come up with using it for something else. Give it some time, and eventually people start putting it everywhere. Soon "X is not for Y" blogposts appear, and usage finally starts to decrease as people rediscover "use the right tool for the right problem". Wait yet some more time, and a new technology appears, and the same cycle begins again.
Seen it with so many things by now that I think we (the software community) will forever be stuck in this cycle, and the only way to win is to explicitly jump out of the cycle and watch it from afar, picking up the pieces that actually make sense to keep using and ignoring the rest.
A controversial opinion, but JSON is that too. Not as bad as XML was ~~(there's no "JSLT")~~, but wasting cycles to manifest structured data in an unstructured textual format has massive overhead on the source and destination sides. It only took off because "JavaScript everywhere" was taking off, performance be damned. Protobufs and other binary formats already existed, but JSON was appealing because it's easily inspectable (it's plaintext) and easy to use: `JSON.stringify` and `JSON.parse` were already there.
We eventually said, "what if we made databases based on JSON?" and then came MongoDB. Worse performance than a relational database, but who cares! It's JSON! People have mostly moved away from document databases, but that's because they realized it was a bad idea for the majority of use cases.
Load it into a full programming language runtime and use the great collections libraries available in almost all languages to transform it and then serialize it into your target format. I want to use maps and vectors and real integers and functions and date libraries and spec libraries. String to string processing is hell.
Imperative code. Easy to mentally parse, comment, log, splice in other data. Why add another dependency just to go from json>json? That'd need an exceptional justification.
The fact that you bring up protobufs as the primary replacement for JSON speaks volumes. It's like you're worried about a problem that only exists in your own head.
>wasting cycles to manifest structured data in an unstructured textual format
JSON IS a structured textual format, you doofus. What you're complaining about is that the message defines its own schema.
>has massive overhead on the source and destination sides
The people that care about the overhead use MessagePack or CBOR instead.
I personally hope that I will never have to touch anything based on protobufs in my entire life. Protobuf is a garbage format that fails at the basics. You need the schema one way or another, so why isn't there a way to negotiate the schema at runtime in protobuf? Easily half or more of the questionable design decisions in protobuffers would go away if the client retrieved the schema at runtime. The compiler-based workflow in protobuf doesn't buy you a significant amount of performance in the average JS or JVM based webserver, since you're copying from a JS object or POJO to a native protobuf message anyway. It invites an absurd amount of pain for little to no benefit. What I'm seeing here is a motte-and-bailey justification for making the world a worse place. The motte is the argument that text-based formats are computationally wasteful, which is easily defended. The bailey is the implicit argument that hard-coding the schema the way protobuf does is the only way to implement a binary format.
Note that I'm not arguing particularly in favor of MessagePack here, or even against protobuf as it exists on the wire. If anything, I'm arguing the opposite: you could have the benefits of JSON and protobuf in one, a solution so good that it makes everything else obsolete.
I didn't say protobufs were a valid replacement; you only think I did. "Protobufs and other binary formats already existed, [..]" I was only using it as an example of a binary format that most programmers have heard of; more people know of protobufs than MessagePack or CBOR.
Both XML and JSON were poor replacements for s-expressions. Combined with Lisp and Lisp macros, a more powerful data manipulation text format and language has never been created.
I think the only part left out is people believing in the currently hyped thing "because this time it's right!" or whatever they claim. Kind of like how TypeScript people always appear when you say that TypeScript is currently one of those hyped things and will eventually be overshadowed by something else, just like the other languages before it; then, sure enough, someone will soon share why TypeScript happens to be different.
There have been many such cycles, but the XML hysteria of the 00s is the worst I can think of. It lasted a long time, and the square peg of XML was shoved into so many round holes.
IDK, the XML hysteria is comparable to the dynamic-languages and functional-languages hysterias. And it pales in comparison to the microservices, SPA, and current AI hysterias.
IMHO it's pretty comparable; the difference is only in the magnitude of insanity. After all, the industry did crap out hardware XML accelerators that were supposed to improve the performance of doing massive amounts of XML transformations. Is that not the GPU/TPU craze of today?
Now we have "JSON as savior". I see it way too often where new people come into a project and the first thing they want to do is to replace all XML with JSON, just because. Never mind that this solves basically nothing and often introduces its own set of problems. I am not a big fan of XML but to me it's pretty low in the hierarchy of design problems.
The only problem with XML is the verbosity of the markup. Otherwise it's a nice way to structure data without the bizarre idiosyncracies of YAML or JSON.
XML has its own set of idiosyncrasies like everything being a string. Or no explicit markup of arrays. The whole confusion around attributes vs values. And many others.
JSON has its own set of problems like lack of comments and for some reason no date type.
But in the end they are just data file formats. We have bigger things to worry about.
I mean, XML has its own bizarre idiosyncrasies, like the whole attribute vs child element distinction (which maps nicely to text markup but less so to object graphs).
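A trivial made-up example of that ambiguity: both of these model the same record, and XML itself gives you no principled way to choose between them.

    <user id="42" name="Ada"/>

    <user>
      <id>42</id>
      <name>Ada</name>
    </user>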
I would say that the main benefit of XML is that it has a very mature ecosystem around it that JSON is still very much catching up with.
> part of the problem with the entire "XML as a savior" mindset of that earlier era
I think part of the problem is focusing on the wrong aspect. In the case of XSLT, I'd argue its most important properties are being pure, declarative, and extensible. Those can have knock-on effects, like enabling parallel processing, untrusted input, static analysis, etc. The fact it's written in XML is less important.
Its biggest competitor is JS, which might have nicer syntax but loses those core features of being pure and declarative (we can implement pure/declarative things inside JS if we like, but requiring a JS interpreter at all is bad news for parallelism, security, static analysis, etc.).
When fashions change (e.g. XML giving way to JS, and JSON), we can end up throwing out good ideas (like a standard way to declare pure data transformations).
(Of course, there's another layer to this, since XML itself was a more fashionable alternative to S-expressions; and XSLT is sort of like Lisp macros. Everything old is new again...)
> These were defined as machine languages meant for machines talking to machines
I don't believe this is true. Machine language doesn't need the kind of verbosity that XML provides. SGML/HTML/XML were designed to allow humans to produce machine-readable data, so they were meant for humans talking to machines and vice versa.
Yes, I think the main difference is imperative vs declarative computation. With declarative computation, the performance of your code depends on the performance and expressiveness of the declarative layer, such as XML/XSLT. XSLT lacks the expressiveness to get around its own performance limitations.
It was very odd that a simple markup language was somehow seen as the savior for all computing problems.
Markup languages are a fine and useful and powerful way for modeling documents, as in narrative documents with structure meant for human consumption.
XML never had much to recommend it as the general purpose format for modeling all structured data, including data meant primarily for machines to produce and consume.