
I’ll start.

Take an XML document. Validate it.

Take all of the top-level elements. Call getElementById on each with the same value. Combine all the answers into an array, eliminating the nulls.

You might expect that array to have length one or zero on all valid documents. You’d be wrong, and dangerously so for some XML schemas. You can use the same ID on every node and I don’t know of a parser that would balk at that. And yet every implementation will return only the first node that has that ID in the subtree it searches, so the answer changes any time you descend into the DOM.
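To make the duplicate-ID point concrete, here is a minimal sketch using Python's standard-library ElementTree, which only checks well-formedness and does no DTD or schema validation, so it says nothing about what a validating parser would do. The document and the helper name duplicate_ids are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: three elements share id="a".
DOC = """<root>
  <item id="a"/>
  <group id="a"><item id="a"/></group>
  <item id="b"/>
</root>"""

def duplicate_ids(xml_text):
    """Return {id value: count} for every id value used more than once."""
    # fromstring() checks well-formedness only; it does not validate.
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return {value: n for value, n in counts.items() if n > 1}

print(duplicate_ids(DOC))
```

The parse succeeds without complaint, and any lookup that returns a single node per ID has to pick one of the three.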



Why aren't you just using XPath? There are mature and powerful XPath libraries in all major languages.

Of course, if you use only one limited tool that was never meant to be the main manipulator of XML (getElementById), then you'll run into problems. It is like never using regexps and complaining that simple strings are a bad data structure.


XPath still uses the same broken DOM functions under the hood. Exact same outcome.


That just doesn't make any sense. Regardless of what functions are used "under the hood" (How do you even know that? Did you go through every XPath library? And what would that even refer to when I use an XPath library in PHP to open an XML document? There isn't a web page or "DOM" to speak of: no browser, no JavaScript, just XML.), the interface lets you select whatever you want from an almost arbitrarily complex XML document with one-liners of XPath. This is not the same outcome as getElementById acrobatics.
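For what an XPath one-liner looks like outside a browser, here is a rough sketch with Python's ElementTree, which implements only a limited XPath subset (a full XPath library would offer more). The document and the function name skus_with_id are hypothetical, and the id value is interpolated into the path unescaped, which is fine for a demo but not for untrusted input.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which two <line> elements share id="x".
DOC = """<order>
  <line id="x"><sku>1</sku></line>
  <line id="x"><sku>2</sku></line>
  <line id="y"><sku>3</sku></line>
</order>"""

def skus_with_id(xml_text, id_value):
    """Return the <sku> text of every element whose id attribute matches."""
    root = ET.fromstring(xml_text)
    # The attribute predicate selects all matching descendants in document
    # order, not just the first one.
    matches = root.findall(f".//*[@id='{id_value}']")
    return [el.findtext("sku") for el in matches]

print(skus_with_id(DOC, "x"))
```

Unlike a single-result lookup, the path expression returns every element carrying the ID, so duplicates are visible to the caller instead of being silently dropped.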


If only you could specify in your schema that an element must be unique. But wait a minute! Actually, you can. That's what the unique constraint (xs:unique in XSD) does.

And here lies the main problem of XML. As a technology, it is better than its reputation. Sadly, next to no one knows how to use it properly.


XML did a number of things right.

Too bad it was initially envisioned as a text markup language, with tags sparsely strewn around the text, and not as a data representation format.

So, the syntax ended up both overly cumbersome (see closing tags) and festooned with logically unnecessary shorthands like node attributes. Then, the terror of entities.

XSLT is a brilliant language, I'd say the first pure functional language widely used outside academia (in the 2000s), but, based on XML syntax, it's completely unfit for human consumption.

If only the authors of XML had gotten rid of the shackles of SGML compatibility and gone with a simple, uniform syntax, e.g. s-expressions, we could still be gladly using it. Now we reinvent the ecosystem instead, with JSON (sigh) and YAML.


I see no issue except for the entities, which have by and large died anyway, since they come from DTDs and not from XSDs. I do not see why closing tags are a problem: my editor will insert them, and they make parsing the XML much easier and less error-prone. XSLT is brilliant indeed (and to my eye quite readable, but then again I also like regexps ;-)


You can’t specify that attributes are unique, which is why id is broken. And thus why XML DSIG, which uses id extensively, is exceedingly difficult to harden.


Parent asked what's wrong with XML, not what's wrong with DOM. Does XML actually even define a special meaning for the `id` attribute?


The XML element object always has this function, in every implementation I’ve seen.


That does not make any sense. There are entire classes of XML parser that cannot implement getElementById, such as streaming parsers. getElementById is specified as part of the Document Object Model (DOM), not as part of XML. I don't think `id` even has a special meaning in XML.
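As a small illustration of the streaming case, here is a sketch using Python's xml.sax. A SAX handler only sees one start tag at a time and never holds a tree, so there is nothing to call a lookup on; any ID index has to be built by the application itself. The class IdCollector and the function ids_from_stream are illustrative names, not part of any library.

```python
import xml.sax

class IdCollector(xml.sax.ContentHandler):
    """Record, per id value, the names of the elements that carry it."""

    def __init__(self):
        super().__init__()
        self.seen = {}

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the input.
        if "id" in attrs:
            self.seen.setdefault(attrs["id"], []).append(name)

def ids_from_stream(xml_bytes):
    """Parse bytes with a streaming parser and return the collected id index."""
    handler = IdCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.seen

print(ids_from_stream(b'<root><a id="1"/><b id="1"/><c id="2"/></root>'))
```

Here `id` is treated as an ordinary attribute; whether duplicates are an error is left entirely to the code consuming the events.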


I presume JSON is less ambiguous and easier to validate and parse?


JSON doesn't even have IDs, so in this particular regard the (similar) tool to what the poster uses doesn't even exist for JSON. So no, it would not be correct to say that in this particular regard JSON is somehow easier to parse.


I've never seen getElementById being used for XML data (as opposed to something HTML)...


This is the one that half of my bad experiences are associated with:

https://www.w3.org/TR/xmldsig-core/


Well, XML Signatures and XML Encryption are a confluence of bad ideas from many disciplines :)



