
It’s instructive to see how different MathML is from other math markup languages. Here’s the quadratic formula:

In troff, x = {-b +- sqrt { b sup 2 - 4ac}} over 2a

In TeX, x = {-b \pm \sqrt{b^2-4ac}} \over {2a}

In plain Unicode, 𝑥 = (−𝑏 ± √(𝑏² − 4𝑎𝑐))⁄2𝑎

In MathML, <mrow><mi>x</mi><mi>=</mi><mfrac><mrow><mi>−</mi><mi>b</mi><mi>±</mi><msqrt><mrow><msup><mi>b</mi><mi>2</mi></msup><mi>−</mi><mi>4ac</mi></mrow></msqrt></mrow><mi>2a</mi></mfrac></mrow>

MathML is simply unreasonable to write by hand. Most of the time it’s only ever used as an interchange format, automatically generated by tools.

Indeed, the only time I ever use it is with mandoc(1), the default manpage formatter on BSD, Illumos, and some Linuxes, which converts equations to MathML when converting manpages to HTML.



I audibly groaned at the XML version, but to be fair, it can be presented better:

  <mrow>
    <mi>x</mi>
    <mi>=</mi>
    <mfrac>
      <mrow>
        <mi>−</mi>
        <mi>b</mi>
        <mi>±</mi>
        <msqrt>
          <mrow>
            <msup><mi>b</mi><mi>2</mi></msup>
            <mi>−</mi>
            <mi>4ac</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mi>2a</mi>
    </mfrac>
  </mrow>
Again, I'm not saying this is good. Compared to the brevity of TeX or troff, it's difficult to accept. But XML is easier to read when you give in to its heavyweight structure and format it appropriately.

By the way, on my system (OS X, Chrome), the Unicode version is beautiful. I had not realized it was a good math option.


The big downside with the Unicode option is the poor handling of fractions. It's not so apparent with the quadratic formula, but if you have a complex divisor the / notation starts to fall apart.

Even properly formatted, that MathML version is just awful. You could maybe help it a bit by combining some of those <mi> elements on a single line, but it's way too much mental effort to parse that mess.


Lack of proper subscript and superscript support is a killer for Unicode as well. Unicode is amazing for smaller, simple equations, but anything beyond that soon becomes extremely hard to process.


I know what you're referring to, but then again, that's a matter of formatting, not encoding.


You can remove the <mrow> inside the <msqrt> too.


Shouldn't the '=' and the minus and the other signs be inside <mo>, rather than <mi>, because they are binary operators, not identifiers? TeX for example, makes a clear distinction in terms of how much whitespace it would surround operators with, vs. identifiers.

In any case, I suspect MathML was intended as an intermediate computer-readable representation, not something that anyone would write by hand (MathJax can compile your LaTeX to MathML). I don't see what's wrong with an intermediate representation that's difficult to manipulate by hand. And unless I misunderstood the article, their point is that MathML is bad as an intermediate representation.


>Shouldn't the '=' and the minus and the other signs be inside <mo>, rather than <mi>, because they are binary operators, not identifiers?

You’re right; I should have marked those up with <mo> instead.


Well, ask Peter and their alter ego, perhaps they will provide an answer.


Yeah. I think that's inevitable with an XML syntax. XML is good at some things, but representing math expressions clearly is not one of them.


> Yeah. I think that's inevitable with an XML syntax. XML is good at some things, but representing math expressions clearly is not one of them.

What is XML good at? (by good I mean better than alternatives like JSON, YAML, HAML, etc)

The only thing that might qualify is a long-term/archival-quality document format like ODF/OOXML. The inherently embeddable nature of XML does seem like a nice fit, but it gets very bloated very fast (deflated wrappers help, though).


XML grew out of SGML/HTML, and there's no denying that its age shows. It's kind of like going into a house built in the 90s and seeing all the little things that just scream "90s house - that looks so dated!"

Did you know that web browsers have native support for XML? Try fetching an xml resource and pulling the responseXML value off the XHR - you have another DOM object right in your hands, that you can treat just like a regular Node.
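
Something like this (a minimal sketch; /feed.xml and its <item> elements are made up for illustration):

    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/feed.xml");
    xhr.onload = () => {
      const doc = xhr.responseXML;              // a real XML Document, not a string
      if (doc) {
        console.log(doc.documentElement.nodeName);
        console.log(doc.getElementsByTagName("item").length);
      }
    };
    xhr.send();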

The reason that XML is better than the alternatives has nothing to do with the syntax itself, but with the tooling around it. Browser support, XQuery, XPath, XSD/RelaxNG, XSLT - please tell me where I can find the equivalents for any other markup language. You don't have to use them, either - but if you need them, they are there in pretty much every framework. XML had the first-to-market advantage, and was picked up in enterprise systems and made powerful and ubiquitous. If you need batteries included, XML is right there; the others are not. There is really nothing wrong with it: the syntax is clunky for some applications but great for documents, whereas e.g. JSON would be horrible in that scenario.
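
For example, XPath works against any XML Document right in the browser (a minimal sketch; the <feed>/<item> structure is made up):

    const doc = new DOMParser().parseFromString(
      "<feed><item><title>one</title></item><item><title>two</title></item></feed>",
      "application/xml");
    const titles = doc.evaluate("//item/title/text()", doc, null,
                                XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    for (let i = 0; i < titles.snapshotLength; i++) {
      console.log(titles.snapshotItem(i)?.nodeValue);   // "one", then "two"
    }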


> Did you know that web browsers have native support for XML?

Not only that, browsers have support for XSLT 1.0, so you can format it and style it with CSS on the fly. Even mobile browsers support that, as far as I can tell.
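
Here's a minimal sketch of the scripted route via XSLTProcessor (the tiny stylesheet and input are made up; the declarative route is an xml-stylesheet processing instruction in the document instead):

    const xsl = new DOMParser().parseFromString(
      `<xsl:stylesheet version="1.0"
                       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:template match="mi"><i><xsl:value-of select="."/></i></xsl:template>
       </xsl:stylesheet>`,
      "application/xml");
    const input = new DOMParser().parseFromString(
      "<mrow><mi>x</mi><mi>2a</mi></mrow>", "application/xml");
    const proc = new XSLTProcessor();
    proc.importStylesheet(xsl);
    // Produces a DocumentFragment containing <i>x</i><i>2a</i>, styleable with CSS.
    document.body.append(proc.transformToFragment(input, document));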


Great explanation. The tooling and first mover advantage were the reasons I started working with it back when it was in its infancy.


1. XML is very easy to parse (linear-time). By comparison, TeX or troff, while elegant, are not that easy to parse. In the sample formula (see the first comment) the parser that reads the beginning of the formula has no idea it's going to end up with a fraction until it sees \over. So it's a real parser and then a post-processor that sets things up; check "TeX: The Program" for details. And it can only parse one language. In XML it's just a very dumb, highly optimized generic loader that can load any XML. I agree the content has to come from somewhere, but it's a different story. (A short browser sketch of this is below, after point 5.)

2. The XML data model is more sophisticated than JSON's or YAML's: it supports element ordering and mixed content and does it rather elegantly and succinctly. It also has namespaces (and these are very good namespaces: they're not hierarchical, just long names in a single flat namespace, with convenient notation to shorten the long prefixes to a reasonable size). As a result it's very easy to define a new language, extend a language, mix multiple XML languages, etc. JSON and YAML are hopeless here. (There's a small namespace sketch below as well, after point 5.)

3. XML comes with tools to define the type of the document or a fragment, so you can read a document and automatically check that it has the right syntax (and/or convert the data, such as dates, into the native format). There are three ways to do this (DTD, Schema, Relax NG) in order of increasing power and expressiveness (not just syntactic sugar, but different kinds of languages). In particular, it natively supports things like inter-element references, which is very convenient for complex documents.

4. XML comes with XSLT, which is a general-purpose tree transformer (transducer) with declarative syntax. This is an immensely valuable tool. To put things into perspective: a compiler is a special-purpose tree transformer that transforms the source tree of a program into machine code (which is also a tree, technically: sections, data, functions, etc.). Are you sure you don't need a general-purpose declarative tree transformer and prefer to write ad-hoc ones? :)

5. The specification of XML 1.0 is shorter than, say, YAML's :) OK, this is only one part of the XML landscape; the whole is much bigger, of course, but still this part (basic XML and DTD) is noticeably shorter than YAML's. (I myself also find YAML pretty cryptic.)
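
To make point 1 concrete, here's a minimal sketch using the browser's generic loader (DOMParser); the little MathML string is just an example:

    const mathml = "<mrow><mi>x</mi><mo>=</mo><mfrac><mi>1</mi><mi>2</mi></mfrac></mrow>";
    const tree = new DOMParser().parseFromString(mathml, "application/xml");
    // The fraction announces itself up front as <mfrac>; no \over-style lookahead needed.
    console.log(tree.documentElement.children[2].tagName);   // "mfrac"

And for the namespaces in point 2 (the wrapper document is made up; the namespace URI is MathML's real one):

    const NS = "http://www.w3.org/1998/Math/MathML";
    const doc = new DOMParser().parseFromString(
      '<doc xmlns:m="' + NS + '"><m:mi>x</m:mi></doc>', "application/xml");
    // The m: prefix is only shorthand; lookups go by the long name.
    console.log(doc.getElementsByTagNameNS(NS, "mi").length);   // 1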


Great summary! XML really isn't a bad format, it just gets bad-mouthed by everyone who thinks it's a bitch to author -- and I agree, it is. But just don't author it by hand then. JSON or YAML of any reasonable length is also terrible to author by hand, yet you lose out on so many of the benefits of XML. And for what, better "hello world" samples?


> What is XML good at

Supporting XQuery/XPath queries and typing with DTDs are the big ones for me, including all the surrounding tooling. You can get replicas of both of these in JSON now, but I don't think they're as mature.


"What is XML good at?"

(For the purposes of this post, I'm including HTML in the XML family.)

XML/HTML is good when:

1. You have two dimensions of markup you want to do. That is, you have a clear distinction between what is a new "tag" and what is an attribute on that tag. If you can't almost instantly decide whether some feature you want to add works as an attribute or a tag, you probably shouldn't be in XML.

2. Almost every tag one way or another contains some text, the third dimension that XML supports. A proliferation of tags that never contain any text is a bad sign. A handful may not be a problem, e.g. "hr" in HTML, but they should be the exception.

3. You have a really good use case for XML namespacing, the fourth dimension of information that XML supports, in which case there's almost no competition for a well-standardized format, as long as you're also using the previous three dimensions.

There's sort of this popular myth that XML is useless, which I think isn't because it's true or because XML is bad; I think it's because in general, most times you want to dump out a data structure #1 isn't true, let alone #2 or #3. In a lot of data sets, you've only got the two dimensions of "simple structure" and "text", not annotations on the structure itself. (Or, perhaps even more accurately, they end up implicit in the format itself, and the format is constant enough for that to be just fine.) A lot of stuff in the 1990s and 2000s used XML "because XML" even though it clearly failed #1. XML is really klunky when you don't want that second dimension because the XML APIs generally can't let you ignore it, or they wouldn't actually be XML APIs.

On the other hand, when you learn this distinction, you do come across the occasional JSON-based format that clearly really ought to be XML instead. You can embed anything you want into JSON, but when you're manually embedding a second structure dimension into your JSON document, it loses its advantages over XML fast. If you've ever seen any of the various attempts to fully embed HTML into JSON, without leaving any features behind, you can begin to see why XML or XML-esque standards like HTML aren't a bad idea. HTML is much easier to read for humans than HTML-in-JSON-with-no-compromises.
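
A tiny illustration of that last point (the JSON shape here is hypothetical, but every "no compromises" encoding ends up looking roughly like it):

    const html = '<p id="x">Hello <b>world</b>!</p>';
    // The same fragment with structure, attributes and mixed text made explicit:
    const json = {
      tag: "p",
      attrs: { id: "x" },
      children: ["Hello ", { tag: "b", attrs: {}, children: ["world"] }, "!"],
    };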

And if you've truly got the four-dimensional use case, XML is really quite nice. When you need all the features, suddenly the libraries, completely standardized serialization, and XPath support and such are all actually convenient and surprisingly easy to use, for what you're getting.

Some examples: HTML is a generally good idea. SVG is a middling idea; it passes #1 and #3 but fails #2. SOAP and XML-RPC are generally bad ideas; SOAP fails #1 and #2 but sort of uses #3, and XML-RPC fails all three. XMPP I actually think is pretty solid as an XML format (mere network verbosity problems can be solved with an alternate encoding, though admittedly that becomes non-standard), and in a lot of ways, the real problem with XMPP isn't so much the format itself as that people are not used to dealing with the four-dimensional data structures that result. People expecting IRC-esque flat text are not expecting such detail. Using the fourth dimension of namespaces for extensibility is neat, but few developers understand it, or want to.


This is perhaps the best (most terse and accurate) summary of XML tradeoffs I've seen in years.

I generally don't just comment "attaboy" but there you go.


Why does this have to turn into an XML bashing thread? Writing the above equation in JSON would be just as terrible.


> Writing the above equation in JSON would be just as terrible

Not quite as bad as XML... I think the problem is more the verbose, overly nested format that was chosen for MathML than XML itself, though.

  {
    "mrow": {
      "mi": [ "x", "=" ],
      "mfrac": {
        "mrow": {
          "mi": [ "−", "b", "±" ],
          "msqrt": {
            "mrow": {
              "msup": {
                "mi": [ "b", "2" ]
              },
              "mi": [ "−", "4ac" ]
            }
          }
        },
        "mi": "2a"
      }
    }
  }


That's as bad as the XML. Actually it's worse, because your ordering is undefined which breaks everything. They are both nearly unusable by humans.

Basically the problem is this: either you explicitly represent the grouping in a general scheme capable of it, and then you get the disaster (from the point of view of human readability and manipulability) that is XML or JSON. Or you use a domain-specific language like LaTeX or whatever, with the attendant parsing issues, etc.

If you want people to edit it by hand, the latter option is much better - but it has its pain points. You don't get to use a broad range of robust tools to manipulate them, for one thing.


JSON object properties are unordered per the spec. This is not portable.


S-expressions are the better alternative to XML.


What you have doesn't work. What if I have "mi, mo, mfrac, mo, mi" at the top level? So something like "a - b/c + d". You can't specify the same key twice in JSON. Also, keys are technically unordered, so there's no guarantee that a parser will put that top-level "mi" before the "mfrac".

JSON is great at many things, but polymorphic substructures are AFAIK only really possible with everything being an object defining the "type" that it is. And that looks significantly uglier than what you have above:

    {
        "type": "mrow",
        "children": [
            {
                "type": "mi",
                "identifier": "x"
            },
            {
                "type": "mo",
                "operator": "="
            },
            {
                "type": "mfrac",
                "rows": [
                    {
                        "type": "mrow",
                        "children": [
                            {
                                "type": "mo",
                                "operator": "-"
                            },
                            {
                                "type": "mi",
                                "identifier": "b"
                            },
                            {
                                "type": "mo",
                                "operator": "±"
                            },
                            {
                                "type": "sqrt",
                                "expression": {
                                    "type": "mrow",
                                    "children": [
                                        {
                                            "type": "mi",
                                            "identifier": "b"
                                        },
                                        {
                                            "type": "msup",
                                            "expression": {
                                                "type": "mi",
                                                "identifier": 2
                                            }
                                        },
                                        {
                                            "type": "mo",
                                            "operator": "-"
                                        },
                                        {
                                            "type": "mi",
                                            "identifier": "4ac"
                                        }
                                    ]
                                }
                            }
                        ]
                    },
                    {
                        "type": "mi",
                        "identifier": "2a"
                    }
                ]
            }
        ]
    }


While this format is more generic, an abbreviated encoding can sometimes accomplish the same thing. For example, just moving the "type" to be the object key and removing the implied secondary name gets you this far:

    { "mrow": [
        { "mi": "x" },
        { "mo": "=" },
        { "mfrac": [
            { "mrow": [
                { "mo": "-" },
                { "mi": "b" },
                { "mo": "±" },
                { "sqrt": {
                    "mrow": [
                        {"mi": "b"},
                        {"msup": { "mi": 2 }},
                        {"mo": "-"},
                        {"mi": "4ac"}
                    ]
                }}
            ]},
            { "mrow": [
                {"mi": "2a"}
            ]}
        ]}
    ]}
It's not as general, but it works if you know your syntax is similarly bounded. I don't know how certain static languages would handle serializing/deserializing it, but it makes construction via JavaScript literals much more pleasant.


FWIW, this would be a literal s-expression translation:

    (mrow (mi x)
          (mo =)
          (mfrac (mrow (mo -) (mi b) (mo ±)
                       (sqrt (mrow (mi b) (msup (mi 2)) (mo -) (mi 4ac))))
                 (mrow (mi 2a))))
And this would be a saner one, where mrow is implied:

    ((mi x) (mo =) (mfrac ((mo -) (mi b) (mo Β±)
                           (sqrt (mi b) (msup (mi 2)) (mo -) (mi 4ac)))
                          (mi 2a)))
I think either of those is clearly and inarguably superior.


Sure, that works as long as each type only has one property. Decoding it might be problematic; I don't know any serializers that would handle that kind of mapping natively.
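
You'd have to do the mapping by hand, something like this (a minimal TypeScript sketch that just flattens the tree back to text):

    type MNode = { [tag: string]: MNode | MNode[] | string | number };

    function flatten(node: MNode): string {
      const tag = Object.keys(node)[0];        // the lone key is the element type
      const value = node[tag];
      if (typeof value === "string" || typeof value === "number") return String(value);
      const children = Array.isArray(value) ? value : [value];
      return children.map(flatten).join("");
    }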


Have a go at trying to convert a complex TEI or Docbook document to JSON; you will want to put a gun to your head before the day is out.


> What is XML good at? (by good I mean better than alternatives like JSON, YAML, HAML, etc)

It has a single standard way of doing schemata that all the tools support, which is great. The maven pom.xml format is a much clearer way to specify a dependency than most of the alternatives (which often use an excessively clever concise form), and has really good autocomplete when editing it in eclipse (because eclipse understands the schema and so can offer autocomplete based on the elements that make sense at that point in the document).

If XML had just not bothered with namespaces I think it would have worked really well.


Try the same equation in JSON, and see how it is even more terrible. Document Object Models, as in HTML etc., are where XML is actually OK, as that was its design niche.

Could have been a bit simpler but hindsight...


Whenever I need to send dates across the wire with a JSON API, I sort of miss XML a little bit. It's a lot easier, and there are existing patterns and tools to help there. JSON really only supports three scalar data types - boolean, number, and string. Any other type needs to be handled with some ad-hoc system for communicating the schema, or by just expecting created_at to be a date.
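
The usual ad-hoc fix is ISO 8601 strings plus a reviver on the consuming end (a minimal sketch; created_at is just the field from above):

    const payload = JSON.parse(
      '{"created_at": "2015-04-01T12:00:00Z"}',
      (key, value) => key === "created_at" ? new Date(value) : value
    );
    console.log(payload.created_at instanceof Date);   // true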

XML, on the other hand, had the ability to do something using attributes in the element or a DTD / XSL. Occasionally I do miss that ability to communicate data schema alongside the data. But only occasionally.


I don't see a reasonable alternative for text markup (HTML, ODF, OOXML).


XML is unreadable and unwritable for any markup-heavy document, and math expressions tend to be more markup than text.


Indeed. But you could always define your own alternative syntax and write a simple translator to XML.


MathML was never intended to be written by hand any more than SVG was. It was designed to provide a standard output format for equation editors. Wolfram created MathML explicitly to prevent TeX from being adopted as a defacto standard on the web because TeX, being concerned only with visual appearance, does not do a good job of describing the structure or semantics of equations. MathML excels in both these areas, providing a fairly simple and standardised visual model which fits well with web browsers, and a rich semantic model.

In other words, MathML is great and it's a much better fit for web browsers than anything else on offer. Just don't write it by hand!


In AsciiMath[1], x = (-b +- sqrt(b^2 - 4ac)) / (2a)

[1]: http://asciimath.org/


Good, you caught that he was only dividing by 2, not 2a


Also, thanks for the pointer to AsciiMath; it looks like a nice way to render math.


I wrote a clone[1] of asciimath recently that aims to be just a little bit better than asciimath. It only targets MathML though.

[1]: https://runarberg.github.io/ascii2mathml


XML is not meant to be written primarily by hand (although I agree some XML languages could be more elegant). The markup part of XML is for the machine; if you remove it from your example, you'll get x=−b±b2−4ac2a, which is, basically, the text content that was meant for the humans.

Now, as a machine language XML is much better than troff or TeX, because it's very easy to parse: it's basically a syntax tree, the result of parsing those other formats; you don't even really parse it, just deserialize it. Naturally, it's a very good interchange format. Technically it would be a much better option than a full-fledged JavaScript parser and typesetter because it would've removed the parsing part. (And this is only one of the advantages.)


Does anyone else prefer writing troff/tbl/pic/eqn? I always felt that the language of TeX/LaTeX was a step backward in user-friendliness.


I’m also one of those who prefers troff. I read all five volumes of Knuth’s Computers and Typesetting cover to cover, and TeX is a beautiful piece of work. But in my preferred alternate universe, two things would be different: Joe Ossanna would not have died young, and AT&T would have been more open with licensing in time for Knuth to use troff as a base for his improvements to the world of typesetting.


He tried, but the syntax was ill-defined, and the typography was poor.


I found troff syntax to be very hard to debug for parse errors, while you can easily fit the TeX tokenizer into your head. There is also much less control in, e.g., the alignment of complex tables.


I usually use the OpenOffice / LibreOffice Equation Editor, which is based on eqn. Ironically, the on-disk format it uses is MathML.


Yes. I would use OOo if the equation typesetting wasn't awful. They don't seem to have put any effort into improving it over the years. MS Word typesetting has become quite good, though not up to the best you can get with LaTeX.


I've found XML is better suited to proper indentation and to reading top to bottom rather than left to right. When it is laid out vertically and indented, it is very readable, and can be even more understandable, since the structure of the elements is clearly implied.



