Let's see. So there are parsers in various languages, parsing various MD dialects, with varied internal representations and surrounding ecosystems. And there are attempts at more turnkey document-processing systems, often with a more extended dialect and some collection of feature plugins. Often you can write pipeline AST filters in the system's language; sometimes you can get the AST out as JSON, and sometimes reinject a JSON AST (allowing you to write a filter in any language). Which leaves questions like: what dialect does the parser accept; is it extensible; how robustly correct is it; how clean, easily used, and fragile is the AST; and how well do the plugins/ecosystem already support the features you need. That AST question I think of as a big deal, and hard to get a handle on. Aside from manipulation pragmatics, the ASTs that come out of parsing can be richly creative in their quirkiness, which you may then need to regularize.
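As a concrete sketch of that get-JSON-out, filter, reinject-JSON round trip (assuming pandoc's documented JSON AST, where nodes look like `{"t": "Type", "c": contents}`; the file names and the upper-casing transform are just placeholders):

```typescript
// Hypothetical usage:
//   pandoc --from=markdown --to=json doc.md | node filter.js | pandoc --from=json --to=html
import { readFileSync } from "node:fs";

type Node = { t: string; c?: unknown };

// Recursively walk the parsed JSON, rewriting every {t, c} node with fn.
function walk(value: unknown, fn: (n: Node) => Node): unknown {
  if (Array.isArray(value)) return value.map((v) => walk(v, fn));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = walk(v, fn);
    return typeof out.t === "string" ? fn(out as Node) : out;
  }
  return value;
}

// Toy transform: upper-case the text of every Str node.
const doc = JSON.parse(readFileSync(0, "utf8"));
const filtered = walk(doc, (n) =>
  n.t === "Str" ? { ...n, c: String(n.c).toUpperCase() } : n
);
process.stdout.write(JSON.stringify(filtered));
```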
So I guess two main observations. On build-vs-buy for backend features: given the breadth of possible "we want it like this, and not that", if you can easily play with ASTs, I was surprised by how quickly reinventing the wheel became a plausible call. Possibly skimming existing backend code for insight and templates, but mostly not using it (rather than struggling to configure it to give you "this and not that"). The other observation is that once you have an AST and don't care about existing backends, your choices of parser and of backend language/ecosystem decouple. One might use `pandoc --to=json` and then JS generic-ast tooling to emit HTML.
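A minimal sketch of that decoupling, again assuming pandoc's JSON output shape (top level is `{"pandoc-api-version", "meta", "blocks"}`); it handles only a couple of node types, just to show the seam between parser and emitter:

```typescript
// Hypothetical usage: pandoc --to=json doc.md | node emit.js > doc.html
import { readFileSync } from "node:fs";

type Node = { t: string; c?: any };

// Render a pandoc inline node; unhandled types are dropped in this sketch,
// and HTML escaping is omitted for brevity.
function inline(n: Node): string {
  if (n.t === "Str") return n.c;
  if (n.t === "Space") return " ";
  return "";
}

// Render a pandoc block node.
function block(n: Node): string {
  if (n.t === "Para") return `<p>${n.c.map(inline).join("")}</p>`;
  return "";
}

const doc = JSON.parse(readFileSync(0, "utf8"));
console.log(doc.blocks.map(block).join("\n"));
```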
For parsing, a glance suggests Mdast emphasizes the CommonMark and GitHub-flavored dialects. Pandoc-flavored MD is a bit broader.[1] My fuzzy recollection is that I chose a pandoc parse for that breadth, and for an expectation of robustness ("it's Haskell, and popular"), despite the then less-than-wonderful docs. IIRC, the resulting ASTs were fine. For the backend, I wanted something simple and concise to minimize burden, thus pattern matching (IIRC, most node types ended up a line or two), and chose road-less-traveled Julia for off-topic reasons (I was thinking of using Julia for a compiler backend).
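To give a feel for that "a line or two per node type" claim, here is my TypeScript sketch of the style, not the original Julia: a dispatch table stands in for pattern matching, and the node shapes follow pandoc's documented JSON AST.

```typescript
type Node = { t: string; c?: any };

// One renderer per node type, each a line: pattern matching in Julia,
// a table keyed on the node's tag here. Escaping omitted for brevity.
const all = (ns: Node[]): string => ns.map(render).join("");
const rules: Record<string, (c: any) => string> = {
  Str: (c) => c,
  Space: () => " ",
  Emph: (c) => `<em>${all(c)}</em>`,
  Strong: (c) => `<strong>${all(c)}</strong>`,
  Para: (c) => `<p>${all(c)}</p>`,
  Header: (c) => `<h${c[0]}>${all(c[2])}</h${c[0]}>`, // c = [level, attr, inlines]
  CodeBlock: (c) => `<pre><code>${c[1]}</code></pre>`, // c = [attr, text]
};
function render(n: Node): string {
  return (rules[n.t] ?? (() => ""))(n.c);
}
```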
Thanks for your thoughts on Mdast - I'm tempted to play with it.
[1] https://garrettgman.github.io/rmarkdown/authoring_pandoc_mar...