Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The time investment of writing a good hand-written parser is only worth it if you have users whose workflow would be improved by performance/error reporting gains once you have exhausted what a grammar-derived parser can provide.

For Clang and GCC is it absolutely a no brainer to use a hand-written parser.

If you're trying to get to an MVP compiler, or you're building a bootstrapping compiler, using a grammar-derived parser to get to the finish line faster is far more sensible.



Using parser generators doesn't make a lot of sense to me.

It's far easier to handwrite parsers.

Unless you're a midsized company with the resources and who doesn't mind bad error messages I would not recommend anyone use parser generators.


What could be easier than writing a grammar in EBNF-ish notation? This gives you the added benefit that the grammar is explicitly defined, and not implicit in the parser code.


> This gives you the added benefit that the grammar is explicitly defined

Especially if you're still refining the syntax: it is so comparatively easy to modify a grammar vs parser.


> What could be easier than writing a grammar in EBNF-ish notation?

Not learning a new language/build system/coding paradigm. :)


ANTLR took me an afternoon, before I could use it for a simple language, including adding it to my build tool.


> It's far easier to handwrite parsers.

It's also far easier to make buggy handwritten parsers. I don't have much experience, but in the project that I've seen, where a handwritten parser was used, the issue tracker was full of parser bugs.


I tend to write lots of tests at each level -- lexer, parser, AST, etc. -- to cover the different BNF grammar statements. I've done this with an XQuery and XPath parser I wrote for IntelliJ.

You can cover the different lexical variations in the lexical tests, and ensure that the lexer is reporting the correct things in the lexer tests. For these, I generally ensure I cover each token in a given BNF grammar statement, even if that means repeating checks for a given keyword, as it keeps the tests self-contained and easier to verify.

You can cover the different parser/grammar variations (different branches/paths in the BNF, missing tokens for error reporting and recovery, etc.) in the parser tests. These can make use of serializing the tree to a file, so your input is an example program and the output is the parsed AST tree.

The AST tests can then test the actual data model your AST provides, such as access to the value of a literal, or information about a type.

Higher-level tests such as constant folding can then be done using the AST API, or things like serializing the AST to a textual representation.


It's also far easier to debug handwritten parsers.


a parser generator can be a very good first pass at determining whether your language is properly formed


What do you mean by properly formed?


Probably that it doesn't have ambiguities


Thanks, that’s a better way of putting it.


Is that an important property? If so, why do you (*whoever) think so?


As always, it depends on what your goal is.

An unambiguous parser is easier to write tests for. It leads to fewer subtle "gotchas" where something syntax checks and compiles but is interpreted by the compiler as something very different than what the programmer intended. It's easier for third parties to write their own parsers and tooling for the language if they can target a formal grammar as opposed to the behavior of your Turing complete code.

But sometimes you don't care about any of those things. Especially if you're talking about a language with a small surface area. Or if you have to deal with legacy stuff (for example, parsing something like C++ that is heavily context sensitive, in which case trying to shoehorn things into a formal CFG so you can feed it to a generator might actually make things much harder than they have to be).


Not always, and that’s a great question! But I can help you catch things that might not be obvious when your first create the grammar, even such simple things as ambiguous nested if statements or errors in string formatting DSL syntax.


* it can, not I can


It takes 1 or 2 hours to make a decent pratt parser for even somewhat unusual syntax.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: