I wrote a workshop paper last year, but it wasn't very deep. Actually, incremental parsing, at least the way I do, doesn't really involve any special algorithms, here are the three points:
* making your lexer incremental first, memoize tokens between edits so you have something to attach trees to that will still be there after the next edit!
* Match braces (or determine indented blocks) separately before you even bother parsing. This makes incremental parsing more robust, and it is really easy to do without parsing information (unless your language overloads brace lexemes, like Java generics do! Scala didn't have this problem however). Also, autocomplete braces even if your users really hate that, because otherwise you won't have anything nice to reason about 99% of the time (indented languages are really better here for obvious reasons). Tell them to just use emacs if they don't like it.
* The last thing to do is just memoize parse trees (by attaching them to your memoized tokens) and then replay parsing on trees whenever their parses could have changed. So if their is an edit, invalidate the inner most tree, or multiple trees if the edit occurs at a boundary. If a child tree parse changes, invalidate its parents. Log your symbol table additions so you can undo them when they are no longer done on a parse (or the parse tree is deleted); trace your symbol table references if the symbol binding for the name being looked up changes, and don't worry about type checking in a separate pass, because you can just replay the parse multiple times if necessary.
The last point is quite fascinating and something I'm working on right now to turn into a more general programming paradigm. Check out:
* making your lexer incremental first, memoize tokens between edits so you have something to attach trees to that will still be there after the next edit!
* Match braces (or determine indented blocks) separately before you even bother parsing. This makes incremental parsing more robust, and it is really easy to do without parsing information (unless your language overloads brace lexemes, like Java generics do! Scala didn't have this problem however). Also, autocomplete braces even if your users really hate that, because otherwise you won't have anything nice to reason about 99% of the time (indented languages are really better here for obvious reasons). Tell them to just use emacs if they don't like it.
* The last thing to do is just memoize parse trees (by attaching them to your memoized tokens) and then replay parsing on trees whenever their parses could have changed. So if their is an edit, invalidate the inner most tree, or multiple trees if the edit occurs at a boundary. If a child tree parse changes, invalidate its parents. Log your symbol table additions so you can undo them when they are no longer done on a parse (or the parse tree is deleted); trace your symbol table references if the symbol binding for the name being looked up changes, and don't worry about type checking in a separate pass, because you can just replay the parse multiple times if necessary.
The last point is quite fascinating and something I'm working on right now to turn into a more general programming paradigm. Check out:
http://research.microsoft.com/apps/pubs/default.aspx?id=2112...