2006-05-22

Back to the Rats

Since it came up again in LtU, and I was thinking about it after Burning Chrome mentioned parsers for syntax highlighting, I was at Henry S. Thompson's seminar on XML schema validation (which is essentially a similar problem to parser design), then Brenden Eich mentioned Narcissus in his closing session at XTech I've been playing with a packrat parser written in JavaScript.

I started again 7pm on Thursday during XTech*, and spent most the night on it, then did more on it on Saturday at BarCamp Amsterdam. Current work in progress is in this directory, with a simple demo which parses a definition of the grammar used to define grammars, then creates divs to render it prettily in the iframe. The grammar below the iframe is the grammar used to parse what's in the iframe, and below it are debug messages. Editing what's in the iframe causes events to mark it dirty, but the incremental reparsing doesn't work yet.

The next stage is to add the incremental reparsing, so when you do edit it it tries to reparse, and to link that to the divs so that you don't have to throw away all of the content of a div when you reparse it, as at present.

The Triep stuff is a combination of Trie with predicates that was another idea for incremental parsing taylored for interactive development. I'm going to add some form of trie to the parse states for autocompletion, though I may go back to using a standard trie as the calculus for the predicates is tricky, and reusing the packrat grammar may mean it's not needed.

One thing about using PackRat using the visitor pattern (or any other tree-walking recursive descent), is that you don't need to convert the grammar into a finite state machine, which means that while Henry S. Thompson's handling of numerical constraints is an elegant solution, the problem simply doen't exist - you can simply put the counter in a for-loop in the visitor's method, and you're done. The parser doesn't do that since most text grammar don't require it - it only handles zeroOrMore, oneOrMore or zeroOrOne, but nToM would be handled in much the same as those, and I don't know in principal why you couldn't use tree-walking descent for XML - my ASN XER to Java data bindings used exactly that approach, and that's a superset of WXD.


TME

Labels: , ,