It would appear that Perl and the others were based on a from-scratch reimplementation of the Eighth Edition regex library. To quote:
While writing the text editor sam [6] in the early 1980s, Rob Pike wrote a new regular expression implementation, which Dave Presotto extracted into a library that appeared in the Eighth Edition. Pike's implementation incorporated submatch tracking into an efficient NFA simulation but, like the rest of the Eighth Edition source, was not widely distributed. Pike himself did not realize that his technique was anything new. Henry Spencer reimplemented the Eighth Edition library interface from scratch, but using backtracking, and released his implementation into the public domain. It became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on.
Same question here. I didn't realize that they didn't. Regular expression NFA and DFA algorithms are in every compiler book out there. I find it hard to believe that Guido et al. didn't know about these algorithms.
Yeah, jimrandomh's comment basically explains it. Perl regexps have features that can't be implemented by a DFA, like backreferences and arbitrary code evaluation. (I believe you can implement capturing subgroups; you'd just need to annotate the NFA states with markers for the current position in the string.) Those features are used often enough that you can't just leave them out.
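To make the parenthetical concrete, here's a minimal sketch of that idea in Python: a lockstep NFA simulation where each thread carries the string positions at which capture markers fired, in the spirit of Pike's technique from the quote above. The instruction set and the hand-built program for the regex (a+)b are illustrative assumptions, not any real engine's internals.

    # Sketch of lockstep NFA simulation with submatch tracking.
    # Hypothetical instruction set; not a real engine's internals.
    CHAR, SPLIT, JMP, SAVE, MATCH = range(5)

    def run(prog, text):
        """Each 'thread' is (pc, saves), where saves records the string
        positions at which capture markers fired -- the 'annotate the
        states with a position' idea."""
        def add(threads, seen, pc, pos, saves):
            # Follow SPLIT/JMP/SAVE eagerly so the thread list only
            # holds threads waiting on a CHAR (or sitting on MATCH).
            if pc in seen:
                return
            seen.add(pc)
            op = prog[pc]
            if op[0] == JMP:
                add(threads, seen, op[1], pos, saves)
            elif op[0] == SPLIT:
                add(threads, seen, op[1], pos, saves)
                add(threads, seen, op[2], pos, saves)
            elif op[0] == SAVE:
                add(threads, seen, pc + 1, pos, saves + ((op[1], pos),))
            else:
                threads.append((pc, saves))

        threads, seen = [], set()
        add(threads, seen, 0, 0, ())
        for pos, ch in enumerate(text):
            next_threads, seen = [], set()
            for pc, saves in threads:
                op = prog[pc]
                if op[0] == CHAR and op[1] == ch:
                    add(next_threads, seen, pc + 1, pos + 1, saves)
            threads = next_threads
        for pc, saves in threads:
            if prog[pc][0] == MATCH:
                return dict(saves)   # capture slot -> string position
        return None

    # Hand-compiled program for (a+)b.
    prog = [
        (SAVE, 0),        # open group 1
        (CHAR, 'a'),
        (SPLIT, 1, 3),    # loop back for more 'a's, or fall through
        (SAVE, 1),        # close group 1
        (CHAR, 'b'),
        (MATCH,),
    ]

    print(run(prog, "aaab"))   # {0: 0, 1: 3}: group 1 is text[0:3] == 'aaa'
    print(run(prog, "b"))      # None

The point is that the number of live threads is bounded by the number of program states, so the simulation stays linear in the input; the per-thread saves tuples are what give you the capture groups on top of that.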
I see no reason why popular languages couldn't implement both algorithms, though, and then select the backtracking one only if the regexp contains backreferences or embedded code. It's easy enough to tell whether a regexp uses these features: just check for them when the regexp is compiled. Then you could use the fast algorithm when possible and the featureful one when not.
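Something like this, say. The feature test below (numeric backreferences like \1, named ones like (?P=name), and Perl's (?{...}) code blocks) is a rough illustration done on the raw pattern text; a real implementation would check the parsed AST instead, since a textual scan can be fooled by things like escaped backslashes or character classes.

    # Sketch of compile-time dispatch between two hypothetical engines.
    import re

    BACKTRACK_ONLY = re.compile(
        r"""\\[1-9]          # numeric backreference \1..\9
          | \(\?P=\w+\)      # named backreference (?P=name)
          | \(\?\{           # embedded code block, Perl's (?{ ... })
        """,
        re.VERBOSE,
    )

    def compile_pattern(pattern):
        """Pick an engine based on the features the pattern uses."""
        if BACKTRACK_ONLY.search(pattern):
            return ("backtracking", pattern)  # featureful, worst-case exponential
        return ("automaton", pattern)         # linear-time NFA/DFA simulation

    print(compile_pattern(r"(a+)b"))    # ('automaton', ...)
    print(compile_pattern(r"(a+)\1"))   # ('backtracking', ...)

The nice property of doing the check at compile time is that the common case (no backreferences, no embedded code) silently gets the linear-time engine, and nobody's existing patterns break.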