One thing that I just learned that is SUPER WEIRD is that Thompson's paper contains the word "derivative" but it doesn't contain the words NFA, DFA, or even "state" !!!
First paragraph:
In the terms of Brzozowski [1], this algorithm con- tinually takes the left derivative of the given regular expression with respect to the text to be searched.
Thompson learned regular expressions from Brzozowski himself while both were at Berkeley, and he credits the method in the 1968 paper. The now-standard framing of Thompson's paper as an NFA construction and the addition of caching states to produce a DFA were both introduced later, by Aho and Ullman in their various textbooks. Like many primary sources, the original is different from the textbook presentations of it. (I myself am also guilty of this particular misrepresentation.)
However, figure 5 in Thompson's paper is CLEARLY an NFA for the regex a(b|c)d.
What's the connection between derivatives and NFAs? I don't understand why Thompson says his method is derivative-based and not NFA-based.
(The 2007 "Re-examined" paper linked here comments upon the Thompson line in section 6. They qualify Thompson's statement a bit, but I don't understand it fully. )
http://www.oilshell.org/archive/Thompson-1968.pdf
One thing that I just learned that is SUPER WEIRD is that Thompson's paper contains the word "derivative" but it doesn't contain the words NFA, DFA, or even "state" !!!
First paragraph:
In the terms of Brzozowski [1], this algorithm con- tinually takes the left derivative of the given regular expression with respect to the text to be searched.
From: https://research.swtch.com/yaccalive
Thompson learned regular expressions from Brzozowski himself while both were at Berkeley, and he credits the method in the 1968 paper. The now-standard framing of Thompson's paper as an NFA construction and the addition of caching states to produce a DFA were both introduced later, by Aho and Ullman in their various textbooks. Like many primary sources, the original is different from the textbook presentations of it. (I myself am also guilty of this particular misrepresentation.)
However, figure 5 in Thompson's paper is CLEARLY an NFA for the regex a(b|c)d.
What's the connection between derivatives and NFAs? I don't understand why Thompson says his method is derivative-based and not NFA-based.
(The 2007 "Re-examined" paper linked here comments upon the Thompson line in section 6. They qualify Thompson's statement a bit, but I don't understand it fully. )
(This came up on lobste.rs after a discussion of Thompson's construction: https://lobste.rs/s/fq8uil/aho_corasick#c_4xkm7z)