Exactly. And that, in a nutshell, is the problem with this article. It's an exposition of the kind of regex engine we all learned about in school, not the kind of tool that is useful for practical software development.
A regex matcher, alone, isn't really useful for much beyond a lexer generator. It's captured subexpressions and backreferences (well, really just captures) that turn it into the gadget we all love. And as you point out, that is not achievable with the simple state machine engine we all remember from class.
> It's captured subexpressions and backreferences (well, really just captures) that turn it into the gadget we all love. And as you point out, that is not achievable with the simple state machine engine we all remember from class.
Did you read [0]? It demonstrates that submatch extraction can be (and has been) implemented with DFAs:
"The extraction of submatch boundaries has been mostly ignored by computer science theorists, and it is perhaps the most compelling argument for using recursive backtracking. However, Thompson-style algorithms can be adapted to track submatch boundaries without giving up efficient performance. The Eighth Edition Unix regexp(3) library implemented such an algorithm as early as 1985, though as explained below, it was not very widely used or even noticed."
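For anyone who wants to see the idea rather than take the quote on faith, here is a rough sketch of a Thompson-style lockstep simulation where each thread carries its own capture slots (usually called a Pike VM). It is not the Eighth Edition code. The instruction set, the hand-compiled program for `(a+)(b+)`, and the names `PROG`, `add_thread` and `pike_match` are all made up for illustration.

```python
# Sketch of a Pike-style VM: Thompson lockstep simulation plus per-thread captures.
# Everything here (instruction names, PROG, function names) is hypothetical.

# Hand-compiled program for (a+)(b+); slots 0/1 bound group 1, slots 2/3 bound group 2.
PROG = [
    ("save", 0),      # 0: group 1 starts
    ("char", "a"),    # 1
    ("split", 1, 3),  # 2: prefer looping back (greedy)
    ("save", 1),      # 3: group 1 ends
    ("save", 2),      # 4: group 2 starts
    ("char", "b"),    # 5
    ("split", 5, 7),  # 6: prefer looping back (greedy)
    ("save", 3),      # 7: group 2 ends
    ("match",),       # 8
]

def add_thread(prog, threads, seen, pc, caps, pos):
    """Follow jmp/split/save instructions and queue the char/match threads reached."""
    if pc in seen:          # each instruction is queued at most once per step
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, seen, op[1], caps, pos)
    elif op[0] == "split":  # first branch is added first, so it has priority
        add_thread(prog, threads, seen, op[1], caps, pos)
        add_thread(prog, threads, seen, op[2], caps, pos)
    elif op[0] == "save":   # record the current position in a copy of the slots
        new_caps = caps[:op[1]] + (pos,) + caps[op[1] + 1:]
        add_thread(prog, threads, seen, pc + 1, new_caps, pos)
    else:
        threads.append((pc, caps))

def pike_match(prog, text, nslots):
    """Match anchored at the start of text; return capture slots or None."""
    clist = []
    add_thread(prog, clist, set(), 0, (None,) * nslots, 0)
    matched = None
    for pos in range(len(text) + 1):
        nlist, seen = [], set()
        for pc, caps in clist:
            op = prog[pc]
            if op[0] == "match":
                matched = caps
                break       # threads after this one have lower priority
            if pos < len(text) and op[0] == "char" and op[1] == text[pos]:
                add_thread(prog, nlist, seen, pc + 1, caps, pos + 1)
        clist = nlist
        if not clist:
            break
    return matched
```

With this program, `pike_match(PROG, "aabbb", 4)` returns `(0, 2, 2, 5)`, i.e. group 1 spans `[0, 2)` and group 2 spans `[2, 5)`. The thread list never holds more than one entry per instruction, so the work per input character is bounded by the program size and there is no backtracking.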
I said captures with backreferences, not captures alone. As the link you cite mentions, "as far as the theoretical term is concerned, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. [...] No one knows how to implement regular expressions with backreferences efficiently, though no one can prove that it's impossible either. (Specifically, the problem is NP-complete [...])"
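To make the distinction concrete, here is a small example using Python's `re` module, which is a backtracking engine. The pattern and the helper name `same_run` are just illustrations of mine, not anything from the article.

```python
import re

# (a+) captures a run of 'a's; \1 then demands exactly the same run after the 'b'.
# The language { a^n b a^n } is not regular, so a plain finite automaton
# cannot recognize it; the engine has to remember what the group matched.
SAME_RUN = re.compile(r"(a+)b\1")

def same_run(s: str) -> bool:
    """True if s is some run of 'a's, a 'b', then the same run again."""
    return SAME_RUN.fullmatch(s) is not None
```

So `same_run("aabaa")` is true, while `same_run("aabaaa")` and `same_run("aaba")` are false. Capturing `(a+)` by itself is fine for an automaton-based engine; it is the `\1` that pushes the pattern outside the regular languages.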