
"Regular expression" lost the sense of "strictly only regular languages" around the time that Perl crushed the world.

It's probably time to give up on that terminology nit-pick.



Actually, I think it's still a very important distinction to make. Making this point encourages understanding of what regular languages, backtracking, et cetera mean, which allows us as engineers to use our tools more effectively.


It's not a nit-pick. True regular expression engines can run in guaranteed worst-case linear time. Non-regular regex engines generally have no such guarantee. If you're using a regular expression library to parse inputs on a server, such efficiency guarantees are a simple and effective defense against input-based denial of service attacks.
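To make the linear-time point concrete, here is a minimal sketch of the state-set simulation that automaton-based engines rely on. The NFA below is hand-built for the pattern (a|aa)*b purely as an illustration; the state numbers and the `nfa_match` name are made up for this example and do not come from any real library.

```python
# Hypothetical hand-built NFA for the pattern (a|aa)*b.
# State 0: start of the loop, state 1: seen the first 'a' of "aa",
# state 2: accepting (the final 'b' was consumed).
NFA = {
    (0, "a"): {0, 1},  # a lone 'a' returns to 0, or begins "aa"
    (1, "a"): {0},     # second 'a' of "aa" returns to the loop start
    (0, "b"): {2},     # the closing 'b'
}
START = 0
ACCEPT = {2}


def nfa_match(text):
    """Return (matched, steps); steps counts state/character visits."""
    states = {START}
    steps = 0
    for ch in text:
        next_states = set()
        for state in states:
            next_states |= NFA.get((state, ch), set())
            steps += 1
        states = next_states
        if not states:  # no live states left, so no match is possible
            return False, steps
    return bool(states & ACCEPT), steps


matched, steps = nfa_match("a" * 30)
print(matched, steps)
```

Each input character is examined once per live state, so the work is bounded by (number of states) x (input length). A backtracking matcher given the same pattern and a long run of a's with no trailing b can, in the worst case, retry a number of a/aa splits that grows exponentially with the input length.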


The post you are replying to is not about word usage, it is about an aspect of computer science.


> It's probably time to give up on that terminology nit-pick.

Many Perl coders are self-taught programmers with no CS background. If a man page calls something a regular expression, that is good enough for them.

Why drop to that level if you know better?

I don't want to appear "gratuitously ignorant" in front of someone who might be a decent computer scientist.

There are already ways in which I will appear ignorant due to actual ignorance; why go out of my way to dumb down the way I speak about something that I do understand?

If we are going to wreck math terminology, why don't we just redefine "prime number"? Let's see: 3 is prime, 5 is prime, 7 is prime, oh ... so 9 is prime, 11 is prime, 13 is prime, 15 is prime ... And, what do you know: now we can use a regular regular regex for primality testing over the binary representation: why, it's just (0|1)*1. As a bonus, we can restore 1 to its proper status as a prime number and kick out the oddball 2.

Say, if Perl called an odd test function "isprime", would you start calling odd numbers prime?
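For anyone who wants to check the (0|1)*1 joke, a quick sketch in Python (the `is_odd` name is just for illustration, and it assumes non-negative integers):

```python
import re

# A genuinely regular pattern: any binary string that ends in 1.
ODD_BINARY = re.compile(r"(0|1)*1")


def is_odd(n):
    """True if the binary representation of n matches (0|1)*1."""
    return ODD_BINARY.fullmatch(format(n, "b")) is not None


print([n for n in range(16) if is_odd(n)])
```

It picks out exactly the odd numbers, which is of course the point: calling the function "isprime" would not make it test primality.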


> I don't want to appear "gratuitously ignorant" in front of someone who might be a decent computer scientist.

Someone who is a decent computer scientist will be unlikely to care whether you say "regular expression" or "look how smart I am, I know to say that this thing Perl does is called regular expressions by ignorant plebes even though it strictly isn't, let's high-five!"

Because, y'know, someone who's actually a decent computer scientist probably has more important things to care about.


In that case, I also don't want to appear "gratuitously ignorant" in front of such a lesser computer scientist who does care. :)

Maybe one thing like this doesn't matter, but if you make a habit out of sloppy use of technical terms, it will not go unnoticed.

Analogy: one neuron firing might not exceed a threshold, but thirty-seven of them might.

> "look how smart I am, I know to say that this thing Perl does is called regular expressions by ignorant plebes even though it strictly isn't, let's high-five!"

That's rather verbose, making the cheesy scare quotes finger gesture an appealing alternative.


I believe the next line in this traditional dialogue is “Language is a fluid thing and words are defined by their use!”

Your turn.


The next line is: which philosophy of meaning do you subscribe to? Ludwig Wittgenstein ("meaning is use")? Or are you an all-out Humpty-Dumpty-ist ("when I use a word, it means just what I choose it to mean, not more or less")? Or maybe Sapir-Whorf: the language not only determines its own meaning, but your entire cognitive process.


I can't tell if you're defending this line of thought or making fun of it, but the term "regular language" has a specific mathematical meaning and perverting it to include how Perl uses the term is wrong, plain and simple.


I was, in fact, slightly mocking both the argument and the related argument that people who use sloppy language like to bring out in favor of using words to mean whatever they like. I certainly agree that words with precise meanings are important to use correctly.


It had actually effectively lost it a bit before that. Perl did not invent regular expressions; in fact, Larry Wall started with a package by Henry Spencer (http://en.wikipedia.org/wiki/Henry_Spencer), which was itself a reimplementation of an API that appeared in AT&T's V8 Unix (released in 1985) and was incorporated into standard Unix tools like egrep.

Perl standardized a good enough regular expression dialect that everyone else copied it. But the divergence between "regular expression" and "strictly only regular languages" was already a lost cause.


POSIX regular expressions (file glob patterns, BREs and EREs) have more features than regexes in old CS papers, but they are still regular: they compile to automata with no backtracking.


The ERE described in egrep(1) has backreferences which can match non-regular languages. Are you thinking of some other notion of ERE?
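As a small illustration of why backreferences leave regular territory, here is a sketch using Python's `re` module as a stand-in (not egrep itself; the `MIRROR` and `same_count` names are made up). The language "n a's, then b, then the same n a's" cannot be recognized by any finite automaton, yet one backreference handles it:

```python
import re

# (a*) captures a run of a's; \1 then demands exactly the same run again.
MIRROR = re.compile(r"(a*)b\1")


def same_count(text):
    """True if text is some number of a's, a 'b', then the same number of a's."""
    return MIRROR.fullmatch(text) is not None


for sample in ["b", "aabaa", "aabaaa", "aab"]:
    print(sample, same_count(sample))
```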


He is thinking of what POSIX specified, which does not have backreferences. See http://www.regular-expressions.info/posix.html for details.


Aha, I didn't know about the difference between POSIX ERE and GNU ERE. Thank you!



