I did not verify this regex, on the very surface scan seems OK:
./main -m ./models/7B/ggml-model-q4_0.bin -t 4 --temp 0.7 --top_k 40 --top_p 0.5 --repeat_last_n 256 --repeat_penalty 1.17647 -n 1024 -p $'Here is a handy short form regex to validate an email address:
'
Here is a handy short form regex to validate an email address:
^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-z\.]{2,6})+$
The first character can be either uppercase or lower case. The second group must contain at least one letter and the third group may contain any number of characters (up to 5). The last part ensures that it ends with @ followed by two more letters separated by dots.
If you want to make sure that your input string contains only valid characters for emails then use this regex instead:
\A[\w.]@[\w.]\.\w{1,4}\z
Badly wrong, in part because your prompt was badly wrong, because there is no short regex to validate an email address. (It needs a way of saying “I can’t complete that because the prompt makes no sense for this reason”.)
What it emitted accepts a large number of invalid addresses (due to things like not checking dot placement, and the inexplicable (…)+ wrapping around the entire thing), and doesn’t accept a large number of valid addresses (some comparatively esoteric, like local parts containing any of !#$%&'*/=?^`{|}~ or IP addresses for the domain name, and some very reasonable, like TLDs of more than six characters, or internationalised TLDs even in Punycode form).
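To make the above concrete, here is a rough Python sketch that runs the first emitted regex against a few addresses. The sample addresses and the helper name `accepts` are my own picks for illustration, not anything from the model output. "Valid" here means valid per the RFC 5322 address syntax, not that the mailbox exists.

```python
import re

# The first regex emitted by the model, copied verbatim from the output above.
MODEL_REGEX = re.compile(r"^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-z\.]{2,6})+$")


def accepts(address: str) -> bool:
    """Return True if the model's regex matches the address (hypothetical helper)."""
    return MODEL_REGEX.match(address) is not None


# Invalid addresses that the regex nevertheless accepts (illustrative samples).
INVALID_BUT_ACCEPTED = [
    ".user@example.com",      # local part starts with a dot
    "john..doe@example.com",  # consecutive dots in the local part
    "user@example..com",      # empty label in the domain
    "a@b.comc@d.com",         # two addresses glued together, allowed by the outer (...)+
]

# Valid addresses that the regex rejects (illustrative samples).
VALID_BUT_REJECTED = [
    "o'brien@example.com",       # apostrophe is legal in the local part
    "user@[192.0.2.1]",          # IP address literal as the domain
    "user@example.photography",  # TLD longer than six characters
    "user@example.xn--p1ai",     # internationalised TLD in Punycode form
]

for address in INVALID_BUT_ACCEPTED + VALID_BUT_REJECTED:
    print(f"{address!r}: accepted={accepts(address)}")
```

Every entry in the first list comes back accepted and every entry in the second comes back rejected, which is the wrong way round in both cases.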
The description it emits does not match the regular expression at all well, either.
The second regex it emits is even worse than the first, unnecessarily uses PCRE-specific syntax, and is given with a nonsensical description. (Note: the asterisks got turned into italics, backslash-escape them here on HN. With this fixed, the regex was \A[\w.]*@[\w.]*\.\w{1,4}\z.)
> on the very surface scan seems OK
And there’s the danger of this stuff. As a subject-matter expert on regex and email, I glanced at the regular expression and was immediately appalled (… quite apart from the whole “here we go again, this is certain to be terrible” cringe on the prompt). But it looks plausible enough if you aren’t.
It is a bit crazy to me that someone would post a regex like that without verifying it and say that on a surface level it looks good, implying the whole thing was useful and a good result.
I said it looks OK, not good. My comment is mostly about my surprise that a valid regex came out. I also asked it to write a regex to parse HTML, which it happily answered. What does GPT-4 say about parsing HTML? ;)
But it is either going to be useful or harmful. It is harmful if doing the regex validation is worse than doing no validation at all, or worse than a very simple validation that just checks that an @ is included somewhere.
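For reference, the "very simple validation" could be sketched like this in Python. The function name `has_plausible_shape` is made up for illustration, and requiring something on both sides of the @ is my own assumption, slightly stricter than only checking that an @ is present. It says nothing about whether the address can receive mail; sending a confirmation message is the only real test of that.

```python
def has_plausible_shape(address: str) -> bool:
    """Loose sanity check: an @ with at least one character on each side.

    Hypothetical helper for illustration; it does not validate the address.
    """
    # rpartition splits on the LAST "@", so a quoted local part that itself
    # contains "@" still ends up entirely on the left-hand side.
    local, at, domain = address.rpartition("@")
    return bool(at) and bool(local) and bool(domain)


for candidate in ["user@example.com", "o'brien@example.com",
                  "user@[192.0.2.1]", "no-at-sign", "@example.com", "user@"]:
    print(f"{candidate!r}: {has_plausible_shape(candidate)}")
```

The point of keeping it this loose is that it never rejects a valid address, while still catching the obvious typo of leaving out the @ entirely.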
For comparison GPT-4 provides the following Python regex and then warns that it does not catch all edge cases and that it’s better to use a dedicated library like email-validator: