Try asking code-davinci-002 instead of text-davinci-003. curl https://api.openai...

zaroth · on Feb 1, 2023

Pretty sure that regexp is wrong though?

Wouldn’t having ‘\b’ on both sides match beginning AND end? It’s got the parentheses for the ‘|’ in the wrong place.

codetrotter · on Feb 1, 2023

It’s definitely not doing what the prompt asked for.

The generated regex is the same as

    (\bdog\b)|(\bcat\b)

https://regex101.com/r/vTtEU4/1
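For anyone who would rather check this locally than on regex101, here is a minimal Python sketch. The sample sentence is made up for illustration; the pattern is the one quoted above, and Python's `re` module is just one flavour that happens to agree with it here.

```python
import re

# Pattern quoted above: each alternative has \b on both sides,
# so only the standalone words "dog" and "cat" can match.
WHOLE_WORDS = re.compile(r"(\bdog\b)|(\bcat\b)")

# Hypothetical sample text, not from the thread.
sample = "dog doghouse bobcat cat"

found = [m.group(0) for m in WHOLE_WORDS.finditer(sample)]
print(found)
```

Words that merely start with dog or end with cat ("doghouse", "bobcat") are skipped, which is why it misses what the prompt asked for.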

I’m currently trying to figure out how to match a word starting with dog without using

    \bdog.*

because .* would proceed to eat the rest of the line.

So I was thinking I could say

    \bdog[^\b]*

But that doesn’t work; it ends up eating the rest of the line as well.
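A likely reason, sketched in Python: inside a character class, Python's `re` reads \b as the backspace character rather than a word boundary, so [^\b] means "anything except backspace". Other flavours may differ, and the sample line below is invented for the demo.

```python
import re

line = "dogfood is tasty"  # hypothetical sample line

# Greedy .* runs to the end of the line.
greedy = re.search(r"\bdog.*", line).group(0)

# Inside [...] Python's re treats \b as backspace (\x08),
# so [^\b]* is "any run of non-backspace characters".
negated = re.search(r"\bdog[^\b]*", line).group(0)

# Direct check that [\b] matches a literal backspace.
class_is_backspace = re.fullmatch(r"[\b]", "\x08") is not None

print(greedy)
print(negated)
print(class_is_backspace)
```

Both patterns end up consuming the whole line, since the line contains no backspace to stop on.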

shagie · on Feb 1, 2023

Use \S, the complement of \s; it stops at whitespace, so it won't run to the end of the line.

    \b(dog\S*)|(\S*cat)\b

You could also use a \B instead of a \S though there are different meanings there.

codetrotter · on Feb 1, 2023

It almost does the trick

https://regex101.com/r/sbpy8s/1

But this matches for example

    dog.cat

as one single word.

But I would like it to match

    dog

and

    cat

separately in this case.

Likewise, I’d want for example

    dogapple-bananacat

to be matched as two separate words

    dogapple

and

    bananacat
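A small Python sketch of the problem, using the \S pattern from the parent comment and the two inputs above. Python's `re` is assumed here; regex101 may be set to a different flavour.

```python
import re

# The \S-based suggestion from the parent comment, unchanged.
NON_SPACE = re.compile(r"\b(dog\S*)|(\S*cat)\b")

inputs = ["dog.cat", "dogapple-bananacat"]

# Map each input to the full text of every match found in it.
results = {
    text: [m.group(0) for m in NON_SPACE.finditer(text)]
    for text in inputs
}

for text, found in results.items():
    print(text, "->", found)
```

Since \S accepts "." and "-", each input comes back as one match instead of two.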

After a bit more reading online I thought that maybe the following regex would do what I want:

    \b(dog\p{L}*)|(\p{L}*cat)\b

https://regex101.com/r/1NT5Ie/1

But that does not match

    dog42

as a word.

What I want is a way to include everything after dog that is not \b

And likewise everything preceding cat that is not \b

Edit: I think I’ve found it after reading https://stackoverflow.com/questions/4541573/what-are-non-wor...

    (\bdog\w*)|(\w*cat\b)

Seems to behave exactly like I want.

https://regex101.com/r/f3uJUE/1
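The same check as a Python sketch, run against the three inputs mentioned in this comment. This assumes Python's `re`, where \w covers letters, digits and underscore.

```python
import re

# Final pattern from this comment: \w* stops at the first
# non-word character, such as "." or "-".
WORD_CHARS = re.compile(r"(\bdog\w*)|(\w*cat\b)")

inputs = ["dog.cat", "dogapple-bananacat", "dog42"]

# Map each input to the full text of every match found in it.
results = {
    text: [m.group(0) for m in WORD_CHARS.finditer(text)]
    for text in inputs
}

for text, found in results.items():
    print(text, "->", found)
```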

KronisLV · on Feb 2, 2023

Out of curiosity: if humans have trouble coming up with anything non-trivial, like regexes, why should something that has been trained on the output of humans do much better?

To me it feels like if 90% of the $TASK content out there is bad and people struggle with it, then the AI-generated $TASK output would be similarly flawed, be it regarding a programming language or something else.

As a silly example, consider how much bad legacy PHP code is out there and what the answers to some PHP questions could become because of that.

But it's still possible to get answers to simplistic problems reasonably fast, or at least get workable examples to then test and iterate upon, which can easily save some time.

btown · on Feb 1, 2023

After all, who needs wget when you have \wcat!

paulclinger · on Feb 2, 2023

Agree; the ChatGPT answer is not correct, as the assignment is to match a word that starts with `dog` and ends with `cat`. You can make .* non-greedy by adding ? at the end, but it's not needed in this case, as the engine should backtrack. Something like this should work: /\bdog[\w_-]*cat\b/ (assuming _ and - should be allowed inside words). You can also specify word-separators ([^ ] instead of [\w_-]) if that's easier to read.
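A quick Python sketch of that reading of the assignment (one word that both starts with dog and ends with cat). The sample text is made up, and Python's `re` is assumed.

```python
import re

# Pattern from the comment above. \w already includes "_",
# so the extra "_" in the class is harmless; the trailing "-"
# is a literal hyphen.
BOTH_ENDS = re.compile(r"\bdog[\w_-]*cat\b")

# Hypothetical sample text, not from the thread.
sample = "dogcat dog-house-cat dog cat dogapple"

found = BOTH_ENDS.findall(sample)
print(found)
```

The greedy class first swallows "-house-cat" and then backtracks far enough to let the literal cat match, so no non-greedy quantifier is needed.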

mminer237 · on Feb 1, 2023

    \bdog\w*

codetrotter · on Feb 1, 2023

Yup. See my response to the other sibling comment. In particular:

    (\bdog\w*)|(\w*cat\b)

Seems to behave exactly like I want.

https://regex101.com/r/f3uJUE/1

ketzo · on Feb 1, 2023

Man, this thread is a great example for why I don't use regexes, lol

shagie · on Feb 1, 2023

Yep. But it gave straight up code rather than trying to persuade a natural language LLM to write code.

The regex I was expecting would be

    "\\b(dog.*)|(.*cat)\\b"

The key point is to ask the code model. Part of what ChatGPT does is it appears to categorize the question and then may dispatch it to the code model. If you know you have a code question, asking the code model first would likely be more productive and less expensive.

_tom_ · on Feb 1, 2023

That's not a good regex. The cat part is harder than the dog part.

Your regex will match the whole line up to cat.
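A minimal Python sketch of that over-matching, with the pattern from the parent comment written as a raw string instead of with doubled backslashes. Both sample lines are invented for the demo.

```python
import re

# Parent comment's pattern; .* is not limited to word characters.
EXPECTED = re.compile(r"\b(dog.*)|(.*cat)\b")

# Hypothetical sample lines, not from the thread.
lines = ["the bobcat sat", "my dog barks loudly"]

# Map each line to the full text of every match found in it.
results = {
    line: [m.group(0) for m in EXPECTED.finditer(line)]
    for line in lines
}

for line, found in results.items():
    print(repr(line), "->", found)
```

The cat branch drags in everything from the start of the line, and the dog branch runs on to the end of it.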

shagie · on Feb 2, 2023

I don't claim it was good - just what I was expecting from the prompt.