
Try asking code-davinci-002 instead of text-davinci-003.

    curl https://api.openai.com/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
      "model": "code-davinci-002",
      "prompt": "##### Create a regular expression to match words starting with '\''dog'\'' or ending with '\''cat'\''.\n    \n### Java Code",
      "temperature": 0,
      "max_tokens": 182,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0,
      "stop": ["###"]
    }'
This returned:

    ```java
    String regex = "\\b(dog|cat)\\b";
    ```



Pretty sure that regexp is wrong though?

Wouldn’t having ‘\b’ on both sides anchor both the beginning AND the end of the word? The parentheses around the ‘|’ are in the wrong place.


It’s definitely not doing what the prompt asked for.

https://regex101.com/r/ZNQa9X/1

The generated regex is the same as

    (\bdog\b)|(\bcat\b)
https://regex101.com/r/vTtEU4/1
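For anyone who wants to check the equivalence locally instead of on regex101, here is a minimal sketch using Python's `re` module (an assumption on my part; the thread's prompt asked for Java, but `\b` behaves the same way here). The sample text is made up for illustration.

```python
import re

# Made-up sample text: exact words, a "dog..." word, a "...cat" word, and dog.cat
text = "dog dogma bobcat cat dog.cat"

generated = re.compile(r"\b(dog|cat)\b")       # what the model returned
expanded = re.compile(r"(\bdog\b)|(\bcat\b)")  # the spelled-out form

generated_hits = [m.group(0) for m in generated.finditer(text)]
expanded_hits = [m.group(0) for m in expanded.finditer(text)]

# Both patterns find only the exact words "dog" and "cat";
# "dogma" and "bobcat" are skipped by both.
print(generated_hits)
print(expanded_hits)
```

Neither pattern picks up `dogma` or `bobcat`, which is why it doesn't answer the prompt.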

I’m currently trying to figure out how to match a word starting with dog without using

    \bdog.*
because

    .*
would proceed to eat the rest of the line.

So I was thinking I could say

    \bdog[^\b]*
But that doesn’t work, it also ends up eating the rest of the line as well.
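A likely reason, at least in Python's `re` and in PCRE: inside a character class `\b` stops being a word-boundary assertion and means the backspace character, so `[^\b]*` is just "anything except a backspace". A small sketch (Python is my choice here, and the sample strings are made up):

```python
import re

line = "dogma runs past the cat"

# Greedy .* runs to the end of the line.
greedy = re.search(r"\bdog.*", line).group(0)

# Inside [...], \b is the backspace character (0x08), so [^\b]* also
# runs to the end of the line on ordinary text.
negated = re.search(r"\bdog[^\b]*", line).group(0)

# The class only stops when it meets a literal backspace character.
stops_at_backspace = re.search(r"\bdog[^\b]*", "dogma\x08 tail").group(0)

print(greedy == line, negated == line, repr(stops_at_backspace))
```

Other engines may treat `\b` inside a class differently, so this is worth checking against whichever one you actually use.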


Use \S, the complement of \s (any non-whitespace character), so the match stops at the next whitespace instead of eating the rest of the line.

    \b(dog\S*)|(\S*cat)\b
You could also try \B instead of \S, though it means something different: \B is a zero-width assertion for a position that is not a word boundary, not a character class.


It almost does the trick

https://regex101.com/r/sbpy8s/1

But this matches for example

    dog.cat
as one single word.

But I would like that it matches separately

    dog
and

    cat
in this case.

Likewise, I’d want for example

    dogapple-bananacat
to be matched as two separate words

    dogapple
and

    bananacat
After a bit more reading online I thought that maybe the following regex would do what I want:

    \b(dog\p{L}*)|(\p{L}*cat)\b
https://regex101.com/r/1NT5Ie/1

But that does not match

    dog42
as a word.

What I want is a way to include everything after dog that is not \b

And likewise everything preceding cat that is not \b

Edit: I think I’ve found it after reading https://stackoverflow.com/questions/4541573/what-are-non-wor...

    (\bdog\w*)|(\w*cat\b)
Seems to behave exactly like I want.

https://regex101.com/r/f3uJUE/1
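Here is a rough local check of the same cases, comparing the `\S` attempt with the `\w` version. This is a sketch in Python's `re` (my assumption; regex101 defaults to PCRE), and `hits` is just a hypothetical helper name.

```python
import re

attempt_with_S = re.compile(r"\b(dog\S*)|(\S*cat)\b")  # the \S suggestion
final_with_w = re.compile(r"(\bdog\w*)|(\w*cat\b)")    # the \w version

def hits(pattern, text):
    """Return every full match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

# The three inputs discussed above.
samples = ["dog.cat", "dogapple-bananacat", "dog42"]
for s in samples:
    print(s, hits(attempt_with_S, s), hits(final_with_w, s))
```

With `\S`, the punctuation gets swallowed and each sample comes back as one match; with `\w`, the match stops at the first non-word character, so `dog.cat` splits into `dog` and `cat`.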


Out of curiosity: if humans have trouble coming up with anything non-trivial, like regexes, why should something that has been trained on the output of humans do much better?

To me it feels like if 90% of the $TASK content out there is bad and people struggle with it, then the AI-generated $TASK output will be similarly flawed, whether that's for a programming language or something else.

As a silly example, consider how much bad legacy PHP code is out there and what the answers to some PHP questions could become because of that.

But it's still possible to get answers to simplistic problems reasonably fast, or at least get workable examples to then test and iterate upon, which can easily save some time.


After all, who needs wget when you have \wcat!


Agree; the ChatGPT answer is not correct, as the assignment is to match a word that starts with `dog` and ends with `cat`. You can make .* non-greedy by adding ? at the end, but it's not needed in this case, as the engine should backtrack. Something like this should work: /\bdog[\w_-]*cat\b/ (assuming _ and - should be allowed inside words). You can also specify word-separators ([^ ] instead of [\w_-]) if that's easier to read.
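A quick sketch of how that pattern behaves under the "starts with dog and ends with cat" reading, using Python's `re` (my choice of engine; `is_match` is a hypothetical helper and the test words are made up):

```python
import re

# Starts with "dog" AND ends with "cat", allowing word characters,
# underscores and hyphens in between.
both = re.compile(r"\bdog[\w_-]*cat\b")

def is_match(word):
    """True if the pattern is found anywhere in word."""
    return both.search(word) is not None

for word in ["dogcat", "dog-and-cat", "dogapple-bananacat", "dogapple", "bananacat", "dog cat"]:
    print(word, is_match(word))
```

Words that only start with `dog` or only end with `cat` are rejected, so this answers a stricter question than the "or" in the original prompt.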


    \bdog\w*


Yup. See my response to the other sibling comment. In particular:

    (\bdog\w*)|(\w*cat\b)
Seems to behave exactly like I want.

https://regex101.com/r/f3uJUE/1


Man, this thread is a great example for why I don't use regexes, lol


Yep. But it gave straight up code rather than trying to persuade a natural language LLM to write code.

The regex I was expecting would be

    "\\b(dog.*)|(.*cat)\\b"
The key point is to ask the code model. Part of what ChatGPT does is it appears to categorize the question and then may dispatch it to the code model. If you know you have a code question, asking the code model first would likely be more productive and less expensive.


That's not a good regex. The cat part is harder than the dog part.

Your regex will match the whole line up to cat.
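To see it concretely, here is a minimal sketch in Python's `re` (my assumption; the Java string `"\\b(dog.*)|(.*cat)\\b"` becomes a raw string here). The sample lines are made up and `hits` is a hypothetical helper.

```python
import re

expected = re.compile(r"\b(dog.*)|(.*cat)\b")

def hits(text):
    """Return every full match of the pattern in text."""
    return [m.group(0) for m in expected.finditer(text)]

# The .*cat branch starts at the beginning of the line and
# swallows everything up to the last "cat".
cat_line = hits("my dog chased the bobcat up a tree")

# The dog.* branch runs from "dog" to the end of the line.
dog_line = hits("dogs bark loudly at night")

print(cat_line)
print(dog_line)
```

In both cases the match is a chunk of the line rather than a single word.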


I don't claim it was good - just what I was expecting from the prompt.



