
It's a bad question.

1. This question just exploits GPT-4's inability to count accurately, which comes down to some combination of how its attention mechanism and tokenization work. But counting isn't reasoning. If you sidestep the counting and ask directly for the value of p negated 27 times, it will give you the right answer every time.

2. A reasonable human would probably miscount tildes at a pretty high rate too. Most people would paste that into a word processor or otherwise use a program to find the number of ~ signs, which GPT-4 will also do if you use the code interpreter; the sketch below shows the gist.
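
A minimal sketch of that programmatic check, assuming the prompt text is sitting in a string (the string here is a stand-in, not the actual prompt):

    # Count the tildes, then reduce to parity: an even number of
    # negations leaves p unchanged, an odd number flips it.
    prompt = "~" * 27 + "p"   # stand-in for the actual prompt text

    negations = prompt.count("~")
    print(negations)                           # 27
    print("not p" if negations % 2 else "p")   # not p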



1. This is possibly an artifact of parity being easy to detect in base 10. I'm less confident that GPT would get this right if you asked it in trinary. For a short trinary number it worked once (via a chain of thought that converted the trinary to decimal), and then I got this result for a longer number, which is trivially wrong:

"...The given number ends with a 2. In trinary, the only possible remainders when divided by 2 (in trinary) are 0, 1, and 2. Since the last digit is 2, the number 12101100102112_3 3 mod 2 (in trinary) is simply 2."

and to double-check that wasn't a fluke another run of the same prompt produced:

"To determine 12101100102112 mod 2 in trinary (base-3), we have to look at the least significant digit (the rightmost digit). The reason for this is that in base-10, a number mod 10 is simply its units digit, and similarly, in base-2 (binary), a number mod 2 is its least significant bit. The principle carries over to other bases."

This is an example of a reasoning error. If you want to generate more samples and see the distribution of answers, my exact prompt was:

"What is 12101100102112 mod 2 in trinary?"

I'm getting an error using the plugins version (Authorization error accessing plugins), so this was GPT4-default.
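
For what it's worth, the rule GPT-4 needed: 3 ≡ 1 (mod 2), so a trinary number mod 2 equals its digit sum mod 2; looking only at the last digit works when the base is a multiple of the modulus, which 3 is not. A quick sketch checking the number from my prompt:

    n = "12101100102112"  # the trinary number from the prompt

    # 3 ≡ 1 (mod 2), so each power of 3 contributes its digit unchanged:
    # n mod 2 is the digit sum mod 2, not the last digit.
    print(sum(int(d) for d in n) % 2)   # 1 -> the number is odd

    # cross-check by converting out of trinary first
    print(int(n, 3) % 2)                # 1

The last digit is 2, so a last-digit heuristic would call it even; the number is actually odd.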

2. Agreed. It was hard, and it took me a while to count the tildes in the prompt accurately enough to be sure I wasn't making mistakes. I fell back to a human chain-of-thought process, proceeding in discrete chunks of five since I can't sight-count 27. I could also have used production rules from logic to eliminate two negations at a time; both strategies are sketched below. Any of these strategies is accessible to GPT-4 in chain-of-thought token space, but none is used.
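
Both strategies are easy to write down mechanically; a sketch, again using a stand-in string for the prompt:

    expr = "~" * 27 + "p"  # stand-in for the actual prompt

    # (a) production-rule style: strip double negations a pair at a time
    reduced = expr
    while reduced.startswith("~~"):
        reduced = reduced[2:]
    print(reduced)  # ~p: one negation survives, so the value is not-p

    # (b) tally in chunks of five, the way a human might
    chunks, leftover = divmod(expr.count("~"), 5)
    print(chunks, leftover)  # 5 2 -> five fives plus two = 27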


You don't need trinary for this. Just ask whether a base-10 number is a multiple of 3. That's both a more natural and a harder problem than multiples of 2 in trinary.
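
Same digit-sum principle, since 10 ≡ 1 (mod 3): a base-10 number is a multiple of 3 exactly when its digit sum is. A sketch with a made-up example number:

    n = "7425186"  # hypothetical base-10 example

    # 10 ≡ 1 (mod 3), so n mod 3 equals its digit sum mod 3
    print(sum(int(d) for d in n) % 3)  # 0 -> a multiple of 3
    print(int(n) % 3)                  # 0, same answer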



