
It's a bad question.

1. This question just exploits GPT-4's inability to count accurately, which comes down to some combination of how its attention mechanism and tokenization work. But counting isn't reasoning. If you sidestep the counting and ask directly for the value of p negated 27 times, it will give you the right answer every time.

2. A reasonable human would probably miscount tildes at a pretty high rate too. Most people would paste that into a word processor or otherwise use a program to find the number of ~ signs, which GPT-4 will also do if you use the code interpreter; the sketch below shows the gist.
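
A minimal sketch of that programmatic check, assuming the prompt text is sitting in a string (the string here is a stand-in, not the actual prompt):

    # Count the tildes, then reduce to parity: an even number of
    # negations leaves p unchanged, an odd number flips it.
    prompt = "~" * 27 + "p"   # stand-in for the actual prompt text

    negations = prompt.count("~")
    print(negations)                           # 27
    print("not p" if negations % 2 else "p")   # not p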



1. This is possibly an artifact of parity being easy to detect in base 10. I'm less confident that GPT would get this right if you asked it in trinary. For a short trinary number it worked once (via a chain of thought that converted the trinary to decimal), and then I got this result for a longer number, which is trivially wrong:

"...The given number ends with a 2. In trinary, the only possible remainders when divided by 2 (in trinary) are 0, 1, and 2. Since the last digit is 2, the number 12101100102112_3 3 mod 2 (in trinary) is simply 2."

and to double-check that wasn't a fluke another run of the same prompt produced:

"To determine 12101100102112 mod 2 in trinary (base-3), we have to look at the least significant digit (the rightmost digit). The reason for this is that in base-10, a number mod 10 is simply its units digit, and similarly, in base-2 (binary), a number mod 2 is its least significant bit. The principle carries over to other bases."

This is an example of a reasoning error. If you want to generate more samples and see the distribution of answers, my exact prompt was:

"What is 12101100102112 mod 2 in trinary?"

I'm getting an error using the plugins version (Authorization error accessing plugins), so this was GPT4-default.
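
For what it's worth, the rule GPT-4 needed: 3 ≡ 1 (mod 2), so a trinary number mod 2 equals its digit sum mod 2; looking only at the last digit works when the base is a multiple of the modulus, which 3 is not. A quick sketch checking the number from my prompt:

    n = "12101100102112"  # the trinary number from the prompt

    # 3 ≡ 1 (mod 2), so each power of 3 contributes its digit unchanged:
    # n mod 2 is the digit sum mod 2, not the last digit.
    print(sum(int(d) for d in n) % 2)   # 1 -> the number is odd

    # cross-check by converting out of trinary first
    print(int(n, 3) % 2)                # 1

The last digit is 2, so a last-digit heuristic would call it even; the number is actually odd.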

2. Agreed. It was hard, and it took me a while to count the tildes in the prompt accurately enough to be sure I wasn't making mistakes. I fell back to a human chain-of-thought process, proceeding in discrete chunks of five since I can't sight-count 27. I could also have used production rules from logic to eliminate two negations at a time; both strategies are sketched below. Any of these strategies is accessible to GPT-4 in chain-of-thought token space, but none is used.
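
Both strategies are easy to write down mechanically; a sketch, again using a stand-in string for the prompt:

    expr = "~" * 27 + "p"  # stand-in for the actual prompt

    # (a) production-rule style: strip double negations a pair at a time
    reduced = expr
    while reduced.startswith("~~"):
        reduced = reduced[2:]
    print(reduced)  # ~p: one negation survives, so the value is not-p

    # (b) tally in chunks of five, the way a human might
    chunks, leftover = divmod(expr.count("~"), 5)
    print(chunks, leftover)  # 5 2 -> five fives plus two = 27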


You don't need trinary for this. Just ask whether a base-10 number is a multiple of 3. That's both a more natural and a harder problem than multiples of 2 in trinary.
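
Same digit-sum principle, since 10 ≡ 1 (mod 3): a base-10 number is a multiple of 3 exactly when its digit sum is. A sketch with a made-up example number:

    n = "7425186"  # hypothetical base-10 example

    # 10 ≡ 1 (mod 3), so n mod 3 equals its digit sum mod 3
    print(sum(int(d) for d in n) % 3)  # 0 -> a multiple of 3
    print(int(n) % 3)                  # 0, same answer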



