The issue isn’t performing the specific addition. Rather, you’re asking o1 to take n bits of data and combine them according to some set of rules. Isn’t that exactly what these models are supposed to excel at, following instructions? Binary addition is interesting because the problem space grows as 2^n, which is impossible to memorize for even moderate values of n.
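To make the exponential growth concrete, here is a toy sketch (my own illustration, not from the thread) counting distinct n-bit addition problems: with 2^n choices for each operand there are 2^(2n) pairs, far beyond anything a model could memorize.

```python
def num_addition_problems(n: int) -> int:
    """Count distinct (a, b) pairs where a and b are n-bit numbers."""
    # 2^n possible values per operand, squared for the pair: 2^(2n)
    return (2 ** n) ** 2

for n in (8, 16, 32, 64):
    print(f"{n}-bit: {num_addition_problems(n):.3e} problems")
```

Even at 32 bits the table would need ~1.8e19 entries, so the model must actually apply the carry rules rather than recall answers.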
I meant this in the general case, not specifically binary addition. Also, returning a single token from ChatGPT is technically an O(1) operation, so the same principle applies: a computed answer of O(n_required_tokens) length cannot be delivered in O(1) time without some sort of caching.
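The cost argument can be sketched with a toy simulation (again my own illustration, under the assumption of standard autoregressive decoding): each output token takes one sequential step, so an answer needing k tokens costs at least k steps and cannot be O(1) unless it was cached.

```python
def decode(answer_tokens):
    """Simulate autoregressive decoding: one sequential step per token."""
    steps = 0
    out = []
    for tok in answer_tokens:
        steps += 1          # one "forward pass" per emitted token
        out.append(tok)
    return out, steps

out, steps = decode(["1", "0", "1", "1"])
assert steps == len(out)    # cost grows linearly with answer length
```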