> A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.
I don’t believe that. I did a quick search and didn’t find much, but:
Let d_0, d_1, etc. be decimal digits, little-endian (so the number is d_0 + 10·d_1 + 100·d_2 + …). The goal is to compute that quantity in binary.
The naive algorithm is to convert d_0 to binary. Then compute 10 in binary, multiply it by d_1, and add that in. Then multiply again to get 10^2 in binary, accumulate 10^2 · d_2, and repeat. For n digits there are n steps, and each step involves two multiplications by small factors plus an addition, on values that grow to roughly n digits, so the overall time is O(n^2).
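For comparison, here is that naive accumulation as a tiny Python sketch (an illustration only; in CPython each multiply by a small constant still walks the whole growing number, which is where the quadratic cost comes from):

```python
def digits_to_int_naive(digits):
    """Convert little-endian decimal digits (d_0 first) to a Python int.

    Maintains power = 10**i and accumulates d_i * 10**i. Each step does
    two small-factor multiplications on a number that keeps growing, so
    with schoolbook arithmetic the total work is O(n^2) for n digits.
    """
    value = 0
    power = 1
    for d in digits:
        value += d * power
        power *= 10
    return value
```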
But I would try it differently. To convert a 2n-digit number, first convert the first n digits (the low half) to binary recursively, and convert the second n digits (the high half) the same way. Then multiply the second result by 10^n and add it to the first result.
Let’s simplify the analysis by making n a power of 2, so 2n = 2^k. Then the big powers of 10 are all 10 raised to a power of 2, so 10 only needs to be squared about k times to produce all of them. Additionally, there is one 2^k-digit multiplication, two 2^(k-1)-digit multiplications, four 2^(k-2)-digit multiplications, and so on.
With FFT multiplication, multiplying (or squaring) numbers takes O(digits · poly(log digits)). There are k levels of recursion, each acting on a total of roughly 2^k bits, so each level costs O(2^k) times some log factors. Altogether this comes out to O(n · poly(log n)).
I have not implemented this or done the math rigorously :)
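A minimal Python sketch of that recursion, purely for illustration (not the poster's code; the name is made up, and it leans on Python's built-in big-integer multiplication, which in CPython is Karatsuba rather than FFT, so it is subquadratic but not quite the O(n · poly(log n)) of the analysis):

```python
def digits_to_int(digits):
    """Convert little-endian decimal digits (d_0 first) to a Python int.

    Split the digit list in half, convert each half recursively, and
    combine with a single large multiplication by 10**half. With
    subquadratic big-integer multiplication the whole conversion is
    subquadratic in the number of digits.
    """
    n = len(digits)
    if n <= 8:
        # Small base case: plain Horner evaluation from the top digit down.
        value = 0
        for d in reversed(digits):
            value = value * 10 + d
        return value
    half = n // 2
    low = digits_to_int(digits[:half])    # d_0 .. d_{half-1}
    high = digits_to_int(digits[half:])   # d_half .. d_{n-1}
    return low + high * 10 ** half
```

A more careful version would precompute the needed powers of 10 once by repeated squaring and reuse them across the recursion, which is the "square 10 about k times" step in the analysis above.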
edit: this is also buried in the issue. There’s also:
Indeed, the naive algorithm is quadratic due to the multiplications. Yet everyone familiar with even a bit of crypto and/or otherwise working with huge numbers knows that multiplication can be subquadratic.
Integers / floating point / bignum support is such a problem in almost any language, with multiple competing implementations in most of them. I’d argue it’s probably a harder problem than Unicode support, although Python also managed to make that difficult, with multiple implementations in the wild (both UCS-2 and UCS-4 builds are commonly used, depending on how the interpreter was compiled).
You are correct, and further down the thread someone suggests an O(n log² n) algorithm which does essentially what you describe. The statement that no faster algorithm exists is false. The Python devs, in response, redirect users to a different thread in which they discuss other algorithms.
As you can see, the Python developers have chosen to leave the original incorrect statement about O(n²) time up in the initial post.
In most C and C++ libraries, itoa is actually done recursively like this: divide by 100000 to get the top 5 digits and the bottom 5 digits, then do the conversion for each in parallel (relying on the superscalar nature of CPUs for parallelism, no SIMD). For longs, you divide by 100000 twice and run 4 streams.
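Sketched in Python just to show the digit grouping (the function name is hypothetical; a real C or C++ itoa converts both halves with fixed-width, straight-line code so a superscalar CPU can overlap the two streams):

```python
def itoa_u32(x):
    """Render a value < 10**10 (e.g. a 32-bit unsigned int) in decimal.

    A single divmod by 100000 splits the number into two independent
    5-digit halves, which can then be converted separately.
    """
    hi, lo = divmod(x, 100_000)
    if hi == 0:
        return str(lo)
    # Once the high half is nonzero, the low half is zero-padded to 5 digits.
    return str(hi) + str(lo).zfill(5)
```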
String-to-integer conversion, on the other hand, is a lot harder to do in log n time, although each iteration is usually faster. The same trick doesn't work: you can't efficiently do math on the base-10 string, so the equivalent division by 2^16 is very hard. I think it has to be done in linear time, and that expands to O(n log n) for arbitrary word width due to the math ops.
However, a lot of what we do for atoi/itoa assumes a fixed, finite length. Same with the FFTs: those algorithms rely on a finite length. Arbitrary-length bignum libraries pay a huge cost on trivial things like this, and it's part of the cost of doing business in Python.
There is a very good chance that the bignum library used here is not optimized for things like atoi and itoa - most bignum libraries are written for cryptography and math where these are not done frequently.
https://github.com/python/cpython/issues/90716
I don’t get it.