
> A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.

I don’t believe that. I did a quick search and didn’t find much, but:

Let d_0, d_1, etc. be decimal digits, little endian (so the number is d_0 + 10·d_1 + 100·d_2 + …). The goal is to compute that quantity in binary.

The naive algorithm is to convert d_0 to binary. Then compute 10 in binary, multiply it by d_1, and add it. Then multiply to get 10^2 in binary, accumulate 10^2 · d_2, and repeat. For n digits there are n steps, and each step involves two multiplications by a small factor and an addition; since the accumulated value and the power of 10 at step i have O(i) digits, each step costs O(i), so the overall time is O(n^2).
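A minimal sketch of that loop (my own illustration, untested), with digits little-endian as above:

    # Naive conversion: at step i, `result` and `power` have O(i) digits,
    # so each step costs O(i) and the whole loop is O(n^2).
    def naive_to_int(digits):          # digits[i] is the coefficient of 10**i
        result = 0
        power = 1                      # 10**i, kept in binary
        for d in digits:
            result += d * power        # multiply by a small factor, then add
            power *= 10                # another small-factor multiplication
        return result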

But I would try it differently. To convert a 2n-digit number, first convert the first n digits to binary (recursively), then convert the second n digits to binary. Then multiply the second result by 10^n and add it to the first result.

Let’s simplify this analysis by making n be a power of 2, so 2n = 2^k. Then the big powers of 10 are all 10 to a power of 2, so 10 needs to be squared k times. Additionally, there is one 2^k-digit multiplication, two 2^(k-1)-digit multiplications, four 2^(k-2)-digit multiplications, etc.

With FFT multiplication, multiplication is O(digits · poly(log(digits))), as is squaring. This all adds up to k levels of recursion, each acting on a total of 2^k or so bits, so each level takes time O(2^k) times some log factors. This comes out to O(n · poly(log(n))).

I have not implemented this or done the math rigorously :)
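Roughly, though, the recursion would look something like this (an untested sketch; a real version would cache the repeatedly squared powers of 10 instead of recomputing 10**n, and needs a subquadratic multiply underneath to get the claimed bound):

    # Divide-and-conquer conversion of little-endian decimal digits to an int.
    # Each level does one big multiplication per pair of halves; with a
    # subquadratic (e.g. FFT-based) multiply this totals O(n * polylog(n))
    # instead of O(n^2).
    def dc_to_int(digits):
        if len(digits) <= 8:                      # small base case: naive loop
            return sum(d * 10**i for i, d in enumerate(digits))
        n = len(digits) // 2
        low = dc_to_int(digits[:n])               # first n digits (low part)
        high = dc_to_int(digits[n:])              # remaining digits (high part)
        return low + high * 10**n                 # scale the high half and add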

edit: this is also buried in the issue. There’s also:

https://github.com/python/cpython/issues/90716

I don’t get it.



Indeed, the naive algorithm is quadratic due to the multiplications. Yet everyone familiar with even a bit of crypto and/or otherwise working with huge numbers knows that multiplication can be subquadratic.

...and in doing a bit of research on how Python does multiplication, I found some... rather odd opinions, considering it's one of the few languages with arbitrary precision integers: https://discuss.python.org/t/faster-large-integer-multiplica...


Integers / floating point / bignum handling is such a problem in almost any language, with multiple competing implementations in most of them. I’d argue it’s probably a harder problem than Unicode support, although Python also managed to make that difficult with multiple implementations in the wild (both UCS-2 and UCS-4 are commonly used, depending on how it was compiled).


You are correct, and further down the thread someone suggests an O(n log² n) algorithm that does essentially what you describe. The statement that no faster algorithm exists is false. The Python devs, in response, redirect users to a different thread where they discuss other algorithms.

As you can see, the Python developers have chosen to leave the original incorrect statement about O(n²) time up in the initial post.


In most C and C++ libraries, itoa is actually done recursively like this: divide by 100000 to get the top 5 digits and the bottom 5 digits, then do the conversion for each in parallel (relying on the superscalar nature of CPUs for the parallelism, no SIMD). For longs, you divide by 100000 twice and run 4 streams.
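For illustration only, the split looks roughly like this (a Python sketch of the structure; the superscalar win only materializes in C/C++, where the two 5-digit halves form independent instruction streams, and the helper name is mine):

    # Hypothetical itoa-style split for a value of up to 10 decimal digits.
    def itoa10(value):
        high, low = divmod(value, 100000)          # top 5 and bottom 5 digits
        return str(high) + str(low).zfill(5) if high else str(low)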

String-to-integer conversion, however, is a lot harder to do in log n time, though each iteration is usually faster. The same trick doesn't work: you can't efficiently do math on the base-10 string, so the equivalent of dividing by 2^16 is very hard. I think it has to be done in linear time, which expands to O(n log n) for arbitrary word width due to the math ops.

However, a lot of what we do for atoi/itoa assumes a fixed length. Same with the FFTs: the algorithms rely on the length being fixed in advance. Arbitrary-length bignum libraries pay a huge cost on trivial things like this, and that's part of the cost of doing business in Python.

There is a very good chance that the bignum library used here is not optimized for things like atoi and itoa - most bignum libraries are written for cryptography and math where these are not done frequently.


.NET also seems to have mitigated this problem, "for really, really big BigIntegers": https://devblogs.microsoft.com/dotnet/performance_improvemen...



