
Fair point! A single complex multiplication `(a+bi)(c+di)` indeed requires at least 3 real multiplications.
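
For the curious, the classic 3-multiplication (Gauss-style) trick looks like this; a quick Python sketch of my own, not something from the paper:

```python
# Gauss's trick: compute (a+bi)(c+di) = (ac-bd) + (ad+bc)i
# with 3 real multiplications instead of the naive 4.
def complex_mul_3(a, b, c, d):
    t1 = c * (a + b)  # ac + bc
    t2 = a * (d - c)  # ad - ac
    t3 = b * (c + d)  # bc + bd
    return t1 - t3, t1 + t2  # (real part, imaginary part)
```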

However, when researchers (and systems like AlphaEvolve in this context) analyze fast matrix multiplication algorithms like Strassen's, the primary goal is usually to improve the asymptotic complexity (and understand the space of these algorithms better). This complexity is determined by the number of multiplications in the field over which the matrices are defined:

* For real matrices, we count real scalar multiplications.

* For complex-valued matrices (as in the 4x4 example where AlphaEvolve found a solution with 48 scalar multiplications), "scalar multiplication" refers to a complex scalar multiplication.

The key is that these are the operations you recurse on. A scheme that multiplies `b x b` matrices using `m` scalar multiplications yields an `O(N^(log_b(m)))` algorithm. Additions, and the constant-factor cost of implementing one field's multiplication in terms of another's, don't change that exponent; they only affect the constants.
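
For concreteness, here's the arithmetic (my own sanity check, using the standard formula):

```python
import math

# Recursing on a scheme that multiplies b x b (block) matrices with m
# scalar multiplications gives running time O(N^(log_b m)).
def exponent(b, m):
    return math.log(m, b)

print(exponent(2, 8))   # naive 2x2: log_2 8 = 3.0
print(exponent(2, 7))   # Strassen: log_2 7 ~ 2.807
print(exponent(4, 49))  # Strassen applied twice: log_4 49 = log_2 7
print(exponent(4, 48))  # the 4x4, 48-multiplication scheme: ~2.793
```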

Of course, for practical performance on a specific 4x4 matrix multiplication, one would absolutely dive into the real operations, additions, memory layout, etc., but making 4x4 matrix multiplication practically faster on particular hardware was not the focus of this section. (We do improve the practical implementation of large matrix multiplications on the target hardware in the “Enhancing AI training and inference” section of the blog post.)

(Disclaimer: one of the authors.)


One of the authors here. We are aware of the Winograd scheme, but note that it only works over commutative rings, which means it's not applicable recursively to larger matrices (and doesn't correspond to a rank-48 factorization of the <4,4,4> matrix multiplication tensor). The MathOverflow answer contained a mistake, which was corrected in the comments by Benoit Jacob.

More details: the Winograd scheme computes `(x1 + y2)(x2 + y1) + (x3 + y4)(x4 + y3) - A_i - B_j`, where `A_i = x1*x2 + x3*x4` depends only on the x's and `B_j = y1*y2 + y3*y4` depends only on the y's. It relies on the `y2*y1` term (which comes from expanding the first brackets) cancelling against the `y1*y2` term in `B_j`. This is fine when working with numbers, but if you want to apply the algorithm recursively to large matrices, then at the top level of recursion you're working with 4x4 block matrices (where each block is a big matrix itself), and for general matrices `Y2*Y1 != Y1*Y2`.
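
If it helps, here's a small numpy sketch (my own illustration, not from the paper) showing the identity holding for scalars and breaking for matrix blocks:

```python
import numpy as np

rng = np.random.default_rng(0)

def winograd_inner4(xs, ys, mul):
    # Winograd's trick for the length-4 inner product
    # x1*y1 + x2*y2 + x3*y3 + x4*y4: only 2 multiplications mix x's
    # and y's; A and B each depend on one operand only, so in a full
    # matrix product they are precomputed per row/column and amortized.
    x1, x2, x3, x4 = xs
    y1, y2, y3, y4 = ys
    A = mul(x1, x2) + mul(x3, x4)
    B = mul(y1, y2) + mul(y3, y4)
    return mul(x1 + y2, x2 + y1) + mul(x3 + y4, x4 + y3) - A - B

# Scalars (commutative): y2*y1 cancels with y1*y2, identity holds.
xs, ys = rng.standard_normal(4), rng.standard_normal(4)
print(np.isclose(winograd_inner4(xs, ys, np.multiply),
                 sum(x * y for x, y in zip(xs, ys))))  # True

# 2x2 matrix blocks (non-commutative): the cancellation fails.
Xs = [rng.standard_normal((2, 2)) for _ in range(4)]
Ys = [rng.standard_normal((2, 2)) for _ in range(4)]
exact = sum(X @ Y for X, Y in zip(Xs, Ys))
print(np.allclose(winograd_inner4(Xs, Ys, np.matmul), exact))  # False
```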

Here is a website that tracks the fastest (recursively applicable) matrix multiplication algorithms for different matrix sizes; for 4x4, the record stands at 49 multiplications: https://fmm.univ-lille.fr/4x4x4.html

UPD: s/fields/rings/ and fixed equation rendering

