Is this like the Karatsuba algorithm, where it's theoretically faster but not ac...

pbsd · 2025-05-16T17:52:16 1747417936

Karatsuba is definitely faster than schoolbook multiplication at practical sizes. You presumably mean Strassen.

thrance · 2025-05-16T16:43:47 1747413827

They tested it against BLAS primitives and found theirs to be faster, in hardware.

bee_rider · 2025-05-16T17:29:36 1747416576

The mention 5% improvements and small matrices in the abstract, so my gut says (I haven’t read the actual paper yet) that is probably is a practical-type algorithm.

qoez · 2025-05-16T16:52:00 1747414320

Since this is 2x2 there's simd instructions that can do this (or with two simd dot products) both on CPU and inside each GPU core. So with current hardware you won't beat writing this out manually.

odo1242 · 2025-05-16T17:32:11 1747416731

It’s faster for matrices of approximately at least ~256x256, though it depends on hardware

optimalsolver · 2025-05-16T16:43:40 1747413820

>where it's theoretically faster but not actually faster when run on real hardware

Aka a galactic algorithm:

https://en.wikipedia.org/wiki/Galactic_algorithm

thrance · 2025-05-16T16:45:23 1747413923

I feel like a galactic algorithm is slow even theoretically, unless you work with out-of-this-world data.

Whereas many algorithms are theoretically fine for real data but don't make good use of cache, or instruction sets...

drlobster · 2025-05-16T16:42:34 1747413754

[flagged]

meisel · 2025-05-16T16:47:16 1747414036

I figured it might, but I think that this is a top of mind question for people and would be nice to make clear in the comments of the post too. So often there’s some theoretical improvement on multiplication that isn’t actually practical. Regardless, they don’t seem to have posted results for CUDA, which is arguably more important than CPU multiplication which is what they tried

tsurba · 2025-05-16T16:50:56 1747414256

Probably why they tried to address it in the abstract already

tptacek · 2025-05-16T16:59:38 1747414778

Let's have more comments from domain experts comparing algorithms in the abstract to cuBLAS, and less like these. Thanks!

If they're wrong to speculate, well, there's a whole paper you can just go skim to find the bit that rebuts them.