Do you understand that you can't magic away the complexity bound on conv and matmul by simply taking the FFT? I assume you do, so given this very obvious fact, there are only two options for how an FFT approach could beat XYZ conventional kernel:
1. The FFT primitive on your platform is more highly optimized, thanks to the sheer attention/effort/years it accumulated prior to roughly 2010. FFTW could fall into this category on some platforms.
2. The shape/data-layout you're operating on is particularly well suited to the butterfly ops in Cooley-Tukey.
That's it. There is no other possibility, because again, the FFT isn't some magical oracle for conv - it's literally just a linear mapping, right?
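To make "just a linear mapping" concrete, here's a minimal NumPy sketch (the sizes and variable names are mine, purely illustrative): circular convolution computed directly and via the FFT is the exact same map - the FFT just evaluates it in a basis where it happens to be diagonal.

```python
# Minimal sketch of "the FFT is just a linear mapping": circular convolution
# computed two ways gives identical results. Names/sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)   # input signal
h = rng.standard_normal(N)   # kernel, already padded to length N

# Direct circular convolution: y[n] = sum_m h[m] * x[(n - m) mod N]  -- O(N^2)
y_direct = np.array([sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)])

# Same linear map via the FFT (convolution theorem) -- O(N log N), same output
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

print(np.allclose(y_direct, y_fft))  # True: nothing magical, just a change of basis
```

Same numbers out either way; the only question is which evaluation order is cheaper on your sizes and your hardware.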
So taking points 1 and 2 together, you arrive at the implication: whatever you're doing/seeing/benching isn't general enough for anyone to care. I mean, think about it: do you think the cuDNN team doesn't know about Cooley-Tukey? That they've just happened to sleep on a method taught in every single undergrad signals-and-systems class? It's not a coincidence that FFT-based conv doesn't rate as highly as you think it does. If you disagree, just write your fftdnn library that revolutionizes conv perf for the whole world and collect your fame and fortune from every single FAANG that currently uses cuDNN/cuBLAS.
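And the complexity argument doesn't even favor the FFT in the typical DNN regime. A rough back-of-envelope sketch (the feature-map/kernel sizes and the ~5N log2 N per-transform cost below are assumptions, not benchmarks):

```python
# Rough back-of-envelope sketch, not a benchmark. The sizes and the
# ~5*N*log2(N) per-transform cost constant below are assumptions.
import math

N, k = 224, 3                      # illustrative feature-map size and kernel size
direct_madds = N * N * k * k       # multiply-adds for direct 2D convolution

# FFT route: transform image + padded kernel, multiply pointwise, inverse transform.
fft_one = 5 * (N * N) * math.log2(N * N)   # rough cost of one 2D FFT
pointwise = 6 * N * N                      # one complex multiply ~ 6 real flops
fft_flops = 3 * fft_one + pointwise

print(f"direct ~{direct_madds/1e6:.2f} M madds, FFT ~{fft_flops/1e6:.2f} M flops")
# With a 3x3 kernel the direct path is already over an order of magnitude cheaper,
# before you even get to memory layout, batching, or channel dimensions.
```

For big kernels the balance flips, which is exactly why FFT conv survives as a niche path rather than the default.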