
I'm not quite sure that SIMD is a relevant benchmark for comparing the processors. Wouldn't the GPU and/or neural engine take care of SIMD on the ARM MacBook?



You can still benefit from AVX for short workloads; the latency of AVX instructions is no higher than that of other instructions.

Running things on an accelerator (GPU, etc.) usually involves writing a specific kernel in a language subset, manually copying data, and generally incurring long latencies. Unless there is a lot of data, it won't be faster.

With AVX, in the best case the compiler can just vectorize some loop, speeding it up 5x without any added latency or source code changes.
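
As a sketch (the function and flags are just illustrative): a plain C loop like the one below is typically auto-vectorized to AVX instructions by GCC/Clang at -O3 with -mavx2, with no source changes.

    /* Illustrative example: a plain loop the compiler can auto-vectorize
       to AVX instructions at -O3 -mavx2, with no source changes.
       `restrict` tells the compiler the arrays don't alias. */
    void scale_add(float *restrict dst, const float *restrict src,
                   float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] += k * src[i];
    }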


There are many applications that still need the general-purpose nature of CPUs while benefiting from vectorization.

See libjpeg-turbo, ffmpeg, crypto, hashing in general, ripgrep, simdjson, ...

https://woboq.com/blog/utf-8-processing-using-simd.html
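
The core trick there looks roughly like this (a sketch with SSE2 intrinsics; the function name and 16-byte chunking are mine, not the article's):

    #include <emmintrin.h>  /* SSE2 */
    #include <stddef.h>

    /* Illustrative sketch: returns 1 if every byte is ASCII (high bit
       clear), testing 16 bytes per iteration via a sign-bit mask. */
    static int all_ascii(const unsigned char *s, size_t n) {
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
            if (_mm_movemask_epi8(chunk) != 0)  /* any high bit set? */
                return 0;
        }
        for (; i < n; i++)  /* scalar tail */
            if (s[i] & 0x80)
                return 0;
        return 1;
    }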

See also: https://www.reddit.com/r/compsci/comments/4cq0ls/when_is_sim...

If SIMD had been obsoleted by GPUs, Intel and AMD wouldn't keep introducing wider SIMD extensions.


SIMD and, e.g., Nvidia warps are not the same thing. I don't know about Apple's GPU, but, for example, there is no GPU alternative to the SQRTPD instruction (square root of packed double-precision values). Also, when there is branch divergence across threads, CPUs still do a much better job than GPUs.
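
For reference, on the CPU side that instruction (VSQRTPD in its AVX form) is what you get from something like this (a sketch; assumes n is a multiple of 4):

    #include <immintrin.h>  /* AVX */

    /* Illustrative sketch: square roots of packed doubles, 4 per
       iteration, compiled down to VSQRTPD. Assumes n % 4 == 0. */
    void sqrt_array(double *dst, const double *src, int n) {
        for (int i = 0; i < n; i += 4) {
            __m256d v = _mm256_loadu_pd(src + i);
            _mm256_storeu_pd(dst + i, _mm256_sqrt_pd(v));
        }
    }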

Curious to think about how unified memory may change the flops-per-memory-access ratio at which it makes sense to shift a job from the CPU (better at low ratios) to the GPU (better at high ratios).
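
Rough back-of-the-envelope for that ratio (the daxpy example and numbers are just illustrative):

    /* daxpy does 2 flops per element (multiply + add) while moving
       3 doubles (load x, load y, store y) = 24 bytes per element,
       i.e. roughly 0.08 flops/byte -- bandwidth-bound, so copying the
       data to a discrete GPU would usually cost more than it saves. */
    void daxpy(double *restrict y, const double *restrict x, double a, int n) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }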



