Hacker News

There’s a fair bit of AVX/SSE code out there, but these days the vast bulk of AVX/SSE code is generated by the autovectorizer and that’s mostly going to work on NEON without a hitch. Clang enables the autovectorizer at -O2 by default.
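As a sketch of the kind of loop the autovectorizer handles well (function name is just illustrative): a straightforward integer reduction, which Clang at -O2 will turn into SSE/AVX on x86 and NEON on AArch64 from the same source. You can confirm with `-Rpass=loop-vectorize`.

```cpp
#include <cstddef>

// A loop like this is exactly what Clang's autovectorizer (on by
// default at -O2) vectorizes on both x86 and AArch64 -- no
// intrinsics, so the port is free. (Integer addition is associative;
// a float reduction would additionally need -ffast-math or a pragma.)
int sum_i32(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```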

I’d be interested in estimates of how much hand-written AVX/SSE your computer actually runs. The apps I’ve seen usually have a fairly small core of AVX/SSE code.



They're admittedly not applications most users run every day, but much multimedia work (audio processing, encoding, decoding) is done with hand-crafted intrinsics, and the same goes for video.

In an even more niche area (high-end VFX apps, like compositors and renderers), SSE/AVX intrinsics are used quite a bit in performance-critical parts of the code, and auto-vectorisers can't yet do as good a job (they're pretty useless at conditionals and masking).
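A minimal sketch of the masking pattern being described, using only SSE2 (function name and the toy kernel are my own illustration, not from any particular VFX codebase): a branch is rewritten as "compute both sides, then select with a mask", which is what hand-written intrinsic code does and what autovectorizers often fail to derive from branchy scalar source.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Scalar form: out[i] = (a[i] > 0.0f) ? a[i] * 2.0f : 0.0f;
// The hand-vectorized version replaces the branch with a compare
// mask and a bitwise select -- both sides are computed, then merged.
void scale_positive(const float* a, float* out, std::size_t n) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 two  = _mm_set1_ps(2.0f);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v    = _mm_loadu_ps(a + i);
        __m128 mask = _mm_cmpgt_ps(v, zero);      // all-ones where a[i] > 0
        __m128 res  = _mm_mul_ps(v, two);         // "taken" branch, all lanes
        // select: (mask & res) | (~mask & zero)
        __m128 sel  = _mm_or_ps(_mm_and_ps(mask, res),
                                _mm_andnot_ps(mask, zero));
        _mm_storeu_ps(out + i, sel);
    }
    for (; i < n; ++i)  // scalar tail for the last n % 4 elements
        out[i] = (a[i] > 0.0f) ? a[i] * 2.0f : 0.0f;
}
```

On NEON the same pattern exists (`vcgtq_f32` plus `vbslq_f32`), which is why porting this kind of code is mechanical but still manual work.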


Even less esoteric: your libc likely has at least a half dozen vectorized functions for the mem* and str* functions.
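To illustrate what those libc routines look like inside (a toy sketch, not glibc's actual implementation): scan 16 bytes at a time, compare against zero, and use a movemask to find the terminator.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Toy strlen that checks 16 bytes per iteration. Real libc versions
// first align the pointer so the 16-byte loads can never cross into
// an unmapped page; this sketch assumes the caller guarantees 16
// readable bytes beyond the terminator.
std::size_t strlen_sse2(const char* s) {
    const char* p = s;
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        __m128i chunk = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(p));
        // One bit per byte, set where the byte equals 0.
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return static_cast<std::size_t>(p - s) + __builtin_ctz(mask);
        p += 16;
    }
}
```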


But is the bulk of AVX code, weighted by time spent running, code that was generated by the autovectorizer? The SIMD in openssl and ffmpeg is written by hand. I bet the code that spends a lot of time on the CPU, especially the code that runs a lot while humans are waiting, is written by hand.


Those should have AArch64 versions written. AArch64 is old now, it's not some niche architecture.


Desktop productivity and content-creation apps have never needed ARM versions before, so many probably don't have ARM-specific optimizations, and some probably have x86-specific code paths that are enabled by default.

The memory model differences are going to be painful to debug, I think ("all-the-world's-a-VAX syndrome" is now "all the world's a Pentium/x86-64").
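A sketch of the class of bug being alluded to (names are illustrative): x86-64's strong TSO model lets under-synchronized flag-passing code work by accident, while AArch64's weaker model reorders more aggressively, so the failure only appears after porting. The portable fix is release/acquire ordering:

```cpp
#include <atomic>
#include <thread>
#include <cassert>

int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publishes payload
}

void consumer() {
    // Pairs with the release store: once we see ready == true,
    // the write to payload is guaranteed visible -- on ARM too.
    while (!ready.load(std::memory_order_acquire)) { }
    assert(payload == 42);
}
```

With a relaxed (or non-atomic) flag this tends to "work" on x86 anyway, which is exactly why the bugs surface only when the same binary logic runs on ARM.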



