Hacker News

There’s a fair bit of AVX/SSE code out there, but these days the vast bulk of AVX/SSE code is generated by the autovectorizer and that’s mostly going to work on NEON without a hitch. Clang enables the autovectorizer at -O2 by default.
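As a sketch of the kind of loop the autovectorizer handles well (function name is just illustrative): a straightforward integer reduction, which Clang at -O2 will turn into SSE/AVX on x86 and NEON on AArch64 from the same source. You can confirm with `-Rpass=loop-vectorize`.

```cpp
#include <cstddef>

// A loop like this is exactly what Clang's autovectorizer (on by
// default at -O2) vectorizes on both x86 and AArch64 -- no
// intrinsics, so the port is free. (Integer addition is associative;
// a float reduction would additionally need -ffast-math or a pragma.)
int sum_i32(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```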

I’d be interested in estimates of how much hand-written AVX/SSE your computer actually runs. The apps I’ve seen usually have a fairly small core of AVX/SSE code.



They're admittedly not applications most users run every day, but much multimedia work (audio processing, encoding, decoding) is done with hand-crafted intrinsics, and the same goes for video.

In an even more niche area (high-end VFX apps, like compositors and renderers), SSE/AVX intrinsics are used quite a bit in performance-critical parts of the code, and auto-vectorisers can't yet do as good a job (they're pretty useless at conditionals and masking).
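A minimal sketch of the masking pattern being described, using only SSE2 (function name and the toy kernel are my own illustration, not from any particular VFX codebase): a branch is rewritten as "compute both sides, then select with a mask", which is what hand-written intrinsic code does and what autovectorizers often fail to derive from branchy scalar source.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Scalar form: out[i] = (a[i] > 0.0f) ? a[i] * 2.0f : 0.0f;
// The hand-vectorized version replaces the branch with a compare
// mask and a bitwise select -- both sides are computed, then merged.
void scale_positive(const float* a, float* out, std::size_t n) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 two  = _mm_set1_ps(2.0f);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v    = _mm_loadu_ps(a + i);
        __m128 mask = _mm_cmpgt_ps(v, zero);      // all-ones where a[i] > 0
        __m128 res  = _mm_mul_ps(v, two);         // "taken" branch, all lanes
        // select: (mask & res) | (~mask & zero)
        __m128 sel  = _mm_or_ps(_mm_and_ps(mask, res),
                                _mm_andnot_ps(mask, zero));
        _mm_storeu_ps(out + i, sel);
    }
    for (; i < n; ++i)  // scalar tail for the last n % 4 elements
        out[i] = (a[i] > 0.0f) ? a[i] * 2.0f : 0.0f;
}
```

On NEON the same pattern exists (`vcgtq_f32` plus `vbslq_f32`), which is why porting this kind of code is mechanical but still manual work.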


Even less esoteric: your libc likely has at least a half dozen vectorized functions for the mem* and str* functions.
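To illustrate what those libc routines look like inside (a toy sketch, not glibc's actual implementation): scan 16 bytes at a time, compare against zero, and use a movemask to find the terminator.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Toy strlen that checks 16 bytes per iteration. Real libc versions
// first align the pointer so the 16-byte loads can never cross into
// an unmapped page; this sketch assumes the caller guarantees 16
// readable bytes beyond the terminator.
std::size_t strlen_sse2(const char* s) {
    const char* p = s;
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        __m128i chunk = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(p));
        // One bit per byte, set where the byte equals 0.
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return static_cast<std::size_t>(p - s) + __builtin_ctz(mask);
        p += 16;
    }
}
```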


But is the bulk of AVX code, weighted by time spent running, code that was generated by the autovectorizer? The SIMD in openssl and ffmpeg is written by hand. I bet the code that spends a lot of time on the CPU, especially the code that runs a lot while humans are waiting, is written by hand.


Those should have AArch64 versions written. AArch64 is old now, it's not some niche architecture.


Desktop productivity and content-creation apps have never needed ARM versions before, so many probably don't have ARM-specific optimizations, and some probably have x86-specific code paths that are enabled by default.

The memory model differences are going to be painful to debug, I think ("all-the-world's-a-VAX syndrome" is now "all the world's a Pentium/x86-64").
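A sketch of the class of bug being alluded to (names are illustrative): x86-64's strong TSO model lets under-synchronized flag-passing code work by accident, while AArch64's weaker model reorders more aggressively, so the failure only appears after porting. The portable fix is release/acquire ordering:

```cpp
#include <atomic>
#include <thread>
#include <cassert>

int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publishes payload
}

void consumer() {
    // Pairs with the release store: once we see ready == true,
    // the write to payload is guaranteed visible -- on ARM too.
    while (!ready.load(std::memory_order_acquire)) { }
    assert(payload == 42);
}
```

With a relaxed (or non-atomic) flag this tends to "work" on x86 anyway, which is exactly why the bugs surface only when the same binary logic runs on ARM.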



